SentiCon: A Concept Based Feature Set for Sentiment Analysis

dc.contributor.author: Mitra, Satanik
dc.date.accessioned: 2024-05-21T10:24:27Z
dc.date.available: 2024-05-21T10:24:27Z
dc.date.issued: 2018
dc.description.abstract (en_US): Selecting and extracting appropriate numerical features that allow sentiment analysis of text data with greater accuracy remains an open problem. In supervised machine-learning-based sentiment analysis, Term Frequency-Inverse Document Frequency (TF-IDF) scores are used as features for classifying the polarity of text data. TF-IDF features are a high-dimensional representation of the importance of a word in a document. They are sparse and do not consider the correlations among the words that make up the latent concepts in the document. Latent Semantic Analysis (LSA) removes the sparseness of TF-IDF features by representing them in a low-dimensional matrix, and extracts those hidden concepts. On the other hand, a natural property of a text document is its information content: quantitative estimates of Parts-of-Speech tags, negation words, sentiment lexicon terms, etc. represent the quality of information shared in text data. In this work, we propose an approach to generate a concept-based, domain-specific feature set, SentiCon, by consolidating LSA with the quality of information of the corpus. We applied Singular Value Decomposition (SVD) to the TF-IDF features to perform LSA. We tested SentiCon on two benchmark datasets, the IMDB movie review dataset and the Epinion Cars and Books datasets, using four well-known classifiers: Decision Tree, Random Forest, Support Vector Machine, and K-Nearest Neighbour. We used the standard performance measures precision, recall, and F-measure to analyze the results.
dc.identifier.uri: https://ieeexplore.ieee.org/abstract/document/8721408
dc.identifier.uri: https://dspace.bits-pilani.ac.in/xmlui/handle/123456789/14961
dc.language.iso (en_US): en
dc.publisher (en_US): IEEE
dc.subject (en_US): Management
dc.subject (en_US): Feature Extraction
dc.subject (en_US): Sentiment analysis
dc.subject (en_US): Machine learning
dc.subject (en_US): Sparse matrices
dc.subject (en_US): Semantics
dc.title (en_US): SentiCon: A Concept Based Feature Set for Sentiment Analysis
dc.type (en_US): Article
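The abstract describes the SentiCon idea only at a high level: TF-IDF scores are reduced by SVD to LSA concept coordinates, which are then combined with counts describing the quality of information in each document. The Python sketch below illustrates that combination under stated assumptions and is not the authors' implementation. The regex tokenizer, the TF-IDF weighting (raw count times log(N/df)), the tiny lexicons NEGATIONS, POSITIVE and NEGATIVE, and the helper names tfidf_matrix, lsa, quality_features and senticon_like_features are all hypothetical; Parts-of-Speech counts are omitted. It relies on NumPy's numpy.linalg.svd.

```python
import math
import re
from collections import Counter

import numpy as np

# Hypothetical mini-lexicons for illustration only; the lexicons and
# POS-based counts used in the paper are not given in the abstract.
NEGATIONS = {"not", "no", "never", "don't"}
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "boring", "hate"}


def tokenize(text):
    """Lower-case word tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_matrix(docs):
    """Return (vocabulary, documents x terms TF-IDF matrix).

    Uses raw term count * log(N / df); other TF-IDF variants exist.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    n_docs = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    X = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            X[i, index[term]] = count * math.log(n_docs / df[term])
    return vocab, X


def lsa(X, k):
    """Project documents onto the top-k latent concepts via truncated SVD.

    Returns U_k * S_k, which equals X @ V_k (the rows of X expressed in
    the top-k right singular vectors).
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def quality_features(doc):
    """Hypothetical 'quality of information' counts: negation words,
    positive and negative lexicon hits, and document length in tokens."""
    toks = tokenize(doc)
    return [
        sum(t in NEGATIONS for t in toks),
        sum(t in POSITIVE for t in toks),
        sum(t in NEGATIVE for t in toks),
        len(toks),
    ]


def senticon_like_features(docs, k=2):
    """Concatenate k LSA concept coordinates with the quality counts."""
    _, X = tfidf_matrix(docs)
    concepts = lsa(X, k)
    quality = np.array([quality_features(d) for d in docs], dtype=float)
    return np.hstack([concepts, quality])


if __name__ == "__main__":
    reviews = [
        "a great movie, I love it",
        "not good, boring plot",
        "bad acting and poor script",
        "never boring, excellent cast",
    ]
    print(senticon_like_features(reviews, k=2).shape)  # (4, 6)
```

With k = 2, each review becomes a 6-dimensional vector (2 concept coordinates plus 4 counts) that could then be fed to any of the four classifiers named in the abstract. Keeping only the top-k singular components is what turns the sparse, high-dimensional TF-IDF rows into the dense, low-dimensional representation the abstract attributes to LSA.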

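The abstract evaluates the classifiers with precision, recall and F-measure. For a binary polarity task these are P = TP/(TP+FP), R = TP/(TP+FN) and F = 2PR/(P+R); the abstract does not state which F-measure weighting was used, so the balanced (F1) form here is an assumption. The minimal, standard-library-only Python sketch below uses a hypothetical helper precision_recall_f1 and assumes the positive class is labelled 1.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and balanced F-measure for the positive class.

    Returns 0.0 for any ratio whose denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels: 1 = positive review, 0 = negative review.
    # TP = 2, FP = 1, FN = 1, so all three values are approximately 0.667.
    print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```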