DSpace Repository

SentiCon: A Concept Based Feature Set for Sentiment Analysis

dc.contributor.author Mitra, Satanik
dc.date.accessioned 2024-05-21T10:24:27Z
dc.date.available 2024-05-21T10:24:27Z
dc.date.issued 2018
dc.identifier.uri https://ieeexplore.ieee.org/abstract/document/8721408
dc.identifier.uri http://dspace.bits-pilani.ac.in:8080/jspui/xmlui/handle/123456789/14961
dc.description.abstract Selecting and extracting numerical features that yield more accurate sentiment analysis of text data remains an open problem. In supervised machine learning based sentiment analysis, Term Frequency-Inverse Document Frequency (TF-IDF) scores are used as features for classifying the polarity of text data. TF-IDF features are a high-dimensional representation of the importance of each word in a document. They are sparse, and they do not consider the correlations among words that form the latent concepts in the document. Latent Semantic Analysis (LSA) removes the sparseness of the TF-IDF features by representing them in a low-dimensional matrix, and it extracts those hidden concepts. On the other hand, a natural property of a text document is its information content. Quantitative estimates of Parts-of-Speech tags, negation words, sentiment lexicons, etc. represent the quality of information shared in text data. In this work, we propose an approach to generate SentiCon, a concept-based, domain-specific feature set, by consolidating LSA with the quality of information of the corpus. We apply Singular Value Decomposition (SVD) to the TF-IDF features to obtain the LSA representation. We test SentiCon on two benchmark datasets, IMDB movie review and Epinion (Cars and Books), using four well-known classifiers: Decision Tree, Random Forest, Support Vector Machine, and K-Nearest Neighbour. We use the standard performance measures precision, recall, and F-measure to analyze the results. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.subject Management en_US
dc.subject Feature Extraction en_US
dc.subject Sentiment analysis en_US
dc.subject Machine learning en_US
dc.subject Sparse matrices en_US
dc.subject Semantics en_US
dc.title SentiCon: A Concept Based Feature Set for Sentiment Analysis en_US
dc.type Article en_US
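
The pipeline described in the abstract (TF-IDF, then SVD to obtain LSA concepts, then concatenation with information-quality counts) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names (tfidf_matrix, lsa_embed, info_features, concept_features), the tiny word lists, the smoothed IDF formula, and the choice of k are all assumptions made for the example. Parts-of-Speech counts are omitted because they would require a tagger. The truncated SVD is approximated here with a simple power iteration so the example needs only the standard library; in practice a library routine such as scikit-learn's TruncatedSVD would normally take its place.

```python
import math
import random
import re
from collections import Counter

# Hypothetical toy word lists; a real system would use full lexicons.
NEGATIONS = {"not", "no", "never"}
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs):
    """Return (rows, vocab): one row of TF-IDF weights per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumption), so terms found in every document keep weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        rows.append([tf[t] / length * idf[t] for t in vocab])
    return rows, vocab


def lsa_embed(rows, k, iters=200, seed=0):
    """Approximate rank-k document coordinates (U_k * S_k) of the TF-IDF matrix.

    Uses power iteration with deflation on the document Gram matrix A A^T,
    whose eigenvalues are the squared singular values of A.
    """
    n = len(rows)
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in rows] for ri in rows]
    rng = random.Random(seed)
    coords = [[] for _ in range(n)]
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(g * x for g, x in zip(row, v)) for row in gram]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        gv = [sum(g * x for g, x in zip(row, v)) for row in gram]
        lam = max(sum(a * b for a, b in zip(v, gv)), 0.0)  # Rayleigh quotient
        scale = math.sqrt(lam)  # singular value estimate
        for i in range(n):
            coords[i].append(v[i] * scale)
            for j in range(n):
                gram[i][j] -= lam * v[i] * v[j]  # deflate the found component
    return coords


def info_features(doc):
    """Counts of negation, positive-lexicon, and negative-lexicon tokens."""
    toks = tokenize(doc)
    return [
        sum(t in NEGATIONS for t in toks),
        sum(t in POSITIVE for t in toks),
        sum(t in NEGATIVE for t in toks),
    ]


def concept_features(docs, k=2):
    """Concatenate k LSA concept coordinates with the information counts."""
    rows, _ = tfidf_matrix(docs)
    concepts = lsa_embed(rows, k)
    return [c + [float(x) for x in info_features(d)] for c, d in zip(concepts, docs)]
```

Calling concept_features on a list of review strings returns one dense row per review: k concept coordinates followed by three counts. Such rows could then be passed to any of the classifiers named in the abstract.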

