SentiCon: A Concept Based Feature Set for Sentiment Analysis

Mitra, Satanik

Please use this identifier to cite or link to this item: http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/14961

Full metadata record

DC Field	Value	Language
dc.contributor.author	Mitra, Satanik	-
dc.date.accessioned	2024-05-21T10:24:27Z	-
dc.date.available	2024-05-21T10:24:27Z	-
dc.date.issued	2018	-
dc.identifier.uri	https://ieeexplore.ieee.org/abstract/document/8721408	-
dc.identifier.uri	http://dspace.bits-pilani.ac.in:8080/jspui/xmlui/handle/123456789/14961	-
dc.description.abstract	Selecting and extracting numerical features that improve the accuracy of sentiment analysis on text data remains an open problem. In supervised machine learning based sentiment analysis, Term Frequency-Inverse Document Frequency (TF-IDF) scores are used as features for classifying the polarity of text data. TF-IDF features are a high-dimensional representation of the importance of each word in a document. They are sparse and do not consider the correlations among words that form the latent concepts in the document. Latent Semantic Analysis (LSA) removes the sparseness of the TF-IDF features by representing them in a low-dimensional matrix and extracts those hidden concepts. On the other hand, a natural property of a text document is its information content. The quantitative estimation of Parts-of-Speech tags, negation words, sentiment lexicons, etc. represents the quality of information shared in text data. In this work, we propose an approach to generate a concept-based, domain-specific feature set, SentiCon, by consolidating LSA with the quality of information of the corpus. We have applied Singular Value Decomposition (SVD) to the TF-IDF features to obtain the LSA representation. We have tested SentiCon on two benchmark datasets, the IMDB movie review dataset and the Epinion Cars and Books datasets, using four well-known classifiers: Decision Tree, Random Forest, Support Vector Machine, and K-Nearest Neighbour. We have used the standard performance measures precision, recall, and F-measure to analyze the results.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.subject	Management	en_US
dc.subject	Feature Extraction	en_US
dc.subject	Sentiment analysis	en_US
dc.subject	Machine learning	en_US
dc.subject	Sparse matrices	en_US
dc.subject	Semantics	en_US
dc.title	SentiCon: A Concept Based Feature Set for Sentiment Analysis	en_US
dc.type	Article	en_US
Appears in Collections:	Department of Management
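
As a rough illustration of the pipeline the abstract describes (TF-IDF scores, SVD to obtain low-dimensional LSA concepts, then consolidation with information-quality counts), the following is a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the small negation word list are all assumptions made for illustration, and the single negation count stands in for the richer Parts-of-Speech and lexicon statistics mentioned in the abstract.

```python
import numpy as np

# Hypothetical negation list; the paper's actual word lists are not given here.
NEGATION_WORDS = {"not", "no", "never", "nor", "cannot"}


def tfidf_matrix(docs):
    """Return (X, vocab): a dense TF-IDF matrix with one row per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: j for j, tok in enumerate(vocab)}

    # Raw term counts per document.
    counts = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for tok in toks:
            counts[i, index[tok]] += 1

    # Term frequency: counts normalised by document length.
    lengths = np.maximum(counts.sum(axis=1, keepdims=True), 1)
    tf = counts / lengths

    # Smoothed inverse document frequency (one common variant, an assumption).
    df = (counts > 0).sum(axis=0)
    idf = np.log((1 + len(docs)) / (1 + df)) + 1
    return tf * idf, vocab


def lsa_features(X, k):
    """Project documents onto the top-k latent concepts via truncated SVD."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k] * s[:k]


def negation_counts(docs):
    """Count negation words per document as a simple information-quality feature."""
    return np.array(
        [[sum(tok in NEGATION_WORDS for tok in doc.lower().split())] for doc in docs],
        dtype=float,
    )


def senticon_like_features(docs, k):
    """Concatenate LSA concept features with the negation-count feature."""
    X, _ = tfidf_matrix(docs)
    return np.hstack([lsa_features(X, k), negation_counts(docs)])
```

Calling senticon_like_features(docs, k) on a list of review strings yields one row per document with k concept columns plus one count column; such a matrix could then be passed to any of the classifiers named in the abstract.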

