
Please use this identifier to cite or link to this item: http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/14961
Full metadata record
DC Field | Value | Language
dc.contributor.author | Mitra, Satanik | -
dc.date.accessioned | 2024-05-21T10:24:27Z | -
dc.date.available | 2024-05-21T10:24:27Z | -
dc.date.issued | 2018 | -
dc.identifier.uri | https://ieeexplore.ieee.org/abstract/document/8721408 | -
dc.identifier.uri | http://dspace.bits-pilani.ac.in:8080/jspui/xmlui/handle/123456789/14961 | -
dc.description.abstract | Selecting and extracting numerical features that improve the accuracy of sentiment analysis on text data remains an open problem. In supervised machine learning based sentiment analysis, Term Frequency-Inverse Document Frequency (TF-IDF) scores are used as features for classifying the polarity of text data. TF-IDF features are a high-dimensional representation of the importance of each word in a document. They are sparse and do not consider the correlations among words that form the latent concepts in the document. Latent Semantic Analysis (LSA) removes the sparseness of the TF-IDF features by representing them in a low-dimensional matrix and extracts those hidden concepts. On the other hand, a natural property of a text document is its information content. Quantitative estimates of Parts-of-Speech tags, negation words, sentiment lexicons, etc. represent the quality of information shared in text data. In this work, we propose an approach to generate a concept-based, domain-specific feature set, SentiCon, by consolidating LSA with the quality of information of the corpus. We apply Singular Value Decomposition (SVD) to the TF-IDF features to obtain the LSA representation. We test SentiCon on two benchmark datasets, the IMDB movie review dataset and the Epinion Cars and Books datasets, using four well-known classifiers: Decision Tree, Random Forest, Support Vector Machine, and K-Nearest Neighbour. We use the standard performance measures precision, recall, and F-measure to analyze the results. | en_US
dc.language.iso | en | en_US
dc.publisher | IEEE | en_US
dc.subject | Management | en_US
dc.subject | Feature Extraction | en_US
dc.subject | Sentiment analysis | en_US
dc.subject | Machine learning | en_US
dc.subject | Sparse matrices | en_US
dc.subject | Semantics | en_US
dc.title | SentiCon: A Concept Based Feature Set for Sentiment Analysis | en_US
dc.type | Article | en_US
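
The feature construction outlined in the abstract (TF-IDF, then SVD to obtain LSA concepts, then consolidation with quality-of-information counts) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the word lists, the helper names (tokenize, tfidf_matrix, lsa, info_features, build_features), the particular TF-IDF weighting, and the choice of k are all assumptions made for illustration, and the classification step is omitted.

```python
import math
import re
from collections import Counter

import numpy as np

# Hypothetical toy corpus and word lists; none of this comes from the paper.
DOCS = [
    "a great movie with a great cast",
    "not a good movie and a bad plot",
    "good car with a great engine",
    "bad car and not a good engine",
]
NEGATIONS = {"not", "no", "never"}
POSITIVE = {"good", "great"}
NEGATIVE = {"bad"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_matrix(token_lists):
    """Build a dense documents-by-terms TF-IDF matrix (assumed weighting)."""
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    n_docs = len(token_lists)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in token_lists for t in set(toks))
    X = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(token_lists):
        for term, count in Counter(toks).items():
            tf = count / len(toks)
            idf = math.log(n_docs / df[term]) + 1.0  # +1 keeps shared terms nonzero
            X[i, index[term]] = tf * idf
    return X, vocab


def lsa(X, k):
    """Project documents onto the top-k latent concepts via truncated SVD."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def info_features(tokens):
    """Assumed quality-of-information features: share of negation and lexicon words."""
    n = max(len(tokens), 1)
    return [
        sum(t in NEGATIONS for t in tokens) / n,
        sum(t in POSITIVE for t in tokens) / n,
        sum(t in NEGATIVE for t in tokens) / n,
    ]


def build_features(docs, k=2):
    """Concatenate LSA concept scores with the quality-of-information features."""
    token_lists = [tokenize(d) for d in docs]
    X, _ = tfidf_matrix(token_lists)
    concepts = lsa(X, k)
    info = np.array([info_features(t) for t in token_lists])
    return np.hstack([concepts, info])


features = build_features(DOCS, k=2)
print(features.shape)  # (4, 5): 2 concept columns + 3 information columns
```

Each row of the resulting matrix could then be passed to any of the classifiers named in the abstract. Parts-of-Speech counts are left out here because they would require a tagger.
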
Appears in Collections:Department of Management

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.