Please use this identifier to cite or link to this item:
http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/14961
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Mitra, Satanik | - |
dc.date.accessioned | 2024-05-21T10:24:27Z | - |
dc.date.available | 2024-05-21T10:24:27Z | - |
dc.date.issued | 2018 | - |
dc.identifier.uri | https://ieeexplore.ieee.org/abstract/document/8721408 | - |
dc.identifier.uri | http://dspace.bits-pilani.ac.in:8080/jspui/xmlui/handle/123456789/14961 | - |
dc.description.abstract | Selecting and extracting appropriate numerical features for more accurate sentiment analysis of text data remains an open problem. In supervised machine learning based sentiment analysis, Term Frequency-Inverse Document Frequency (TF-IDF) scores are used as features for classifying the polarity of text data. TF-IDF features are a high-dimensional representation of the importance of a word in a document. They are sparse and do not consider the correlations among words that form the latent concepts in the document. Latent Semantic Analysis (LSA) removes the sparseness of the TF-IDF features by representing them in a low-dimensional matrix and extracts those hidden concepts. On the other hand, a natural property of a text document is its information content. Quantitative estimates of Parts-of-Speech tags, negation words, sentiment lexicons, etc. represent the quality of information shared in text data. In this work, we propose an approach to generate a concept-based, domain-specific feature set, SentiCon, by consolidating LSA with the quality of information of the corpus. We have applied Singular Value Decomposition (SVD) to the TF-IDF features to obtain the LSA representation. We have tested SentiCon on two benchmark datasets, IMDB movie review and Epinion (Cars, Books), using four well-known classifiers: Decision Tree, Random Forest, Support Vector Machine, and K-Nearest Neighbour. We have used the standard performance measures precision, recall and F-measure to analyze the results. | en_US |
dc.language.iso | en | en_US |
dc.publisher | IEEE | en_US |
dc.subject | Management | en_US |
dc.subject | Feature Extraction | en_US |
dc.subject | Sentiment analysis | en_US |
dc.subject | Machine learning | en_US |
dc.subject | Sparse matrices | en_US |
dc.subject | Semantics | en_US |
dc.title | SentiCon: A Concept Based Feature Set for Sentiment Analysis | en_US |
dc.type | Article | en_US |
Appears in Collections: | Department of Management |
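
The abstract describes applying SVD to TF-IDF features to obtain a low-dimensional LSA representation. The following is a minimal sketch of that step only, not the authors' implementation: the toy corpus, the helper names `tfidf_matrix` and `lsa`, the `+1` offset in the IDF term, and the choice of `k=2` are all assumptions made for illustration, and the information-quality features (Parts-of-Speech tags, negation words, sentiment lexicons) that SentiCon combines with LSA are not modeled here.

```python
import math
from collections import Counter

import numpy as np

# Toy corpus (hypothetical; not taken from the IMDB or Epinion datasets).
docs = [
    "great movie great acting",
    "terrible movie poor acting",
    "great car smooth engine",
    "poor car noisy engine",
]


def tfidf_matrix(documents):
    """Return a dense (n_docs x n_terms) TF-IDF matrix and its vocabulary."""
    tokenized = [d.split() for d in documents]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    X = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for w, count in Counter(toks).items():
            tf = count / len(toks)
            # The +1 offset is one common IDF variant, chosen here as an assumption.
            idf = math.log(n_docs / df[w]) + 1.0
            X[i, index[w]] = tf * idf
    return X, vocab


def lsa(X, k):
    """Project each document onto the top-k latent concepts via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # U[:, :k] * s[:k] is the same as X @ Vt[:k].T (document coordinates).
    return U[:, :k] * s[:k]


X, vocab = tfidf_matrix(docs)
Z = lsa(X, k=2)
print("TF-IDF shape:", X.shape)
print("LSA shape:", Z.shape)
```

Each row of `Z` is a dense, low-dimensional document vector that could be passed to a classifier such as a decision tree or SVM. For a real corpus, a sparse TF-IDF matrix and a truncated SVD routine would be more practical than the dense decomposition used in this sketch.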