Sentiment Analysis of Dravidian Code Mixed Data

dc.contributor.author: Sharma, Yashvardhan
dc.date.accessioned: 2024-11-14T09:47:03Z
dc.date.available: 2024-11-14T09:47:03Z
dc.date.issued: 2021
dc.description.abstract: This paper presents the methodologies implemented while classifying Dravidian code-mixed comments according to their polarity. With datasets of code-mixed Tamil and Malayalam available, three methods are proposed: a sub-word level model, a word embedding based model and a machine learning based architecture. The sub-word and word embedding based models used a Long Short Term Memory (LSTM) network along with language-specific preprocessing, while the machine learning model used term frequency–inverse document frequency (TF-IDF) vectorization along with a Logistic Regression model. The sub-word level model was submitted to the track ‘Sentiment Analysis for Dravidian Languages in Code-Mixed Text’ proposed by Forum of Information Retrieval Evaluation in 2020 (FIRE 2020). Although it ranked 5th and 12th for the Tamil and Malayalam tasks respectively in the FIRE 2020 track, this paper improves upon the results by a margin to attain final weighted F1-scores of 0.65 for the Tamil task and 0.68 for the Malayalam task. The former score is equivalent to that attained by the highest ranked team of the Tamil track. [en_US]
dc.identifier.uri: https://aclanthology.org/2021.dravidianlangtech-1.6/
dc.identifier.uri: http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16376
dc.language.iso: en [en_US]
dc.publisher: Association for Computational Linguistics [en_US]
dc.subject: Computer Science [en_US]
dc.subject: Long short term memory (LSTM) [en_US]
dc.subject: Forum of Information Retrieval Evaluation [en_US]
dc.title: Sentiment Analysis of Dravidian Code Mixed Data [en_US]
dc.type: Article [en_US]
