dc.contributor.author | Sharma, Yashvardhan |
dc.date.accessioned | 2024-11-14T09:47:03Z |
dc.date.available | 2024-11-14T09:47:03Z |
dc.date.issued | 2021 |
dc.identifier.uri | https://aclanthology.org/2021.dravidianlangtech-1.6/ |
dc.identifier.uri | http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16376 |
dc.description.abstract | This paper presents the methods used to classify Dravidian code-mixed comments by polarity. Using datasets of code-mixed Tamil and Malayalam comments, three methods are proposed: a sub-word level model, a word-embedding-based model, and a machine learning based architecture. The sub-word and word-embedding-based models use a Long Short-Term Memory (LSTM) network together with language-specific preprocessing, while the machine learning model uses term frequency–inverse document frequency (TF-IDF) vectorization with a Logistic Regression classifier. The sub-word level model was submitted to the track 'Sentiment Analysis for Dravidian Languages in Code-Mixed Text' organized by the Forum for Information Retrieval Evaluation in 2020 (FIRE 2020). Although it ranked 5th in the Tamil task and 12th in the Malayalam task of that track, this paper improves on those results, attaining final weighted F1-scores of 0.65 for the Tamil task and 0.68 for the Malayalam task. The Tamil score equals that of the highest-ranked team in the Tamil track. | en_US
dc.language.iso | en | en_US
dc.publisher | Association for Computational Linguistics | en_US
dc.subject | Computer Science | en_US
dc.subject | Long short-term memory (LSTM) | en_US
dc.subject | Forum for Information Retrieval Evaluation | en_US
dc.title | Sentiment Analysis of Dravidian Code Mixed Data | en_US
dc.type | Article | en_US
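The abstract reports results as weighted F1-scores. As a minimal sketch (not the author's code), a weighted F1-score is the mean of the per-class F1-scores, each weighted by that class's number of true instances (its support); this is the same definition as scikit-learn's `f1_score(..., average="weighted")`. The function name `weighted_f1` and the sentiment labels in the usage example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1-scores (illustrative sketch).

    Classes are taken from y_true: a label that appears only in y_pred
    has zero support, so it would contribute nothing to the average.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted this label and it was correct.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0  # 0 when never predicted
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total  # weight by class support
    return score


if __name__ == "__main__":
    # Hypothetical gold and predicted polarity labels for five comments.
    y_true = ["Positive", "Positive", "Negative", "Mixed_feelings", "Positive"]
    y_pred = ["Positive", "Negative", "Negative", "Positive", "Positive"]
    print(round(weighted_f1(y_true, y_pred), 4))  # prints 0.5333
```

Here both "Positive" and "Negative" reach an F1 of 2/3 while "Mixed_feelings" scores 0, so the support-weighted mean is (3 × 2/3 + 1 × 2/3 + 1 × 0) / 5 = 8/15. Weighting by support keeps a rare class from dominating the score, which is why it is a common choice for imbalanced sentiment data.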