dc.contributor.author |
Sharma, Yashvardhan |
|
dc.date.accessioned |
2024-11-14T11:21:35Z |
|
dc.date.available |
2024-11-14T11:21:35Z |
|
dc.date.issued |
2020 |
|
dc.identifier.uri |
https://ceur-ws.org/Vol-2826/T2-32.pdf |
|
dc.identifier.uri |
http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16387 |
|
dc.description.abstract |
Detecting and eliminating offensive and hate speech in social media content is an important concern, as such speech
can have serious consequences for society, ranging from the miseducation of young people to hate crimes. Offensive
speech identification in countries like India poses additional challenges because users write their social media
posts in code-mixed and romanized variants of multiple languages. HASOC-Dravidian-CodeMix - FIRE 2020 extended the
task of offensive speech identification to Dravidian languages. In this paper, we describe our approach to HASOC
Dravidian Code-mixed 2020, which ranked first in two of the three tasks (weighted F1 scores of 0.95 and 0.90) and
second in the third task, trailing the top model by only 0.01 (weighted F1 score of 0.77). We propose a novel and
flexible approach of selective translation and transliteration, designed to obtain better results from fine-tuning
and ensembling multilingual transformer networks such as XLM-RoBERTa and mBERT. We also implemented pre-trained,
fine-tuned, and ensembled versions of XLM-RoBERTa for offensive speech classification. We open-source our work to
facilitate further experimentation. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
CEUR-WS |
en_US |
dc.subject |
Computer Science |
en_US |
dc.subject |
Offensive speech detection |
en_US |
dc.subject |
Selective translation and transliteration |
en_US |
dc.subject |
XLM-RoBERTa |
en_US |
dc.subject |
Transformer Neural Networks |
en_US |
dc.title |
Siva@HASOC-Dravidian-CodeMix-FIRE-2020: Multilingual Offensive Speech Detection in Code-mixed and Romanized Text |
en_US |
dc.type |
Article |
en_US |