Siva@ HASOC-Dravidian-CodeMix-FIRE-2020: Multilingual Offensive Speech Detection in Code-mixed and Romanized Text

dc.contributor.author: Sharma, Yashvardhan
dc.date.accessioned: 2024-11-14T11:21:35Z
dc.date.available: 2024-11-14T11:21:35Z
dc.date.issued: 2020
dc.description.abstract: Detecting and eliminating offensive and hate speech in social media content is an important concern, as hate and offensive speech can have serious consequences in society, ranging from ill-education among youth to hate crimes. Offensive speech identification in countries like India poses several additional challenges because users write their social media posts in code-mixed and romanized variants of multiple languages. HASOC-Dravidian-CodeMix - FIRE 2020 extended the task of offensive speech identification to Dravidian languages. In this paper, we describe our approach in HASOC Dravidian Code-mixed 2020, which topped two out of three tasks (weighted F1 scores of 0.95 and 0.90) and placed second in the third task, trailing the top model by only 0.01 points (weighted F1 score of 0.77). We propose a novel and flexible approach of selective translation and transliteration to obtain better results from fine-tuning and ensembling multilingual transformer networks like XLM-RoBERTa and mBERT. Further, we implemented pre-trained, fine-tuned and ensembled versions of XLM-RoBERTa for offensive speech classification. We open source our work to facilitate further experimentation. [en_US]
dc.identifier.uri: https://ceur-ws.org/Vol-2826/T2-32.pdf
dc.identifier.uri: https://dspace.bits-pilani.ac.in/handle/123456789/16387
dc.language.iso: en [en_US]
dc.publisher: CEUR-WS [en_US]
dc.subject: Computer Science [en_US]
dc.subject: Offensive speech detection [en_US]
dc.subject: Selective translation and transliteration [en_US]
dc.subject: XLM-RoBERTa [en_US]
dc.subject: Transformer Neural Networks [en_US]
dc.title: Siva@ HASOC-Dravidian-CodeMix-FIRE-2020: Multilingual Offensive Speech Detection in Code-mixed and Romanized Text [en_US]
dc.type: Article [en_US]
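
Note on the metric: the abstract reports weighted F1 scores. As a minimal sketch of how that metric is commonly defined (per-class F1 averaged with weights proportional to each class's share of the true labels), the following may help readers interpret the figures. The function name `weighted_f1` and the labels `OFF` and `NOT` are hypothetical and chosen only for illustration; this is not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted this label and it was correct.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class's F1 by its share of the true labels.
        score += (n_true / total) * f1
    return score


if __name__ == "__main__":
    # Hypothetical labels: OFF = offensive, NOT = not offensive.
    gold = ["OFF", "OFF", "NOT", "NOT", "NOT"]
    pred = ["OFF", "NOT", "NOT", "NOT", "NOT"]
    print(round(weighted_f1(gold, pred), 4))
```

Because each class is weighted by its support, a majority class contributes more to the final score than a rare one, which matters when comparing results on imbalanced offensive-speech datasets.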
