Siva@ HASOC-Dravidian-CodeMix-FIRE-2020: Multilingual Offensive Speech Detection in Code-mixed and Romanized Text

Sharma, Yashvardhan


URI: https://ceur-ws.org/Vol-2826/T2-32.pdf
http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16387

Date: 2020

Abstract:

Detecting and removing offensive and hateful content on social media is an important concern, because hate and offensive speech can have serious consequences for society, ranging from the ill-education of young people to hate crimes. Identifying offensive speech in countries like India poses additional challenges, because users post on social media in code-mixed and romanized variants of multiple languages. The HASOC-Dravidian-CodeMix track at FIRE 2020 extended the task of offensive speech identification to Dravidian languages. In this paper, we describe our approach to HASOC Dravidian Code-mixed 2020, which ranked first in two of the three tasks (weighted F1 scores of 0.95 and 0.90) and second in the third, trailing the top model by only 0.01 points (weighted F1 score of 0.77). We propose a novel and flexible approach of selective translation and transliteration that yields better results when fine-tuning and ensembling multilingual transformer networks such as XLM-RoBERTa and mBERT. We also implemented pre-trained, fine-tuned, and ensembled versions of XLM-RoBERTa for offensive speech classification. We open-source our work to facilitate further experimentation.
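The abstract reports results as weighted F1 scores and mentions ensembling multilingual transformers, but it does not say how model outputs were combined. The sketch below is a minimal, stdlib-only illustration under stated assumptions: it combines per-model class probabilities by soft voting (averaging, an assumed rule, not necessarily the authors' method) and scores predictions with weighted F1, where each class's F1 is weighted by its share of the gold labels. The function names `soft_vote` and `weighted_f1` and the hard-coded probabilities (standing in for XLM-RoBERTa or mBERT outputs) are hypothetical.

```python
from collections import Counter


def soft_vote(prob_lists):
    """Average class-probability vectors across models, then take the argmax.

    prob_lists holds one list per model; each inner list holds one
    probability vector per example. Soft voting is an ASSUMED ensembling
    rule used here for illustration only.
    """
    n_models = len(prob_lists)
    preds = []
    for i in range(len(prob_lists[0])):
        n_classes = len(prob_lists[0][i])
        avg = [sum(model[i][c] for model in prob_lists) / n_models
               for c in range(n_classes)]
        # Pick the class with the highest averaged probability.
        preds.append(max(range(n_classes), key=lambda c: avg[c]))
    return preds


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * count / total
    return score


# Hypothetical probabilities from two models; label 0 = not offensive, 1 = offensive.
model_a = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.6, 0.4]]
model_b = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.3, 0.7]]
gold = [0, 1, 1, 1]

preds = soft_vote([model_a, model_b])
print(preds, round(weighted_f1(gold, preds), 4))
```

Here the ensemble predicts [0, 0, 1, 1]; class 0 gets F1 = 2/3 with weight 1/4 and class 1 gets F1 = 0.8 with weight 3/4, giving a weighted F1 of about 0.7667. Weighting by support means the score tracks performance on frequent classes, which matters when offensive posts are a minority of the data.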



