Towards Offensive Language Identification for Dravidian Languages

Sharma, Yashvardhan

dc.contributor.author	Sharma, Yashvardhan
dc.date.accessioned	2024-11-14T09:51:35Z
dc.date.available	2024-11-14T09:51:35Z
dc.date.issued	2021
dc.identifier.uri	https://aclanthology.org/2021.dravidianlangtech-1.3/
dc.identifier.uri	http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16377
dc.description.abstract	Offensive speech identification in countries like India poses several challenges because users write social media posts in code-mixed and romanized variants of multiple languages. Identifying offensive language on social media is harder still for Dravidian languages, given the limited resources available for them. In this paper, we explore zero-shot and few-shot learning paradigms based on multilingual language models for offensive speech detection in code-mixed and romanized variants of three Dravidian languages: Malayalam, Tamil, and Kannada. We propose a novel and flexible approach of selective translation and transliteration to obtain better results from fine-tuning and ensembling multilingual transformer networks such as XLM-RoBERTa and mBERT. We implemented pretrained, fine-tuned, and ensembled versions of XLM-RoBERTa for offensive speech classification. Further, we experimented with inter-language, inter-task, and multi-task transfer learning techniques to leverage the rich resources available for offensive speech identification in English and to enrich the models with knowledge transferred from related tasks. The proposed models yielded good results and are promising for effective offensive speech identification in low-resource settings.	en_US
dc.language.iso	en	en_US
dc.publisher	Association for Computational Linguistics	en_US
dc.subject	Computer Science	en_US
dc.subject	Speech identification	en_US
dc.subject	Dravidian languages	en_US
dc.subject	Malayalam language	en_US
dc.subject	Social media	en_US
dc.title	Towards Offensive Language Identification for Dravidian Languages	en_US
dc.type	Article	en_US

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following Collection(s)

Department of Computer Science and Information Systems [1099]
