Offensive Language Classification of Code-Mixed Tamil with Keras

Sharma, Yashvardhan

Date

2021

Authors

Sharma, Yashvardhan

Publisher

CEUR-WS

Abstract

This paper presents the method adopted for Task 1 of the Dravidian-CodeMix-HASOC (Hate Speech and Offensive Content Identification in English and Indo-European Languages) Shared Task, proposed by the Forum for Information Retrieval Evaluation in 2021, on offensive language detection. To detect offensive language, a custom convolutional neural network architecture was built in Keras for supervised learning and trained on a dataset of YouTube comments written in code-mixed Tamil, in both Roman and Tamil scripts. The five-layer network was built using only Keras and required only simple tokenized data, padded to an appropriate length. Neither recurrent neural networks nor transfer learning were used, and the resulting CNN model achieved an F-score of 0.835.
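The preprocessing mentioned in the abstract (tokenized text, padded to a fixed length) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the whitespace tokenizer, the helper names `build_vocab` and `encode`, the reserved ids, the `max_len` value, and the sample comments are all assumptions made for illustration. The paper's actual tokenizer and padding length are not given in this record. Padding is applied at the end here, whereas Keras's own `pad_sequences` utility pads at the front by default.

```python
from collections import Counter

PAD_ID = 0  # reserved for padding (assumed convention)
OOV_ID = 1  # reserved for tokens missing from the vocabulary (assumed convention)


def build_vocab(texts, max_words=10000):
    """Map the most frequent whitespace tokens to integer ids starting at 2."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocab, max_len):
    """Convert a comment to a fixed-length id list: truncate, then pad at the end."""
    ids = [vocab.get(tok, OOV_ID) for tok in text.lower().split()][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Invented sample comments mixing Roman and Tamil script (not from the dataset).
comments = ["padam vera level", "இந்த padam super", "vera level mass padam"]
vocab = build_vocab(comments)
batch = [encode(c, vocab, max_len=5) for c in comments]
```

Each row of `batch` has the same length, which is what a convolutional model with a fixed input size needs. Whitespace splitting handles Roman and Tamil script alike, since it does not depend on the alphabet.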

Keywords

Computer Science, Offensive language detection, Code-Mixed text, Tamil, HASOC

URI

https://ceur-ws.org/Vol-3159/T3-14.pdf
https://dspace.bits-pilani.ac.in/handle/123456789/16381

Collections

Department of Computer Science and Information Systems
