Offensive Language Classification of Code-Mixed Tamil with Keras

dc.contributor.author	Sharma, Yashvardhan
dc.date.accessioned	2024-11-14T10:41:10Z
dc.date.available	2024-11-14T10:41:10Z
dc.date.issued	2021
dc.identifier.uri	https://ceur-ws.org/Vol-3159/T3-14.pdf
dc.identifier.uri	http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16381
dc.description.abstract	This paper presents the method adopted for Task 1 of the Dravidian-CodeMix-HASOC (Hate Speech and Offensive Content Identification in English and Indo-European Languages) Shared Task, proposed by the Forum for Information Retrieval Evaluation in 2021 for offensive language detection. To detect offensive language, a custom convolutional neural network architecture was built in Keras for supervised learning and trained on a dataset of YouTube comments written in code-mixed Tamil, in both Roman and Tamil scripts. The 5-layer neural network was built using only Keras and required only simple tokenized data, padded to an appropriate length. Recurrent neural networks and transfer learning were not used, and the resulting CNN model achieved an F-score of 0.835.	en_US
dc.language.iso	en	en_US
dc.publisher	CEUR-WS	en_US
dc.subject	Computer Science	en_US
dc.subject	Offensive language detection	en_US
dc.subject	Code-Mixed text	en_US
dc.subject	Tamil	en_US
dc.subject	HASOC	en_US
dc.title	Offensive Language Classification of Code-Mixed Tamil with Keras	en_US
dc.type	Article	en_US
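
The abstract notes that the model needed only simple tokenized data, padded to an appropriate length. As a rough illustration of that preprocessing step, the sketch below uses plain Python rather than the Keras utilities. The whitespace tokenizer, the reserved ids (0 for padding, 1 for unknown tokens), the choice to pad and truncate at the end of the sequence, the function names, and the sample comments are all assumptions made for this example; the record does not state how the authors tokenized or padded their data.

```python
from collections import Counter


def build_vocab(texts, max_words=None):
    """Assign an integer id to each whitespace token, most frequent first.

    Ids 0 and 1 are reserved (assumption): 0 = padding, 1 = unknown token.
    """
    counts = Counter(tok for text in texts for tok in text.lower().split())
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocab, max_len):
    """Convert a comment to a fixed-length list of ids.

    Longer comments are truncated at the end; shorter ones are
    right-padded with 0 (both choices are assumptions).
    """
    ids = [vocab.get(tok, 1) for tok in text.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical, harmless sample comments mixing Roman and Tamil script.
comments = ["intha padam super", "padam nalla irukku", "படம் super"]
vocab = build_vocab(comments)
batch = [encode(c, vocab, max_len=5) for c in comments]

for row in batch:
    print(row)
```

Every row in `batch` has the same length, which is what lets a fixed-size embedding or convolutional input layer consume the comments as a single array.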
