Please use this identifier to cite or link to this item:
http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16366
Title: | Hate Speech Detection in Marathi and Code-Mixed Languages using TF-IDF and Transformers-Based BERT-Variants |
Authors: | Sharma, Yashvardhan |
Keywords: | Computer Science; BERT-Variants; Hate Speech; Cyber hate; Social media; HASOC; Transformers model; Multilingual BERT; Machine learning (ML) |
Issue Date: | 2022 |
Publisher: | CEUR-WS |
Abstract: | People now express their ideas on social media on a global scale. The anonymity these platforms offer gives users a greater sense of freedom, so online attacks against others can be made without fear of repercussions, which eventually leads to the spread of hate speech. Current attempts to filter online content and stop the propagation of hatred are insufficient. Two factors contribute to this: the popularity of regional languages on social media and the lack of hate speech detectors that work across multiple languages. This paper addresses two hate speech detection subtasks: the identification of conversational hate speech in code-mixed languages such as Hindi, English and German, and offensive language identification in Marathi. Our approach combines TF-IDF features with machine learning models, and also uses transformer-based BERT models, to classify hate speech in each of the two subtasks. The MuRIL-BERT model produces the best results, with an accuracy of 73.1% and a macro F1-score of 0.727 on the code-mixed data, and a macro F1-score of 0.8306 on the Marathi data, which is 6% higher than the previous year's result. |
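The abstract names two ingredients that are easy to illustrate: TF-IDF term weighting and the macro F1-score used to report results. The following is a minimal standard-library sketch of both, not the authors' pipeline. The function names `tfidf` and `macro_f1`, the smoothed IDF formula, the whitespace tokenization, and the toy labels `HOF`/`NOT` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Weight = term frequency * smoothed inverse document frequency.
    The smoothing used here is an assumption, not taken from the paper.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (hypothetical helper)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented purely for illustration.
vectors = tfidf(["you are awful", "you are kind"])
gold = ["HOF", "HOF", "NOT", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT"]
score = macro_f1(gold, pred)
```

In this toy run the term that appears in only one document ("awful") receives a higher weight than a term shared by both ("you"). Macro F1 averages the per-class scores without weighting by class size, so on imbalanced data it can differ noticeably from accuracy, which is presumably why the abstract reports both.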
URI: | https://www.bibsonomy.org/bibtex/1179151f1b137332bbf571f0882070142 http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16366 |
Appears in Collections: | Department of Computer Science and Information Systems |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.