Hate Speech Detection in Marathi and Code-Mixed Languages using TF-IDF and Transformers-Based BERT-Variants

dc.contributor.author: Sharma, Yashvardhan
dc.date.accessioned: 2024-11-14T06:20:57Z
dc.date.available: 2024-11-14T06:20:57Z
dc.date.issued: 2022
dc.description.abstract: People now express their ideas on social media on a global scale. The anonymity these platforms provide gives users an increased sense of freedom, so online attacks against others can be made without fear of repercussions, which eventually leads to the spread of hate speech. Current attempts to filter online content and stop the propagation of hatred are insufficient. Two factors contribute to this: the popularity of regional languages on social media and the lack of hate speech detectors that work across multiple languages. This paper addresses two hate speech detection subtasks: the first is the identification of conversational hate speech in code-mixed languages such as Hindi, English, and German, and the second is offensive language identification in Marathi. Our approach uses TF-IDF features combined with machine learning models, as well as transformer-based BERT models, to classify hate speech in each of the two subtasks. The MuRIL-BERT model produces the best results, with an accuracy of 73.1% and a macro F1-score of 0.727 on the code-mixed data and a macro F1-score of 0.8306 on the Marathi data, which is 6% higher than the previous year's result. [en_US]
dc.identifier.uri: https://www.bibsonomy.org/bibtex/1179151f1b137332bbf571f0882070142
dc.identifier.uri: http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16366
dc.language.iso: en [en_US]
dc.publisher: CEUR-WS [en_US]
dc.subject: Computer Science [en_US]
dc.subject: BERT-Variants [en_US]
dc.subject: Hate Speech [en_US]
dc.subject: Cyber hate [en_US]
dc.subject: Social media [en_US]
dc.subject: HASOC [en_US]
dc.subject: Transformers model [en_US]
dc.subject: Multilingual BERT [en_US]
dc.subject: Machine learning (ML) [en_US]
dc.title: Hate Speech Detection in Marathi and Code-Mixed Languages using TF-IDF and Transformers-Based BERT-Variants [en_US]
dc.type: Article [en_US]
