Abstract:
People around the world now express their ideas on social media. The sense of freedom that anonymity
provides lets users attack others online without fear of repercussions, which eventually leads to the
spread of hate speech. Current attempts to filter online content and stop the propagation of hatred are
insufficient. The popularity of regional languages on social media and the lack of hate speech
detectors that work across multiple languages are two factors that
contribute to this. This paper addresses two hate speech detection tasks: the first is the identification
of conversational hate speech in code-mixed languages such as Hindi, English, and German, and the
second is offensive language identification in Marathi. Our approach uses TF-IDF text representations
combined with machine learning models, as well as transformer-based BERT models, for the
classification of hate speech in each of the two subtasks. The MuRIL-BERT model produces the best
results, with an accuracy of 73.1% and a macro F1-score of 0.727 on the code-mixed data and a
macro F1-score of 0.8306 on the Marathi data, which is 6% higher than in the previous year.
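
The abstract reports both accuracy and macro F1-score. As a minimal sketch of how the two metrics differ, the Python below computes each from scratch on a toy example. The label names HOF and NOT and the six predictions are hypothetical illustrations, not data or code from this paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (assumed names): HOF = hateful/offensive, NOT = neither
gold = ["HOF", "HOF", "NOT", "NOT", "NOT", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT", "NOT", "NOT"]

print(f"accuracy = {accuracy(gold, pred):.4f}")
print(f"macro F1 = {macro_f1(gold, pred):.4f}")
```

Because macro F1 weights every class equally, a missed example from the smaller class lowers it more than it lowers accuracy, which is one reason to report both on imbalanced hate speech data.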