Abstract:
This paper addresses the rise of hateful and offensive comments directed at individuals and communities on social media. Such behaviour has become pervasive on these platforms, where people can easily vent their hatred and reach a large audience, something they might not attempt in the physical world. One of the most effective ways to tackle this problem is to use computational techniques to identify hateful and offensive content so that action can be taken against it. The current work focuses on detecting hate speech and offensive content in Indo-European languages, with an emphasis on English since it is the most widely used language on the Internet. The datasets used in the experiments are obtained from CrowdFlower and from the FIRE-2019 task on Identifying Hate Speech and Offensive Content in Social Media Text (HASOC). The paper provides a comparative analysis of the effectiveness of the TF-IDF approach and various word embedding-based approaches for the classification task on both datasets. The evaluation measures are accuracy, precision, recall and F1-score.
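
For readers unfamiliar with the four evaluation measures, the following is a minimal sketch of how they can be computed for a two-class problem. It is an illustration only, not the evaluation code used in this work: the function name `binary_metrics`, the convention that label 1 marks hateful or offensive text, and the toy labels are all assumptions, and a multi-class setting would additionally require an averaging scheme that is not shown here.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption: `positive` (default 1) marks the hateful/offensive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (hypothetical, not taken from either dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Precision and recall are reported alongside accuracy because hate speech datasets are often imbalanced, in which case accuracy alone can look high even when few offensive posts are detected.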