Department of Computer Science and Information Systems
Permanent URI for this collection: http://localhost:4000/handle/123456789/1928

Detection of Threat Records by Analyzing the Tweets in Urdu Language Exploring Deep Learning Transformer-Based Models (CEUR-WS, 2021) Sharma, Yashvardhan
As humans, we express sadness, anger, happiness, frustration, bullying, etc., in both the physical and virtual worlds. In the virtual world, i.e., social media, we express ourselves through text. Because mechanisms for detecting offensive and threatening language are lacking, aggressive behavior on social media is not always followed by an immediate consequence. However, these posts can cause prolonged mental illness in the victim and instill fear of social media platforms. This paper aims to identify threatening posts using deep learning transformer-based models such as RoBERTa. The Urdu tweet dataset used in this study was provided by HASOC-2021, which aims to identify hate speech and offensive remarks without human assistance. We submitted our model to Subtask B of the 4th subtrack (Abusive and Threatening Language Detection in Urdu), secured 2nd position on the public leaderboard, and obtained a weighted F1 of 0.5346 and a ROC AUC of 0.8199.

Applying Transfer Learning using BERT-Based Models for Hate Speech Detection (CEUR-WS, 2021) Sharma, Yashvardhan; Chauhan, Gajendra Singh
Hateful and offensive speech is rising along with social media. This issue has motivated researchers to devise novel approaches that perform better than traditional algorithms. This paper presents the methods adopted by the BITS Pilani team for Subtask 1A of the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages task proposed by the Forum for Information Retrieval Evaluation in 2021. We used data augmentation to make the models generalize better. We experimented with different feature extraction techniques along with machine learning algorithms. However, fine-tuning the pre-trained BERT-based models using transfer learning gave us the best results on the test set for all the given languages. Using the pre-trained BERT-based models, we obtained our highest Macro-F1 scores of 0.7993 for English, 0.7612 for Hindi, and 0.8306 for Marathi.

Legal Text Classification and Summarization using Transformers and Joint Text Features (CEUR-WS, 2021) Sharma, Yashvardhan
The spread of misinformation has become a severe issue affecting society. Inaccurate information has enormous potential to cause real-world impacts. Developing algorithms to detect fake news automatically will be very useful in preventing unnecessary panic and damage caused by rumors. The fake news problem is present in all languages, and it is crucial to solve it for languages other than English, for which datasets are scarce. This paper aims to tackle the problem of automatic fake news detection in Urdu, a low-resource language. FIRE-2021 provided the Urdu dataset used in this paper. We fine-tuned monolingual and multilingual transformers. After searching for hyperparameters, we tried ensembling our models. We submitted our model for the UrduFake task, and it achieved an accuracy of 0.596 and an F1-macro score of 0.449.

Ensembling of Various Transformer Based Models for the Fake News Detection Task in the Urdu Language (CEUR-WS, 2021) Sharma, Yashvardhan; Chauhan, Gajendra Singh
The spread of misinformation has become a severe issue affecting society. Inaccurate information has enormous potential to cause real-world impacts. Developing algorithms to detect fake news automatically will be very useful in preventing unnecessary panic and damage caused by rumors. The fake news problem is present in all languages, and it is crucial to solve it for languages other than English, for which datasets are scarce. This paper aims to tackle the problem of automatic fake news detection in Urdu, a low-resource language. FIRE-2021 provided the Urdu dataset used in this paper. We fine-tuned monolingual and multilingual transformers. After searching for hyperparameters, we tried ensembling our models. We submitted our model for the UrduFake task, and it achieved an accuracy of 0.596 and an F1-macro score of 0.449.