BITS Faculty Publications

Permanent URI for this community: http://localhost:4000/handle/123456789/1867

  • Text-Convolutional Neural Networks for Fake News Detection in Tweets
    (Springer, 2020-09) Sharma, Yashvardhan
    With the widespread use of online social networking websites, user-generated stories and social network platforms have become critical in news propagation. Web portals are being used to mislead users for political gain, and unreliable information is being shared without any fact-checking. There is therefore a pressing need for an automatic news verification system that can protect journalists and ordinary users from misleading content. In this work, the task is defined as classifying a tweet as real or fake. The complexity of natural language constructs, combined with the variety of languages used, makes this task very challenging. To handle this complexity, a deep learning model that learns semantic word embeddings is proposed. Evaluations on the benchmark dataset (VMU 2015) show that deep learning methods outperform traditional natural language processing algorithms.
  • Irony Detection in Non-English Tweets
    (IEEE, 2021) Sharma, Yashvardhan
    Sentiment analysis is the interpretation and classification of emotions conveyed by text. While there have been many attempts to classify the sentiment of a given text, few models can do the same for non-English data that exhibits sarcasm or irony. This paper compares various sarcasm detection techniques to decide which method works best for datasets of different sizes and types. The models have been tested on datasets in three non-English languages: Arabic, French, and Hindi-English code-mixed text. None of the presented models is language-specific, so each can be run on data in any language. The comparison covers a sub-word model, Term Frequency-Inverse Document Frequency (TF-IDF) features with neural networks, a Long Short-Term Memory (LSTM) model, and machine learning techniques such as Nearest Neighbors, Decision Tree, Random Forest, AdaBoost, Naive Bayes (NB), and Support Vector Machines (SVM) with linear, radial basis function (RBF), and sigmoid kernels. The output for each language and model has been evaluated using F1-score, accuracy, precision, and recall.
  • FakeRevealer: A Multimodal Framework for Revealing the Falsity of Online Tweets Using Transformer-Based Architectures
    (Scitepress, 2023) Sharma, Yashvardhan; Chauhan, Gajendra Singh
    As the Internet has evolved, exposure to and widespread adoption of social media have changed the way news is produced and published. Social media makes getting news cheaper, faster, and easier. However, it has also led to an increase in fake news articles, created either by manipulating text or by morphing images. The spread of fake news has become a serious issue worldwide. In one case, at least 20 people were killed because of false information circulated on a social media platform. This makes it clear that social media sites need a system that uses more than one method to spot fake news stories. To address this problem, we propose FakeRevealer, a single-configuration fake news detection system based on transfer learning techniques. Our multimodal architecture extracts textual features with a language transformer model, DistilRoBERTa, and image features with a Vision Transformer (ViT) pre-trained on ImageNet-21K. After feature extraction, a cosine similarity measure is used to fuse the two sets of features. The proposed framework is evaluated on a publicly available Twitter dataset, and the results show that it outperforms the current state of the art on this dataset with an accuracy of 80.00%, which is 2.23% higher than the previous best.
  • Impact of Transformer-Based Models and User Clustering in Early Fake News Detection in Social Media
    (Scitepress, 2023) Sharma, Yashvardhan; Chauhan, Gajendra Singh
    Because of easy access to the internet, people now consume news on social media platforms rather than through traditional sources. This shift has enabled the recent rise in the online dissemination of false information. The spread of false information seriously damages people’s reputations and the public’s trust in them. The research community has recently given fake news identification a great deal of attention, and prior studies have mainly concentrated on finding cues in news content or diffusion graphs. However, older models lacked the key features needed to spot fake news early. We focus on detecting fake news using features that are available when it is just starting to spread. This work proposes a new framework made up of content-based features taken from news articles and social-context features taken from user characteristics and responses at the sentence level. In addition, we extend our approach to Transformer-based models and leverage user clustering, demonstrating a considerable performance gain over the original model.
  • Steno AI at SemEval-2023 Task 6: Rhetorical Role Labelling of Legal Documents using Transformers and Graph Neural Networks
    (Association for Computational Linguistics, 2023) Sharma, Yashvardhan
    Legal documents are usually long and dense, requiring considerable human effort to parse. They also contain substantial jargon, which makes existing models poorly suited to deriving insights from them. This paper presents the approaches we undertook for rhetorical role labelling of Indian court judgements as part of SemEval-2023 Task 6: Understanding Legal Texts, subtask A (Modi et al., 2023). We experiment with graph-based approaches, such as Graph Convolutional Networks and the Label Propagation Algorithm, and with transformer-based approaches, including variants of BERT, to improve classification accuracy on complex legal documents.
  • Deep Learning Approaches for Question Answering System
    (Elsevier, 2018) Sharma, Yashvardhan
    Question Answering (QA) systems are very useful, as most deep learning problems can be modeled as question answering problems. Consequently, QA is one of the most researched fields in computer science today. The last few years have seen considerable developments and improvements in the state of the art, much of which can be credited to the rise of deep learning. This paper discusses various approaches, starting from basic NLP and algorithm-based approaches and eventually building towards recently proposed deep learning methods. Implementation details and algorithmic tweaks that produced better results are also discussed. The proposed models were evaluated on the twenty tasks of Facebook's bAbI dataset.
  • Encoder-Decoder Architectures for Generating Questions
    (Elsevier, 2018) Sharma, Yashvardhan
    The explosion of textual data on the internet, from e-books, legal documents, and product information, presents an opportunity to harness it for applications that aid human tasks. Question generation systems can be used to create frequently asked questions (FAQs) and school quizzes, and can serve the goal of unified AI. In this study, various encoder-decoder architectures for generating questions from text inputs are explored, using Stanford’s SQuAD dataset for the training, development, and test sets; evaluation metrics such as BLEU and ROUGE, along with training time, are used to compare the effectiveness of the models. The article builds on current end-to-end systems by using gated recurrent units (GRUs) in place of long short-term memory (LSTM) units, which give similar accuracy with less training time. It also demonstrates the successful use of a convolution-based encoder for this task, which gives results comparable to the current state-of-the-art system with much less training time.
  • Bits_Pilani@INLI-FIRE-2017: Indian Native Language Identification using Deep Learning
    (CEUR, 2017) Sharma, Yashvardhan
    Native Language Identification involves identifying the first language a user learned, based on their writing and/or analysis of their speech and phonetics in a second language. Such data is abundant on social media sites and in organised datasets from bodies such as the Educational Testing Service (ETS), and it can be exploited to develop language learning systems and for forensic linguistics. In this paper, we propose a deep neural network for the INLI task at FIRE 2017 that uses a hierarchical paragraph encoder with an attention mechanism to identify relevant features in the tendencies and errors a user exhibits in the second language. The task involves six Indian languages as the native-language set and English as the second language, with data collected from users' social media accounts.