Department of Computer Science and Information Systems

Permanent URI for this collection: http://localhost:4000/handle/123456789/1928

Search Results

Now showing 1 - 10 of 11

A multi-modal attentive framework that can interpret text (MMAT)
(IEEE, 2025-07) Sharma, Yashvardhan
Deep learning algorithms have demonstrated exceptional performance on various computer vision and natural language processing tasks. However, for machines to learn information signals, they must understand and have enough reasoning power to respond to general questions based on the linguistic features present in images. Questions such as “What temperature is my oven set to?” need the models to understand objects in the images visually and then spatially identify the text associated with them. The existing Visual Question Answering model fails to recognize linguistic features present in the images, which is crucial for assisting the visually impaired. This paper aims to deal with the task of a visual question answering system that can do reasoning with text, optical character recognition (OCR), and visual modalities. The proposed Visual Question Answering model focuses on the image’s most relevant part by using an attention mechanism and passing all the features to the fusion encoder after getting pairwise attention, where the model is inclined toward the OCR-Linguistic features. The proposed model uses the dynamic pointer network instead of classification for iterative answer prediction with a focal loss function to overcome the class imbalance problem. On the TextVQA dataset, the proposed model obtains an accuracy of 46.8% and an average of 55.21% on the STVQA dataset. The results indicate the effectiveness of the proposed approach and suggest a Multi-Modal Attentive Framework that can learn individual text, object, and OCR features and then predict answers based on the text in the image.
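The focal loss mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard formulation FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t). The function name, the default gamma and alpha values, and the example probabilities are illustrative assumptions, not details taken from the paper.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for a single example (hypothetical helper).

    probs  : predicted class probabilities, assumed to sum to 1
    target : index of the true class
    gamma  : focusing parameter; gamma = 0 recovers plain cross-entropy
    alpha  : constant weighting factor (assumed, not from the paper)
    """
    p_t = probs[target]
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples,
    # so rare or hard classes contribute relatively more to training.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction versus a poorly predicted one.
easy = focal_loss([0.9, 0.05, 0.05], target=0)
hard = focal_loss([0.2, 0.5, 0.3], target=0)
print(f"easy example loss: {easy:.5f}")
print(f"hard example loss: {hard:.5f}")
```

With gamma set to 0 the expression reduces to ordinary cross-entropy, which makes the down-weighting effect easy to compare against a baseline.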
Deep Extractive Text Summarization
(Elsevier, 2020) Sharma, Yashvardhan
With the introduction of deep learning techniques, there has been an increase in intelligent classification of text in many applications. Advances in automatic text summarization using deep learning techniques are a prime focus of research nowadays. Earlier, traditional approaches to extractive text summarization depended heavily on human-engineered features, which are laborious and tedious to produce. In this paper, a data-driven approach is used to generate extractive summaries using deep learning. The proposed approach uses paraphrasing techniques to classify each sentence as a candidate for inclusion in the summary or not.
Deep Text Summarization using Generative Adversarial Networks in Indian Languages
(Elsevier, 2020) Sharma, Yashvardhan
Abstractive Text Summarization (ATS) is the task of capturing information from different sources and condensing it such that the content is represented well and there is no loss of information. It has been an active area of research for quite some time now. ATS is closer to human-generated summaries and has the capability of representing and combining multiple pieces of information. With the advent of deep learning architectures, many tasks relating to natural language processing have achieved persistent and comparably high performance. Deep learning has proven advantageous and shown promising results in machine translation, speech recognition, image captioning, and many other tasks using sequence-to-sequence models. Language tools such as Part-of-Speech taggers and Named Entity Recognizers for Indian languages are not very competitive, and hence language-specific techniques do not perform very well for Indian languages. Deep learning techniques are language agnostic and hence can overcome these shortcomings. In this paper, Generative Adversarial Networks (GANs) are employed to create the gist of a longer piece of text in conjunction with paraphrase detection.
Legal Text Classification and Summarization using Transformers and Joint Text Features
(CEUR-WS, 2021) Sharma, Yashvardhan
The spread of misinformation has become a severe issue affecting society. Inaccurate information has enormous potential to cause real-world impacts. Developing algorithms to detect fake news automatically will be very useful in preventing unnecessary panic and damage caused by rumors. This fake news problem is present for all languages, and it becomes crucial to solve it for languages other than English, with scarce datasets. This paper aims to tackle the problem of automatic fake news detection in Urdu, a low-resource language. FIRE-2021 has provided the Urdu dataset used in this paper. We fine-tuned monolingual and multilingual transformers. After searching for hyperparameters, we tried ensembling our models. We submitted our model for the UrduFake task, and it achieved an accuracy of 0.596 and an F1-macro score of 0.449.
Ensembling of Various Transformer Based Models for the Fake News Detection Task in the Urdu Language
(CEUR-WS, 2021) Sharma, Yashvardhan; Chauhan, Gajendra Singh
The spread of misinformation has become a severe issue affecting society. Inaccurate information has enormous potential to cause real-world impacts. Developing algorithms to detect fake news automatically will be very useful in preventing unnecessary panic and damage caused by rumors. This fake news problem is present for all languages, and it becomes crucial to solve it for languages other than English, with scarce datasets. This paper aims to tackle the problem of automatic fake news detection in Urdu, a low-resource language. FIRE-2021 has provided the Urdu dataset used in this paper. We fine-tuned monolingual and multilingual transformers. After searching for hyperparameters, we tried ensembling our models. We submitted our model for the UrduFake task, and it achieved an accuracy of 0.596 and an F1-macro score of 0.449.
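The abstract above reports both accuracy and a macro-averaged F1 score. The sketch below shows how the two metrics can diverge on an imbalanced label set. The function names and the toy labels are hypothetical and do not come from the UrduFake data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, imbalanced example: the minority class "fake" is partly missed.
y_true = ["fake", "fake", "real", "real", "real", "real"]
y_pred = ["real", "fake", "real", "real", "real", "real"]
print("accuracy:", round(accuracy(y_true, y_pred), 3))
print("macro F1:", round(macro_f1(y_true, y_pred), 3))
```

Because macro averaging weights every class equally, errors on the minority class pull the score down more than they pull down accuracy.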
Anaphora Resolution from Social Media Text
(CEUR-WS, 2022) Sharma, Yashvardhan
Anaphora resolution for social media texts is an essential yet difficult task for text understanding. An important characteristic of anaphora is that it creates a connection between the antecedent and the anaphor buried in the anaphoric sentence. This paper outlines the methods used to locate anaphora and their antecedents in a particular text. The texts are social media tweets for the SocAnaRes-IL 2022 challenge that was part of FIRE 2022. The proposed model uses a Neural Co-reference Network for the anaphora resolution.
Comparative Analysis of Various Machine Learning Based Techniques for Predicting the Virality of Tweets
(IEEE, 2022) Sharma, Yashvardhan
Social media has become more popular, and people tend to read the news more often from it than from traditional media. But not all the information posted on a social media platform goes viral. In this paper, we analyze data from one social media platform, Twitter, and establish a few reasons for the virality of tweets. In addition, given the tweet information and user details, the trained model can predict whether a tweet will go viral. For this, we used multiple architectures, from classical machine learning models such as Random Forest, XGBoost, and LightGBM to convolutional models from deep learning, and obtained the highest accuracy with the LightGBM model. The results show that combining text and image data provides better results than using only text or only images (unimodal data). The data used is from the competition and includes full user details, tweet information, and tweet text and images.
Visual Question Answering Analysis: Datasets, Methods, and Image Featurization Techniques
(ICPRAM, 2023) Sharma, Yashvardhan
Holistic scene understanding is a long-standing objective of core tenets of Artificial Intelligence (AI). Multimodal tasks that aim to synergize capabilities spanning multiple domains, such as visual-linguistic capabilities, into intelligent systems are thus a desideratum for the next step in AI. Visual Question Answering (VQA) systems that integrate Computer Vision and Natural Language Processing tasks into the task of answering natural language questions about an image represent one such domain. There is a need to explore Deep Learning techniques that can help to improve such systems beyond the language biases of real-world priors that presently hinder them from serving as a veritable touchstone for holistic scene understanding. Furthermore, the effectiveness of Transformer architecture for the image featurization pipeline of VQA systems remains untested. Hence, an exhaustive study on the performance of various model architectures with varied training conditions on VQA datasets like VizWiz and VQA v2 is imperative to further this area of research. This study explores architectures that utilize image and question co-attention for the task of VQA and several CNN architectures, including ResNet, VGG, EfficientNet, and DenseNet. Vision Transformer architecture is also explored for image featurization, and a myriad of loss functions such as cross-entropy, focal loss, and UniLoss are employed for training the models. Finally, the trained model is deployed using Flask, and a GUI for the same has been implemented that lets users input an image and accompanying questions about the image to generate an answer in response.
Automatic Subjective Answer Evaluation
(ICPRAM, 2023) Sharma, Yashvardhan
The evaluation of answer scripts is vital for assessing a student’s performance. The manual evaluation of the answers can sometimes be biased. The assessment depends on various factors, including the evaluator’s mental state, their relationship with the student, and their level of expertise in the subject matter. These factors make evaluating descriptive answers a very tedious and time-consuming task. Automatic scoring approaches can be utilized to simplify the evaluation process. This paper presents an automated answer script evaluation model that intends to reduce the need for human intervention, minimize bias brought on by evaluator psychological changes, save time, keep track of evaluations, and simplify extraction. The proposed method can automatically weigh the assessing element and produce results nearly identical to an instructor’s. We compared the model’s grades to the grades of the teacher, as well as the results of several keyword matching and similarity check techniques, in order to evaluate the developed model.
FakeRevealer: A Multimodal Framework for Revealing the Falsity of Online Tweets Using Transformer-Based Architectures
(Scitepress, 2023) Sharma, Yashvardhan; Chauhan, Gajendra Singh
As the Internet has evolved, the exposure and widespread adoption of social media concepts have altered the way news is formed and published. With the help of social media, getting news is cheaper, faster, and easier. However, this has also led to an increase in the number of fake news articles, created either by manipulating the text or by morphing the images. The spread of fake news has become a serious issue all over the world. In one case, at least 20 people were killed just because of false information that was circulated over a social media platform. This makes it clear that social media sites need a system that uses more than one method to spot fake news stories. To solve this problem, we propose FakeRevealer, a single-configuration fake news detection system that works on transfer-learning-based techniques. Our multi-modal architecture understands the textual features using a language transformer model called DistilRoBERTa, and image features are extracted using the Vision Transformer (ViT) that is pre-trained on ImageNet-21K. After feature extraction, a cosine similarity measure is used to fuse both sets of features. The proposed framework is evaluated on a publicly available Twitter dataset, and the results show that it outperforms the current state of the art on that dataset with an accuracy of 80.00%, which is 2.23% more than the current state of the art.
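The cosine similarity measure named in the abstract above can be sketched as follows. This is only the similarity computation between two feature vectors. How FakeRevealer combines it with the DistilRoBERTa and ViT features is not specified in the abstract, so the function name and the toy vectors are assumptions for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector has zero norm, which is a convention
    chosen for this sketch rather than something stated in the paper.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy stand-ins for a text embedding and an image embedding.
text_features = [0.2, 0.7, 0.1]
image_features = [0.25, 0.6, 0.05]
print("similarity:", round(cosine_similarity(text_features, image_features), 4))
```

A value near 1 indicates that the two vectors point in nearly the same direction, while a value near 0 indicates little alignment between them.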
