Department of Computer Science and Information Systems

Permanent URI for this collection: http://localhost:4000/handle/123456789/1928

Search Results

Now showing 1 - 10 of 64
  • Item
    A multi-modal attentive framework that can interpret text (MMAT)
    (IEEE, 2025-07) Sharma, Yashvardhan
    Deep learning algorithms have demonstrated exceptional performance on various computer vision and natural language processing tasks. However, to answer general questions that depend on the text present in images, machines must also understand and reason over those linguistic features. Questions such as “What temperature is my oven set to?” require a model to recognize objects in the image visually and then spatially locate the text associated with them. Existing Visual Question Answering (VQA) models fail to recognize the linguistic features present in images, a capability that is crucial for assisting the visually impaired. This paper addresses visual question answering that reasons over text, optical character recognition (OCR), and visual modalities. The proposed VQA model uses an attention mechanism to focus on the most relevant parts of the image and, after computing pairwise attention, passes all features to a fusion encoder in which the model is weighted toward OCR-linguistic features. Instead of classification, the model uses a dynamic pointer network for iterative answer prediction, trained with a focal loss function to overcome the class imbalance problem. The proposed model obtains an accuracy of 46.8% on the TextVQA dataset and an average of 55.21% on the STVQA dataset. These results indicate the effectiveness of the proposed Multi-Modal Attentive Framework, which learns individual text, object, and OCR features and then predicts answers based on the text in the image.
  • Item
    Deep learning approaches for driver distraction detection using driver-facing cameras: literature review and empirical study using CNN classifiers on a 100-driver image dataset
    (2025-05) Bhatia, Ashutosh; Sharma, Yashvardhan; Tiwari, Kamlesh
    Distracted driving contributes to thousands of fatalities and injuries globally. According to India’s Ministry of Road Transport and Highways (MoRTH), distraction-related incidents such as rear-end and off-road collisions accounted for nearly one-fourth of all traffic incidents in 2022. The U.S. National Highway Traffic Safety Administration (NHTSA) reported 3,275 deaths and over 324,000 injuries from distraction-related crashes in 2023. In Europe, the European Road Safety Observatory (ERSO) observed handheld phone use by drivers in up to 9.4% of vehicles across member states, with self-reported texting rates reaching 53%. Despite numerous studies and surveys on driver distraction detection, the existing literature remains fragmented, often combining multiple sensor modalities or grouping distraction with related driver states such as fatigue. Prior empirical efforts also lack a unified benchmarking strategy to assess model generalization under shifts in viewpoint or spectral input. This paper presents a focused survey and empirical study of vision-only distraction detection using deep learning models applied to driver-facing camera inputs. It introduces a conceptual model linking behavioral cues to cognitive distraction, defines the vision-based Driver Distraction Detection (vDDD) system with alert logic, and develops structured taxonomies of datasets, architectures, and learning strategies. Using the 100-Driver dataset, the empirical study evaluates 26 CNN classifiers under 64 cross-domain configurations, systematically analyzing generalization across changes in modality and camera view. Results show that frontal RGB-trained models generalize better than their NIR-trained counterparts and that lightweight models trade accuracy in rare-class scenarios for faster inference. The study establishes the vDDD paradigm as a vision-based behavioral modeling approach for distraction detection using driver-facing camera data. It outlines future research directions in spectrum-aligned augmentation, attention modeling, and lightweight visual-language fusion, emphasizing deployment-focused strategies such as quantization, contrastive learning, and progressive fine-tuning.
  • Item
    TrPrNet: early Parkinson detection network using marker-less gait analysis
    (Springer, 2025-04) Sharma, Yashvardhan; Bhatia, Ashutosh; Tiwari, Kamlesh
    Parkinson’s disease is a progressive neurological disorder that significantly impairs motor functions, particularly gait. Early detection is essential for timely medical intervention and improved patient outcomes. In this paper, we introduce TrPrNet, a novel Transformer-based architecture for the early detection of Parkinson’s disease. While previous studies have demonstrated the effectiveness of CNN- and RNN-based models, these models often fall short in capturing temporal dependencies within sequential data. TrPrNet addresses this limitation by using self-attention mechanisms to model complex relationships in time-sequenced body gait features, capturing both short-term and long-term interactions. We evaluate TrPrNet against RNN-based deep learning models such as LSTM and GRU, as well as various deep learning and machine learning approaches from previous research. Using body-keypoint-based gait features extracted from gait sequences as input, our models are trained and tested on a meticulously curated dataset of gait videos. TrPrNet attains 99.38% accuracy and a loss of 0.0001. These results underscore the potential of our Transformer-based architecture as a highly accurate, non-invasive tool for the early diagnosis of Parkinson’s disease.
  • Item
    Deep Extractive Text Summarization
    (Elsevier, 2020) Sharma, Yashvardhan
    With the introduction of deep learning techniques, there has been an increase in intelligent text classification across many applications. Advances in automatic text summarization using deep learning techniques are now a prime focus of research. Traditional approaches to extractive text summarization have depended heavily on human-engineered features, which are laborious and tedious to construct. In this paper, a data-driven approach is used to generate extractive summaries with deep learning. The proposed approach uses paraphrasing techniques to classify each sentence as a candidate for inclusion in the summary or not.
  • Item
    Deep Text Summarization using Generative Adversarial Networks in Indian Languages
    (Elsevier, 2020) Sharma, Yashvardhan
    Abstractive Text Summarization (ATS) is the task of capturing information from different sources and condensing it so that the content is represented well and no information is lost. It has been an active area of research for quite some time. ATS is closer to human-generated summaries and can represent and combine multiple pieces of information. With the advent of deep learning architectures, many natural language processing tasks have achieved consistently high performance. Sequence-to-sequence models have proven advantageous and shown promising results in machine translation, speech recognition, image captioning, and many other tasks. Language tools such as part-of-speech taggers and named entity recognizers for Indian languages are not very competitive; hence, language-specific techniques do not perform very well for Indian languages. Deep learning techniques are language-agnostic and can therefore overcome these shortcomings. In this paper, Generative Adversarial Networks (GANs) are used in conjunction with paraphrase detection to create a gist of longer pieces of text.
  • Item
    Bits2020@ Dravidian-CodeMix-FIRE2020: Sub-Word Level Sentiment Analysis of Dravidian Code Mixed Data
    (CEUR-WS, 2020) Sharma, Yashvardhan
    This paper presents the methods implemented for classifying Dravidian code-mixed comments by polarity in the track ‘Sentiment Analysis for Dravidian Languages in Code-Mixed Text’, proposed by the Forum for Information Retrieval Evaluation (FIRE) in 2020. The implemented method uses a sub-word level representation to capture the sentiment of the text. Using a Long Short-Term Memory (LSTM) network together with language-specific preprocessing, the model classifies text according to its polarity. With F1-scores of 0.61 and 0.60, the model ranked 5th and 12th overall in the Tamil and Malayalam tasks, respectively.
  • Item
    Character aware models with similarity learning for metaphor detection
    (Association for Computational Linguistics (ACL), 2020) Sharma, Yashvardhan
    Recent work on automatic sequential metaphor detection has involved recurrent neural networks initialized with different pre-trained word embeddings, sometimes combined with hand-engineered features. To capture lexical and orthographic information automatically, in this paper we propose adding a character-based word representation. In addition, to contrast literal and contextual meaning, we use a similarity network. We explore these components in two architectures, a BiLSTM model and a BERT-like Transformer encoder model, for metaphor identification. With these models, we participate in the Second Shared Task on Metaphor Detection on both the VUA and TOEFL datasets. The experimental results demonstrate the effectiveness of our method, which outperforms all systems that participated in the previous shared task.
  • Item
    Siva@ HASOC-Dravidian-CodeMix-FIRE-2020: Multilingual Offensive Speech Detection in Code-mixed and Romanized Text
    (CEUR-WS, 2020) Sharma, Yashvardhan
    Detecting and eliminating offensive and hate speech in social media content is an important concern, as such speech can have serious consequences for society, ranging from the miseducation of youth to hate crimes. Identifying offensive speech in countries like India poses additional challenges because users post code-mixed and romanized variants of multiple languages on social media. HASOC-Dravidian-CodeMix-FIRE 2020 extended the task of offensive speech identification to Dravidian languages. In this paper, we describe our approach to HASOC Dravidian Code-mixed 2020, which topped two of the three tasks (weighted F1-scores of 0.95 and 0.90) and placed second in the third task, trailing the top model by only 0.01 points (weighted F1-score of 0.77). We propose a novel and flexible approach of selective translation and transliteration to obtain better results from fine-tuning and ensembling multilingual transformer networks such as XLM-RoBERTa and mBERT. Further, we implemented pre-trained, fine-tuned, and ensembled versions of XLM-RoBERTa for offensive speech classification. We open-source our work to facilitate further experimentation.
  • Item
    Combating Online Hate: A Comparative Study on Identification of Hate Speech and Offensive Content in Social Media Text
    (IEEE, 2020) Sharma, Yashvardhan
    This paper addresses the important issue of rising hateful and offensive comments against individuals or communities on social media. Such behaviour has become pervasive on social media, where people can easily vent their hatred and reach a large number of people in ways they might not consider in the physical world. One of the most effective solutions to this problem is to use computational techniques to identify such hateful and offensive content and take action against it. The current work focuses on detecting hate speech and offensive content in Indo-European languages, with English at the forefront since it is the most widely used language on the Internet. The datasets used for the experiments are obtained from CrowdFlower and the FIRE-2019 task on Identifying Hate Speech and Offensive Content in Social Media Text (HASOC). The paper provides a comparative analysis of the effectiveness of the TF-IDF approach and various word-embedding-based approaches for the classification task on both datasets. The evaluation measures are accuracy, precision, recall, and F1-score.
  • Item
    Text-Convolutional Neural Networks for Fake News Detection in Tweets
    (Springer, 2020-09) Sharma, Yashvardhan
    With the widespread use of online social networking websites, user-generated stories and social network platforms have become critical to news propagation. Web portals are being used to mislead users for political gain, and unreliable information is shared without any fact-checking. There is therefore a dire need for an automatic news verification system that can protect journalists and ordinary users from misleading content. In this work, the task is defined as classifying a tweet as real or fake. The complexity of natural language constructs, together with the variety of languages involved, makes this task very challenging. To handle this complexity, a deep learning model that learns semantic word embeddings is proposed. Evaluations on the benchmark dataset (VMU 2015) show that deep learning methods are superior to traditional natural language processing algorithms.
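The MMAT entry above trains its dynamic pointer network with a focal loss to counter class imbalance. The sketch below shows the standard multi-class focal loss, -(1 - p_t)^γ · log(p_t), for a single softmax prediction. The function names, the default γ = 2, and the omission of any per-class weighting term α are assumptions for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Standard cross-entropy, -log(p_t), for one example."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Multi-class focal loss, -(1 - p_t)^gamma * log(p_t), for one example.

    gamma = 0 recovers cross-entropy; larger gamma shrinks the loss of
    examples the model already classifies confidently, so hard or rare
    examples dominate the total loss.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    logits = [4.0, 0.0, 0.0]
    # Easy example (target 0 already has high probability): focal loss is tiny.
    print(cross_entropy(logits, 0), focal_loss(logits, 0))
    # Hard example (target 1 has low probability): focal loss stays close to CE.
    print(cross_entropy(logits, 1), focal_loss(logits, 1))
```

The modulating factor (1 - p_t)^γ is what distinguishes focal loss from plain cross-entropy: confidently correct predictions contribute almost nothing, so frequent easy answers do not swamp rare ones.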
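TrPrNet, listed above, relies on self-attention over time-sequenced gait features. As a hedged illustration of that mechanism only (not TrPrNet itself), the following computes single-head scaled dot-product self-attention, softmax(QKᵀ/√d_k)V, over a sequence of per-frame feature vectors in plain Python. The projection matrices and the toy three-frame input are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_row(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention.

    x holds one feature vector per time step (T x d). Every time step is
    scored against every other one, regardless of distance, so short- and
    long-range interactions are handled the same way.
    Returns (outputs, attention_weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax_row(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    frames = [[0.2, 0.9], [0.4, 0.1], [0.8, 0.5]]  # toy keypoint features, 3 frames
    out, attn = self_attention(frames, identity, identity, identity)
    for row in attn:
        print([round(w, 3) for w in row])
```

Unlike an RNN, which passes information step by step, each output frame here is a weighted mix of all input frames, which is the property the abstract credits for capturing both short-term and long-term gait interactions.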
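The Bits2020 entry above uses a sub-word level representation, but the abstract does not say which one. One common option, assumed here purely for illustration, is fastText-style character n-grams with word-boundary markers, hashed into a fixed number of embedding buckets; this tolerates the spelling variation typical of romanized code-mixed text. The bucket count and the helper names are hypothetical.

```python
import zlib
from typing import List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 5) -> List[str]:
    """Character n-grams of a lowercased word wrapped in '<' and '>' markers."""
    padded = f"<{word.lower()}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(padded) - n + 1):
            grams.append(padded[start:start + n])
    return grams


def ngram_bucket_ids(word: str, num_buckets: int = 100_000,
                     min_n: int = 3, max_n: int = 5) -> List[int]:
    """Map each n-gram to an embedding-table row via a stable hash (CRC32).

    Python's built-in hash() is randomized per process for strings, so a
    deterministic hash is used instead.
    """
    return [zlib.crc32(g.encode("utf-8")) % num_buckets
            for g in char_ngrams(word, min_n, max_n)]


if __name__ == "__main__":
    # Romanized spelling variants share most of their n-grams.
    a = set(char_ngrams("semma"))
    b = set(char_ngrams("semmaa"))
    print(sorted(a & b))
    print(ngram_bucket_ids("vanakkam", min_n=3, max_n=3))
```

A word's vector would then be built from the embeddings at these bucket ids (for example by averaging them) before being fed to the LSTM; that aggregation step is likewise an assumption, not something the abstract specifies.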
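The metaphor-detection entry above contrasts literal and contextual meaning with a similarity network. A minimal, hypothetical sketch of the kind of input such a network could receive: compare a token's context-independent (literal) embedding u with its contextual representation v, and concatenate [u, v, u ⊙ v, |u − v|, cos(u, v)] for a downstream classifier. This feature layout is a common pattern in sentence-pair models and is an assumption here, not the paper's architecture.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_features(literal: Sequence[float], contextual: Sequence[float]) -> List[float]:
    """Build [u, v, u*v, |u-v|, cos(u, v)] from a literal and a contextual vector.

    Low cosine similarity means the word's meaning in context drifts away
    from its literal sense, which is one cue for metaphorical use.
    """
    if len(literal) != len(contextual):
        raise ValueError("vectors must have the same dimension")
    product = [a * b for a, b in zip(literal, contextual)]
    abs_diff = [abs(a - b) for a, b in zip(literal, contextual)]
    return list(literal) + list(contextual) + product + abs_diff + [cosine(literal, contextual)]


if __name__ == "__main__":
    literal_vec = [0.9, 0.1, 0.0]      # hypothetical static embedding
    contextual_vec = [0.1, 0.2, 0.9]   # hypothetical contextual encoding
    feats = similarity_features(literal_vec, contextual_vec)
    print(len(feats), round(feats[-1], 3))
```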
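The HASOC entry above reports its results as weighted F1-scores. The abstract does not define the metric, so this sketch assumes the usual definition (the one scikit-learn uses for `average='weighted'`): compute F1 per class, then average the per-class scores weighted by each class's support in the gold labels. The labels in the example are made up.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Classes absent from y_true have zero support, so they add nothing.
        score += support[label] / total * f1
    return score


if __name__ == "__main__":
    gold = ["OFF", "OFF", "NOT", "NOT"]
    pred = ["OFF", "NOT", "NOT", "NOT"]
    print(round(weighted_f1(gold, pred), 4))  # 0.7333
```

In the example, the OFF class has F1 = 2/3 and the NOT class has F1 = 0.8; both have support 2 of 4, so the weighted score is 0.5 · 2/3 + 0.5 · 0.8 = 11/15.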
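The "Combating Online Hate" entry above compares a TF-IDF approach with embedding-based features. The abstract does not give the exact weighting, so this sketch assumes the textbook variant tf(t, d) · log(N / df(t)) with lowercase whitespace tokenization; libraries such as scikit-learn use smoothed variants that produce slightly different numbers. The example comments are invented.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one sparse {term: weight} vector per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    A term that appears in every document gets weight 0.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        if not tokens:  # empty document: empty vector, avoids dividing by zero
            vectors.append({})
            continue
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    comments = ["the movie was awful", "the movie was great", "the day was great"]
    for vec in tf_idf(comments):
        print({t: round(w, 3) for t, w in sorted(vec.items())})
```

Words shared by every comment ("the", "was") get zero weight, while rarer, more discriminative words such as "awful" get the largest weights; these sparse vectors would then be fed to a standard classifier.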
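The fake-news entry above uses a text convolutional neural network. As an assumption-labelled sketch of the generic text-CNN building block (not the paper's configuration), each filter slides over windows of k consecutive word embeddings, applies ReLU, and keeps the maximum activation over time; the pooled features then feed a logistic output for P(fake). All weights and embeddings below are hypothetical toy values.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def conv1d_max_pool(embeddings: Matrix, filt: Matrix, bias: float = 0.0) -> float:
    """Slide a (k x d) filter over a (T x d) sentence, apply ReLU, max-pool over time."""
    k = len(filt)
    if len(embeddings) < k:
        raise ValueError("sentence is shorter than the filter width")
    activations = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        s = bias + sum(w * x
                       for f_row, e_row in zip(filt, window)
                       for w, x in zip(f_row, e_row))
        activations.append(max(0.0, s))  # ReLU
    return max(activations)  # max-over-time pooling


def sigmoid(z: float) -> float:
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_fake_probability(embeddings: Matrix, filters: Sequence[Matrix],
                             out_weights: Sequence[float], out_bias: float = 0.0) -> float:
    """One pooled feature per filter, then a logistic output layer."""
    features = [conv1d_max_pool(embeddings, f) for f in filters]
    z = out_bias + sum(w * h for w, h in zip(out_weights, features))
    return sigmoid(z)


if __name__ == "__main__":
    tweet = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # 3 tokens, 2-d embeddings
    filters = [[[1.0, 0.0], [0.0, 1.0]],               # width-2 filter
               [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]]   # width-3 filter
    print(predict_fake_probability(tweet, filters, out_weights=[0.8, -0.3]))
```

Max-over-time pooling makes each feature independent of tweet length and of where the matching phrase occurs, which is why filters of several widths are typically combined in this kind of model.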