BITS Faculty Publications

Permanent URI for this community: http://localhost:4000/handle/123456789/1867

Search Results

Now showing 1 - 10 of 99

A multi-modal attentive framework that can interpret text (MMAT)
(IEEE, 2025-07) Sharma, Yashvardhan
Deep learning algorithms have demonstrated exceptional performance on various computer vision and natural language processing tasks. However, to learn from such information signals, machines must understand images and reason well enough to answer general questions based on the linguistic features present in them. Questions such as “What temperature is my oven set to?” require models to recognize objects in the image visually and then spatially locate the text associated with them. Existing Visual Question Answering (VQA) models fail to recognize the linguistic features present in images, a capability that is crucial for assisting the visually impaired. This paper addresses visual question answering with a system that reasons over text, optical character recognition (OCR), and visual modalities. The proposed VQA model uses an attention mechanism to focus on the most relevant parts of the image and, after computing pairwise attention, passes all features to a fusion encoder that is biased toward the OCR-linguistic features. Instead of classification, the model uses a dynamic pointer network for iterative answer prediction, with a focal loss function to overcome the class imbalance problem. The proposed model obtains an accuracy of 46.8% on the TextVQA dataset and an average of 55.21% on the STVQA dataset. These results indicate the effectiveness of the proposed approach and support a Multi-Modal Attentive Framework that can learn individual text, object, and OCR features and then predict answers based on the text in the image.
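The abstract above relies on a focal loss to counter class imbalance among candidate answers. The following is a minimal sketch of the standard focal-loss form FL(p_t) = -α(1 - p_t)^γ log(p_t) for a single example, assuming softmax probabilities over answer candidates. The function names and the defaults α = 1, γ = 2 are illustrative assumptions and are not the paper's settings.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits: Sequence[float], target: int,
               gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one example: -alpha * (1 - p_t)**gamma * log(p_t).

    gamma > 0 down-weights easy, confidently correct examples, so rare
    and hard classes contribute relatively more to training.
    (alpha and gamma defaults here are assumptions, not the paper's.)
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

With γ = 0 and α = 1 the expression reduces to ordinary cross-entropy. Raising γ shrinks the loss on examples the model already classifies well.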
Deep learning approaches for driver distraction detection using driver-facing cameras: literature review and empirical study using CNN classifiers on a 100-driver image dataset
(2025-05) Bhatia, Ashutosh; Sharma, Yashvardhan; Tiwari, Kamlesh
Distracted driving contributes to thousands of fatalities and injuries globally. According to India’s Ministry of Road Transport and Highways (MoRTH), distraction-related incidents such as rear-end and off-road collisions accounted for nearly one-fourth of all traffic incidents in 2022. The U.S. National Highway Traffic Safety Administration (NHTSA) reported 3,275 deaths and over 324,000 injuries from distraction-related crashes in 2023. In Europe, the European Road Safety Observatory (ERSO) observed handheld phone use by drivers in up to 9.4% of vehicles across member states, with self-reported texting rates reaching 53%. Despite numerous studies and surveys on driver distraction detection, the existing literature remains fragmented, often combining multiple sensor modalities or conflating distraction with related driver states such as fatigue. Prior empirical efforts also lack a unified benchmarking strategy for assessing model generalization under shifts in viewpoint or spectral input. This paper presents a focused survey and empirical study of vision-only distraction detection using deep learning models applied to driver-facing camera inputs. It introduces a conceptual model linking behavioral cues to cognitive distraction, defines the vision-based Driver Distraction Detection (vDDD) system with alert logic, and develops structured taxonomies of datasets, architectures, and learning strategies. Using the 100-Driver dataset, the empirical study evaluates 26 CNN classifiers under 64 cross-domain configurations, systematically analyzing generalization across changes in modality and camera view. Results show that frontal RGB-trained models generalize better than their NIR-trained counterparts and that lightweight models trade accuracy in rare-class scenarios for faster inference. The study establishes the vDDD paradigm as a vision-based behavioral modeling approach for distraction detection using driver-facing camera data. It outlines future research directions in spectrum-aligned augmentation, attention modeling, and lightweight visual-language fusion, emphasizing deployment-focused strategies such as quantization, contrastive learning, and progressive fine-tuning.
TrPrNet: early Parkinson detection network using marker-less gait analysis
(Springer, 2025-04) Sharma, Yashvardhan; Bhatia, Ashutosh; Tiwari, Kamlesh
Parkinson’s disease is a progressive neurological disorder that significantly impairs motor function, particularly gait. Early detection is essential for timely medical intervention and improved patient outcomes. In this paper, we introduce TrPrNet, a novel Transformer-based architecture for the early detection of Parkinson’s disease. While previous studies have demonstrated the effectiveness of CNN- and RNN-based models, these models often fall short in capturing temporal dependencies within sequential data. TrPrNet addresses this limitation by using self-attention mechanisms to model complex relationships in time-sequenced body gait features, capturing both short-term and long-term interactions. We evaluate TrPrNet against RNN-based deep learning models such as LSTM and GRU, as well as various deep learning and machine learning approaches from previous research. Using body-keypoint-based gait features extracted from gait sequences as input, our models are trained and tested on a carefully curated dataset of gait videos. TrPrNet attains 99.38% accuracy and a loss of 0.0001. These results underscore the potential of our Transformer-based architecture as a highly accurate, non-invasive tool for the early diagnosis of Parkinson’s disease.
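TrPrNet relies on self-attention over time-sequenced gait features. The following is a minimal sketch of single-head scaled dot-product self-attention over a sequence of per-frame feature vectors, such as flattened keypoint coordinates. To keep it short, the inputs themselves serve as queries, keys, and values, with no learned projection matrices. This simplification is an assumption, and the code is not TrPrNet's actual layer.

```python
import math
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: Vector) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(frames: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with Q = K = V = frames.

    Each output frame is a weighted average of *all* frames, so any time
    step can draw on both nearby (short-term) and distant (long-term)
    steps in a single layer.
    """
    d = len(frames[0])
    outputs = []
    for q in frames:
        # Similarity of this frame to every frame, scaled by sqrt(d).
        weights = softmax([dot(q, k) / math.sqrt(d) for k in frames])
        # Weighted sum of the value vectors (here, the frames themselves).
        outputs.append([sum(w * v[j] for w, v in zip(weights, frames))
                        for j in range(d)])
    return outputs
```

Because every frame attends to every other frame directly, the path between two distant time steps has length one. A recurrent model must instead carry that information through each intermediate step.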
Composite Sequential Modeling for Identifying Fake Reviews
(De Gruyter, 2018-04) Sharma, Yashvardhan
This paper presents a comprehensive analysis and comparison of various proposed sequential models based on different deep networks such as the convolutional neural network, long short-term memory, and recurrent neural network. The different sequential models are analyzed based on the number of layers, the number of output dimensions, order, and the combination of different deep network architectures. The proposed approach is compared to a baseline model based on traditional machine learning techniques.
Neural Network-Based Architecture for Sentiment Analysis in Indian Languages
(De Gruyter, 2018-06) Sharma, Yashvardhan
Sentiment analysis refers to determining the polarity of the opinions expressed in text. The paper proposes an approach to determine the sentiment of tweets written in an Indian language (Hindi, Bengali, or Tamil). Thirty-nine sequential models have been created using three different neural network layers [recurrent neural networks (RNNs), long short-term memory (LSTM), and convolutional neural networks (CNNs)] with optimal parameter settings (to avoid overfitting and error accumulation). These sequential models have been investigated for each of the three languages. Experiments with the proposed sequential models examine how the hidden layers affect the overall performance of the approach. A comparison has also been performed with existing approaches to determine whether neural networks have an added advantage over traditional machine learning techniques.
Language Identification and Context-based Analysis of Code-switching Behaviors in Social Media Discussions
(IEEE, 2019) Sharma, Yashvardhan
Social media discussions see the participation of multilingual individuals, who tend to use alternate languages within a single post (code-switching) to communicate effectively in a discussion. This paper attempts to characterize such discussions in order to analyze contextual factors related to multilingual communities. Features extracted from the posts are used to train a CRF-based sequence labeling algorithm for language identification in an intra-sentential code-switching scenario. The relevance of a sentence to its discussion is modeled using Term Frequency-Inverse Document Frequency (TF-IDF). Further context of a multilingual sentence with respect to the discussion, such as agreement and questioning between pairs of posts, is also modeled.
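The paper models a sentence's relevance to its discussion with TF-IDF. The following is a minimal sketch under several assumptions: each post is one document, tokens are lower-cased whitespace splits, tf is count divided by post length, and idf is the unsmoothed log(N / df). The `relevance` score, defined as the mean cosine similarity between the sentence and each post, is a hypothetical illustration and not the paper's formulation.

```python
import math
from collections import Counter
from typing import Dict, List

def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """One {term: weight} map per post, using tf = count / len(post)
    and idf = log(N / df) (assumed variant; no smoothing)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: how many posts contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: (c / len(toks)) * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    num = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0

def relevance(sentence: str, discussion: List[str]) -> float:
    """Hypothetical score: mean TF-IDF cosine similarity between the
    sentence and each post, with IDF computed over posts + sentence."""
    vecs = tfidf(discussion + [sentence])
    target = vecs[-1]
    return sum(cosine(target, p) for p in vecs[:-1]) / len(discussion)
```

Terms that appear in every post receive zero weight, so only the distinctive vocabulary a sentence shares with the discussion raises its score.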
Deep Extractive Text Summarization
(Elsevier, 2020) Sharma, Yashvardhan
With the introduction of deep learning techniques, there has been an increase in intelligent text classification across many applications. Advances in automatic text summarization using deep learning techniques are now a prime focus of research. Traditional approaches to extractive text summarization have depended heavily on human-engineered features, which are laborious and tedious to construct. In this paper, a data-driven deep learning approach is used to generate extractive summaries. The proposed approach uses paraphrasing techniques to classify each sentence as a candidate for inclusion in the summary or not.
Deep Text Summarization using Generative Adversarial Networks in Indian Languages
(Elsevier, 2020) Sharma, Yashvardhan
Abstractive Text Summarization (ATS) is the task of capturing information from different sources and condensing it so that the content is represented well without loss of information. It has been an active area of research for quite some time. ATS is closer to human-generated summaries and can represent and combine multiple pieces of information. With the advent of deep learning architectures, many natural language processing tasks have achieved consistently high performance. Sequence-to-sequence models have proven advantageous and shown promising results in machine translation, speech recognition, image captioning, and many other tasks. Language tools such as part-of-speech taggers and named entity recognizers for Indian languages are not very competitive, so language-specific techniques do not perform well for Indian languages. Deep learning techniques are language-agnostic and can therefore overcome these shortcomings. In this paper, Generative Adversarial Networks (GANs) are combined with paraphrase detection to create a gist of longer pieces of text.
Bits2020@ Dravidian-CodeMix-FIRE2020: Sub-Word Level Sentiment Analysis of Dravidian Code Mixed Data
(CEUR-WS, 2020) Sharma, Yashvardhan
This paper presents the methodologies implemented for classifying Dravidian code-mixed comments by polarity in the track ‘Sentiment Analysis for Dravidian Languages in Code-Mixed Text’, proposed by the Forum for Information Retrieval Evaluation (FIRE) in 2020. The implemented method uses a sub-word level representation to capture the sentiment of the text. Using a Long Short-Term Memory (LSTM) network along with language-specific preprocessing, the model classifies text according to its polarity. With F1-scores of 0.61 and 0.60, the model ranked 5th and 12th overall in the Tamil and Malayalam tasks, respectively.
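The abstract does not specify which sub-word scheme the system uses. One common choice is character n-grams with word-boundary markers, the style popularized by fastText, and the sketch below uses that scheme. The n-gram range of 3 to 5, the `<`/`>` markers, and the function names are illustrative assumptions rather than the paper's configuration.

```python
from typing import List

def char_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> List[str]:
    """Character n-grams of a word padded with '<' and '>' boundary markers.

    (n_min/n_max defaults are assumptions, not the paper's settings.)
    """
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams

def subword_tokens(text: str) -> List[str]:
    """Lower-case, split on whitespace, and expand every word into n-grams."""
    return [g for w in text.lower().split() for g in char_ngrams(w)]
```

Code-mixed, romanized text often spells the same word in several ways, such as `nalla` and `nallaa`. Because such variants share most of their n-grams, a model can link them even when one spelling is rare in the training data.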
Character aware models with similarity learning for metaphor detection
(Association for Computational Linguistics (ACL), 2020) Sharma, Yashvardhan
Recent work on automatic sequential metaphor detection has involved recurrent neural networks initialized with different pre-trained word embeddings, sometimes combined with hand-engineered features. To capture lexical and orthographic information automatically, in this paper we propose adding a character-based word representation. In addition, to contrast the difference between literal and contextual meaning, we use a similarity network. We explore these components through two different architectures, a BiLSTM model and a BERT-like Transformer Encoder model, to perform metaphor identification. We participate in the Second Shared Task on Metaphor Detection on both the VUA and TOEFL datasets with these models. The experimental results demonstrate the effectiveness of our method, which outperforms all the systems that participated in the previous shared task.
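As a loose illustration of contrasting literal and contextual meaning, one simple signal is the cosine similarity between a word's context-independent (literal) embedding and its contextual encoding. A low similarity hints that the word is used non-literally. This fixed cosine is only a stand-in for the paper's learned similarity network. The three-dimensional vectors are made-up toy values and do not come from any real model.

```python
import math
from typing import Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def literal_contextual_gap(literal: Sequence[float],
                           contextual: Sequence[float]) -> float:
    """1 - cosine: larger values mean the in-context meaning drifts further
    from the word's literal (context-independent) embedding."""
    return 1.0 - cosine(literal, contextual)

# Hypothetical toy vectors for "attack": a physical assault (literal use)
# versus attacking an argument (metaphorical use).
literal_attack = [0.9, 0.1, 0.0]
physical_ctx = [0.8, 0.2, 0.1]
argument_ctx = [0.2, 0.1, 0.9]
```

With these toy values, the metaphorical context yields a much larger gap than the literal one. A learned similarity network would replace the fixed cosine with trainable interactions between the two representations.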
