Department of Computer Science and Information Systems

Permanent URI for this collection: http://localhost:4000/handle/123456789/1928


Leveraging dual encoders with feature disentanglement for anomaly detection in thermal videos
(Springer, 2024-12) Goyal, Poonam
Anomaly detection is critical for real-time applications, e.g., monitoring elderly people or kids from a remote place, gas leakage detection, and night vision surveillance. Detecting anomalous behavior becomes even more challenging when the device used for capturing scenes is a thermal camera. Thermal videos have the ability to preserve the identity of the subjects involved in the scenes. The info-deficit nature of thermal imagery, i.e., its lack of texture, contours, and colors, makes it difficult to fetch the salient details required to differentiate between normal and abnormal events. Most approaches for anomaly detection in videos explicitly model regions of interest (ROIs). However, this modeling is limited by the accuracy of ROI detection, particularly in thermal videos, when the ROIs are small relative to the frame. Moreover, techniques that take advantage of corresponding visible videos to detect anomalies in thermal videos are limited by their need for twin videos. To address these limitations, we present a frame-level unsupervised approach that learns two sets of features from two different encoders in a disentangled fashion. The learning objective of the proposed approach aggregates the reconstruction error of the middle frame and the disentanglement error between the two encodings. We perform extensive experiments on two benchmark thermal video datasets, Thermal Rare Event and TSF. The proposed approach outperforms state-of-the-art models for anomaly detection from the visible and thermal spectrum.
AdQuestA: knowledge-guided visual question answer framework for advertisements
(IEEE, 2025) Goyal, Poonam
In the rapidly evolving landscape of digital marketing, effective customer engagement through advertisements is crucial for brands. Thus, computational understanding of ads is pivotal for recommendation, authoring, and customer behaviour simulation. Despite advancements in knowledge-guided visual-question-answering (VQA) models, existing frameworks often lack domain-specific responses and suffer from a dearth of benchmark datasets for advertisements. To address this gap, we introduce ADVQA, the first dataset for ad-related VQA sourced from Facebook and X (Twitter), which facilitates further research in ad comprehension. It comprises open-ended questions and detailed context obtained automatically from web articles. Moreover, we present AdQuestA, a novel multimodal framework for knowledge-guided open-ended question-answering tailored to advertisements. AdQuestA leverages Retrieval-Augmented Generation (RAG) to obtain question-aware ad context as explicit knowledge and image-grounded implicit knowledge, effectively exploiting inherent relationships for reasoning. Extensive experiments corroborate its efficacy, yielding state-of-the-art performance on the AD-VQA dataset, even surpassing 10X larger models such as GPT-4 on this task. Our framework not only enhances understanding of ad content but also advances the broader landscape of knowledge-guided VQA models.
On the Universality of Deep Contextual Language Models
(2021-12) Goyal, Poonam
Deep Contextual Language Models (LMs) like ELMo, BERT, and their successors dominate the landscape of Natural Language Processing due to their ability to scale across multiple tasks rapidly by pre-training a single model, followed by task-specific fine-tuning. Furthermore, multilingual versions of such models like XLM-R and mBERT have given promising results in zero-shot cross-lingual transfer, potentially enabling NLP applications in many under-served and under-resourced languages. Due to this initial success, pre-trained models are being used as 'Universal Language Models' as the starting point across diverse tasks, domains, and languages. This work explores the notion of 'Universality' by identifying seven dimensions across which a universal model should be able to scale, that is, perform equally well or reasonably well, to be useful across diverse settings. We outline the current theoretical and empirical results that support model performance across these dimensions, along with extensions that may help address some of their current limitations. Through this survey, we lay the foundation for understanding the capabilities and limitations of massive contextual language models and help discern research gaps and directions for future work to make these LMs inclusive and fair to diverse applications, users, and linguistic phenomena.
bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments
(2022) Goyal, Poonam
Online social networks are ubiquitous and user-friendly. Nevertheless, it is vital to detect and moderate offensive content to maintain decency and empathy. However, mining social media texts is a complex task since users don't adhere to any fixed patterns. Comments can be written in any combination of languages, and many of them may be low-resource. In this paper, we present our system for the LT-EDI shared task on detecting homophobia and transphobia in social media comments. We experiment with a number of monolingual and multilingual transformer-based models such as mBERT, along with a data augmentation technique for tackling class imbalance. Such pretrained large models have recently shown tremendous success on a variety of benchmark tasks in natural language processing. We observe their performance on a carefully annotated, real-life dataset of YouTube comments in English as well as Tamil. Our submission achieved ranks 9, 6 and 3 with macro-averaged F1-scores of 0.42, 0.64 and 0.58 in the English, Tamil and Tamil-English subtasks respectively. The code for the system has been open sourced.
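The macro-averaged F1-score reported above weights every class equally, which is why it is a common choice for imbalanced data. The following is a minimal sketch of the standard definition only, not the authors' evaluation code; the function name `macro_f1` and the toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro-averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: one majority class and two rare classes.
y_true = ["none", "none", "none", "none", "homophobic", "transphobic"]
y_pred = ["none", "none", "none", "none", "none", "transphobic"]

print(round(macro_f1(y_true, y_pred), 4))  # 0.6296
```

In this toy case five of six predictions are correct, yet the macro-averaged score is much lower because the missed rare class contributes an F1 of zero.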
CranGAN: Adversarial Point Cloud Reconstruction for patient-specific Cranial Implant Design
(IEEE, 2022) Goyal, Poonam
Automating cranial implant design has become an increasingly important avenue in biomedical research. Benefits in terms of financial resources, time and patient safety necessitate the formulation of an efficient and accurate procedure for it. This paper attempts to provide a new research direction to this problem, through an adversarial deep learning solution. Specifically, in this work, we present CranGAN, a 3D Conditional Generative Adversarial Network designed to reconstruct a 3D representation of a complete skull given its defective counterpart. A novel solution of employing point cloud representations instead of conventional 3D meshes and voxel grids is proposed. We provide both qualitative and quantitative analysis of our experiments with three separate GAN objectives, and compare the utility of two 3D reconstruction loss functions, viz. Hausdorff Distance and Chamfer Distance. We hope that our work inspires further research in this direction. Clinical relevance: This paper establishes a new research direction to assist in automated implant design for cranioplasty.
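For readers unfamiliar with the two point-cloud distances named above, here is a minimal brute-force sketch. It assumes one common convention for each (Chamfer as the sum of mean squared nearest-neighbour distances in both directions, Hausdorff as the symmetric maximum of nearest-neighbour distances); the exact formulation used in the paper may differ, and the function names and toy point clouds are hypothetical.

```python
import math


def _nearest_sq(p, cloud):
    """Squared Euclidean distance from point p to its nearest neighbour in cloud."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in cloud)


def chamfer_distance(A, B):
    """Mean squared nearest-neighbour distance, summed over both directions."""
    a_to_b = sum(_nearest_sq(p, B) for p in A) / len(A)
    b_to_a = sum(_nearest_sq(q, A) for q in B) / len(B)
    return a_to_b + b_to_a


def hausdorff_distance(A, B):
    """Largest nearest-neighbour distance over both directions."""
    a_to_b = max(_nearest_sq(p, B) for p in A)
    b_to_a = max(_nearest_sq(q, A) for q in B)
    return math.sqrt(max(a_to_b, b_to_a))


# Hypothetical toy clouds: the prediction misplaces a single point.
complete = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]

print(round(chamfer_distance(complete, predicted), 4))  # 1.6667
print(hausdorff_distance(complete, predicted))          # 2.0
```

The Chamfer value averages over all points, so a single outlier is diluted, whereas the Hausdorff value is determined entirely by the worst-matched point. The brute-force search is quadratic in the number of points and is meant only to illustrate the definitions.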
A Generalized Multimodal Deep Learning Model for Early Crop Yield Prediction
(IEEE, 2022) Goyal, Navneet; Goyal, Poonam
Early crop yield prediction is crucial in agriculture for making administrative plans to ensure food security, post-harvest management, and distribution of a crop. Remote sensing data captured using various satellites provide reliable phenological information for a crop through surface reflectance bands. Other important factors affecting crop yield include meteorological and soil conditions. The data which we have used for crop yield prediction is multimodal. It consists of spatiotemporal meteorological (numeric) and surface reflectance (satellite image) data, and temporally static soil (satellite image) data. We effectively utilize this multimodal data to develop the proposed multimodal deep learning model, CropYieldNet. The objective of the paper is to accurately predict crop yield using high resolution data obtained from recently launched satellites such as Landsat-8 and Sentinel-2. We used contrastive learning in a supervised setting and data augmentation techniques to overcome the limited historical data available for training deep learning models. We introduce a depth-level selection module for effectively modelling the depth-variant information of soil data. We have also modified our model to perform in-season (early) crop yield prediction which is as accurate as end-season prediction. We evaluate our model for two crops, corn and soybean, on counties in the US and districts in India using data from MODIS, Landsat-8, and Sentinel-2 satellites. Our extensive experimentation shows that our model outperforms competing models. Our experiments also show that CropYieldNet generalizes well when applied to different crops and geographies.
AnyStreamKM: Anytime k-medoids Clustering for Streaming Data
(IEEE, 2022) Challa, Jagat Sesh; Goyal, Navneet; Goyal, Poonam
Stream clustering algorithms have gained a lot of importance in the recent past due to the rapidly rising utility of IoT systems and applications. Anytime algorithms and frameworks play a key role in handling streams in which data arrives or is generated at variable rates. They are capable of handling both slow and fast stream speeds while generating results with the highest possible accuracy. In this paper, we present AnyStreamKM, which is a framework for anytime k-medoids clustering of data streams. It uses a proposed hierarchical data indexing structure known as AnyKMTree that stores the incoming data from the stream in the form of a hierarchy of micro-clusters. AnyKMTree is an adaptation of the R-tree with its splitting strategy inspired by the design principles of k-medoids clustering. AnyKMTree not only supports anytime features but is also capable of filtering out noise and outliers. Our experimental analysis establishes that AnyKMTree produces micro-clusters that are more compact and purer than those of the state-of-the-art methods. Also, when offline k-medoids clustering such as PAM (Partitioning Around Medoids) is applied on the micro-clusters produced by AnyKMTree, the resultant clustering has been found to be of higher quality than that of the state-of-the-art methods.
Multimodal Semantographic Metalanguage (MSM): A novel methodology for digital enablement of semi-literates
(ACM Digital Library, 2023-06) Goyal, Poonam; Goyal, Navneet
People in developing countries without tertiary education face hurdles in using digital platforms for communication. The linguistic diversity of this section of the population makes the design of a near-universal digital enablement methodology a challenging task. It is therefore pivotal to build a language-agnostic methodology with bare minimum text to achieve digital communication across language boundaries. This would also help in bridging the "Digital Divide". In this paper, we illustrate building a Multimodal Semantographic Metalanguage (MSM) using Machine Learning (ML), Natural Language Processing (NLP) and Natural Semantic Metalanguage (NSM). The proposed methodology uses pictographs and ideographs, which are visually more distinctive, simpler to understand, have a reduced learning time and are appropriate for achieving digital literacy for semi-literates. We establish our claim on a dataset compiled from text messages by semi-literates. We have observed that using the proposed approach, we can successfully communicate semantic elements across semi-literates with different linguistic backgrounds with an accuracy of more than 80%.
LSFuseNet: Dual-Fusion of Landsat-8 and Sentinel-2 Multispectral Time Series for Permutation Invariant Applications
(IEEE, 2023) Goyal, Navneet; Goyal, Poonam
Satellite data provides valuable insights into environmental changes and natural resource management, such as monitoring deforestation, mapping land use changes, and identifying areas at risk of soil degradation. Landsat-8 and Sentinel-2 are the publicly available high spatial resolution satellites launched in recent years. However, both have a moderate temporal resolution, which limits their use in applications such as precision agriculture, land cover mapping, and disaster monitoring. For such applications, daily or weekly monitoring is better suited. Fusing data from the two satellites can provide enhanced observations. Both Landsat-8 and Sentinel-2 satellites have the same geographic coordinate systems, which makes them amenable to fusion. However, fusing data at the pixel level for these satellites is challenging as they visit the same location on different days. The proposed model 'LSFuseNet' effectively fuses data at the feature level. It is a dual-fusion model in which bi-directional cross-modal attention is used to identify and exchange the hotspot information in the two modalities. A feature alignment module learns the fine-grained features and mitigates the noise in the data. We have innovatively applied contrastive learning to improve the quality of the learned representations of the data from the two satellites. We evaluate our model for two applications: crop yield prediction and snow cover prediction. For crop yield prediction, we have taken two crops, viz. corn and soybean, for approximately 500 counties in the US. For snow cover prediction, we considered approximately 1300 US counties. Our extensive experiments show that LSFuseNet outperforms competing models. Also, the benefit of fusing the data from two satellites over using the data from a single satellite is evident from the results of both applications. We have further modified the model to include meteorological and/or soil data (if applicable) to further enhance the performance of the model.
Utilizing MODIS Fire Mask for Predicting Forest Fires Using Landsat-9/8 and Meteorological Data
(IEEE, 2023) Goyal, Navneet; Goyal, Poonam
Recent years have seen some of the largest forest fires ever, including the 2020 California megafires and the Australian bushfires, causing billions of dollars in property damage and destroying millions of acres of green reserves. The subject of forest fires becomes even more alarming when viewed in conjunction with the increasingly concerning problems of climate change and global warming. The planning regarding prevention and mitigation of forest fires and management of nearby areas can greatly benefit from an accurate prediction model. The objective of this study is to develop deep learning models which use satellite images and meteorological data to pinpoint potential fires at a pixel granularity. Data from the recently launched Landsat-8 and Landsat-9 satellite systems have been used to predict forest fires at a spatial resolution of 30m. The proposed solution uses the comprehensive geographical, meteorological, and MODIS-based fire history of the region, integrated from different data sources with pixel-level reprojection, as a multivariate time series (MVTS) to model the prediction problem as a binary classification problem. We adopt an encoder-classifier architecture: the BiLSTM-attention-based encoder is trained with supervised contrastive learning, while the fully-connected classifier is optimized against a weighted loss for increased recall. Our experiments demonstrate that the proposed model is robust to spatial and temporal variations in occurrence of fires, thereby making its deployment possible in any region of the world. With a mean AUC of 0.99, our proposed model outperforms the existing forest fire prediction models.
