BITS Faculty Publications

Permanent URI for this community: http://localhost:4000/handle/123456789/1867

Search Results

Vision-driven robotic grasping order generation using segmentation and relative positioning in a cluttered environment
(Elsevier, 2025) Sangwan, Kuldip Singh
In this paper, a multichannel vision-based approach for intelligent robotic grasping in cluttered environments is proposed. The experiments are conducted with an open-source synthetic dataset consisting of color and depth images to address this general problem. The proposed approach involves the use of a modified Cascade Mask R-CNN-based semantic segmentation model to detect and classify objects in the scene. The results show a high mAP@0.5-0.95 score of 93.85% for the customized Meta-Grasp dataset using this model. The captured depth data is processed based on the segmented mask regions to approximate their position in a 3D coordinate system. The affinity between the edge profiles is calculated to estimate the relation between the segmented objects in 3D space. This information is used to generate a priority order for object pickup such that only the objects in the top layer are picked first, followed by the underlying layers. The methodology was evaluated for various placement options for a 6-class subset of the dataset with a varying number of objects. The actual object classes and their mask positions were obtained successfully, and the priority order was calculated so that no lower-layered object was picked before the upper-lying object. Overall, the proposed two-stage decision pipeline has demonstrated its effectiveness in generating the pickup priority and sorting order for a multi-object scene and has potential applications in fully automated factories or smart manufacturing.
Detecting additive manufacturing anomalies with shallow convolutional neural networks
(Springer, 2025-10) Sangwan, Kuldip Singh
Additive manufacturing, often known as 3D printing, has been significant in the manufacturing industry in recent decades. However, the method encounters significant challenges in the form of printing errors, which adversely impact the end-user product experience and hinder widespread adoption. The current manual and sensor-based continuous monitoring techniques lack a clear distinction between anomalies and healthy data points, making them ineffective for implementation in industrial environments. This research introduces a computer vision-based methodology for detecting anomalies in real time. Two Convolutional Neural Network versions are created: Model V1, which uses a residual connection with fewer parameters and lower computational complexity, and Model V2, which facilitates deployment on constrained devices without compromising performance. The proposed CNNs are evaluated against state-of-the-art classification models, namely ResNet18, ResNet34, and a Deep LSTM classifier, to assess their performance. Model V1 and Model V2 achieved comparable performance with 86.7% and 11.86% fewer parameters than ResNet18. Afterward, quantization is applied to produce a compact model representation for edge-device deployment. The proposed quantized model shows no loss in performance. Lastly, an inference study is conducted on multiple edge devices, where the TI AM68A board proved fast, with inference times of 0.246 s and 0.04 s for models V1 and V2, respectively.
Transformers for vision: a survey on innovative methods for computer vision
(IEEE, 2025-05) Kumar, Dhruv; Chalapathi, G.S.S.
Transformers have emerged as a groundbreaking architecture in the field of computer vision, offering a compelling alternative to traditional convolutional neural networks (CNNs) by enabling the modeling of long-range dependencies and global context through self-attention mechanisms. Originally developed for natural language processing, transformers have now been successfully adapted for a wide range of vision tasks, leading to significant improvements in performance and generalization. This survey provides a comprehensive overview of the fundamental principles of transformer architectures, highlighting the core mechanisms such as self-attention, multi-head attention, and positional encoding that distinguish them from CNNs. We delve into the theoretical adaptations required to apply transformers to visual data, including image tokenization and the integration of positional embeddings. A detailed analysis of key transformer-based vision architectures such as ViT, DeiT, Swin Transformer, PVT, Twins, and CrossViT is presented, alongside their practical applications in image classification, object detection, video understanding, medical imaging, and cross-modal tasks. The paper further compares the performance of vision transformers with CNNs, examining their respective strengths, limitations, and the emergence of hybrid models. Finally, current challenges in deploying ViTs, such as computational cost, data efficiency, and interpretability, are discussed, along with recent advancements and future research directions including efficient architectures, self-supervised learning, and multimodal integration.
Comparative study on deep neural network models for crop classification using time series PolSAR and optical data
(ISPRS, 2018-11) Phartiyal, Gopal Singh
Crop classification is an important task in many crop monitoring applications. Satellite remote sensing has provided easy, reliable, and fast approaches to the crop classification task. In this study, a comparative analysis is made of the performance of various deep neural network (DNN) models for the crop classification task using polarimetric synthetic aperture radar (PolSAR) and optical satellite data. For PolSAR data, Sentinel 1 dual pol SAR data is used. Sentinel 2 multispectral data is used as optical data. Five land cover classes, including two crop classes of the season, are considered. Time series data over the period of one crop cycle is used. Training and testing samples are measured and collected directly from the ground over the study region. Various convolutional neural network (CNN) and long short-term memory (LSTM) models are implemented, analysed, evaluated, and compared. Models are evaluated on the basis of classification accuracy and generalization performance.
A mixed spectral and spatial convolutional neural network for land cover classification using SAR and optical data
(EGU 2018, 2018) Phartiyal, Gopal Singh
Today, both SAR and optical data are available with good spatial and temporal resolutions. The two data modalities complement each other in many applications. There are numerous approaches to process the two data modalities, separately or combined. Domain or modality specific approaches such as polarimetric decomposition techniques or reflectance based techniques cannot work with the two datasets combined together. Data fusion approaches incur information loss during the process and are highly application specific. Machine learning (ML) approaches can operate on the combined dataset but have their own advantages and disadvantages. There is a need to explore new ML based approaches to achieve higher performance. Convolutional neural networks (CNNs) are young, trending, and promising ML tools in remote sensing applications. CNNs have the capability to learn complex features exclusively from data. Data from the two modalities can thus be brought together and processed with increased performance. In this paper an attempt is made to analyze CNN capabilities to perform land cover classification using multi-sensor data. SAR data used in this study is L band fully polarimetric PALSAR 2 data with 6 meter spatial resolution. Three basic polarimetric bands, namely, HH, HV, and VV, and four derived bands (polarization signatures) are used. Six multispectral Landsat 8 bands, pan sharpened and resampled at 6 meter spatial resolution, are used as optical data. All 13 features are stacked together and fed as input data to the proposed CNN. The areas selected for study are Haridwar and Roorkee regions of northern India. This study introduces a CNN where convolution is performed both spatially and spectrally. We show how this is an advantage over performing only spatial convolution. Five land cover classes namely, urban, bare soil, water, dense vegetation, and agriculture are considered. 
The CNN is trained on more than 1200 ground truth class data points measured directly on the terrain. The classification shows results with good generalization. Comparison with other classifiers such as SVMs shows that the proposed approach provides better classification results in terms of generalization, although the cross-validation accuracy is on the same order. The evaluation of the generalization of the classified image is done using ground truth knowledge on selected subset areas in the study area.
Permuted spectral and permuted spectral-spatial CNN models for PolSAR-multispectral data based land cover classification
(Taylor & Francis, 2020-12) Phartiyal, Gopal Singh
It is a challenge to develop methods which can process the polarimetric synthetic aperture radar (PolSAR) and multispectral (MS) data modalities together without losing information from either for remote sensing applications. This paper presents a study which attempts to introduce novel deep learning-based remote sensing data processing frameworks that utilize convolutional neural networks (CNNs) in both spatial and spectral domains to perform land cover (LC) classification with PolSAR-MS data. Since remotely sensed earth observation data usually have a larger spectral depth than normal camera image data, exploiting the spectral information in remote sensing (RS) data is crucial as well. In fact, convolutions in the sub-spectral space are an intuitive alternative to the process of feature selection. Recently, researchers have had success in exploiting the spectral information of RS data, especially hyperspectral data, with CNNs. In this paper, exploitation of the spectral information in the PolSAR-MS data via a permuted localized spectral convolution along with localized spatial convolution is proposed. Further, the study also establishes the significance of performing permuted localized spectral convolutions over non-localized or localized spectral convolutions. Two models are proposed, namely a permuted local spectral convolutional network (Perm-LS-CNN) and a permuted local spectral-spatial convolutional network (Perm-LSS-CNN). These models are trained on ground truth class data points measured directly on the terrain. The evaluation of the generalization performance is done using ground truth knowledge on selected well-known regions in the study areas. Comparison with other popular machine learning classifiers shows that the Perm-LSS-CNN model provides better classification results in terms of both accuracy and generalization.
An attention-based deep network for plant disease classification
(2024) Bera, Asish
Plant disease classification using machine learning in a real agricultural field environment is a difficult task. Often, an automated plant disease diagnosis method might fail to capture and interpret discriminatory information due to small variations among leaf sub-categories. Yet, modern Convolutional Neural Networks (CNNs) have achieved decent success in discriminating various plant diseases using leaf images. A few existing methods have applied additional pre-processing modules or sub-networks to tackle this challenge. Sometimes, the feature maps ignore partial information for holistic description by part-mining. A deep CNN that emphasizes integration of partial descriptiveness of leaf regions is proposed in this work. An attention mechanism is integrated with the high-level feature map of a base CNN to enhance feature representation. The proposed method focuses on important diseased areas in leaves, and employs an attention weighting scheme to utilize useful neighborhood information. The proposed Attention-based network for Plant Disease Classification (APDC) method has achieved state-of-the-art performance on four public plant datasets containing visual/thermal images. The best top-1 accuracies attained by the proposed APDC are: PlantPathology 97.74%, PaddyCrop 99.62%, PaddyDoctor 99.65%, and PlantVillage 99.97%. These results justify the suitability of the proposed method.
POA-Net: dance poses and activity classification using convolutional neural networks
(IEEE, 2024) Bera, Asish
Dance poses represent complex human body-part movements, and express emotions and gestures. Dance pose classification is a challenging problem in computer vision. Convolutional Neural Networks (CNNs) have witnessed significant performance improvements in recognizing dance poses from images and videos. Most of the dance datasets in existing works are video-based and are not available publicly. This work contributes an image dataset representing 8 new dance styles blended with Indian and international dance themes, called Dance-8. These 8 unique dance styles are combined with the Dance-12 public dataset to improve the posture diversity and dataset size. This extended dataset is called Dance-20. A custom CNN is developed for dance POses and Activity classification, named POA-Net. All three dance datasets have been evaluated using standard base CNNs and POA-Net. POA-Net has attained an accuracy of 73.27% on Dance-8, 82.10% on Dance-12, and 73.10% on Dance-20. These performances are better than those of standard backbones, such as VGG16 and Inception-V3. The best accuracies of 81.57%, 85.08%, and 76.73% have been achieved by MobileNet-v2 on the Dance-8, Dance-12, and Dance-20 datasets, respectively. Moreover, POA-Net has achieved the state-of-the-art accuracy of 99.74% on DIAT, which is a radar-based human action image dataset.
Application of Deep Neural Networks for Weed Detection and Classification
(IEEE, 2023-06) Bhatt, Upendra Mohan
Weeds compete for natural resources both in forest areas, harming the development of native vegetation, and in agricultural areas, affecting crop quality. The need then arises to classify these species, so that mechanical or chemical methods can be applied appropriately to contain the pests. This research presents the application and comparison of machine learning techniques with the aim of automating the classification of images for agricultural challenges, such as the detection of defective seeds and weeds and the distinction between weeds and native vegetation. Finally, the architecture of a convolutional neural network is presented. As a differential, the network's self-learning ability stands out, as in most cases images are captured in less than ideal conditions at varying heights and lighting levels. This research is expected to provide important information on artificial intelligence techniques that can be used in the classification of weed images, a factor that will contribute to the forestry and agricultural sectors.
Deep Learning Based Super Resolution Network for Channel Estimation
(Taylor & Francis, 2024-12) Joshi, Sandeep
This paper proposes and investigates a deep learning-based channel estimation scheme for wireless communication systems. In this approach, the channel response at pilot positions is treated as a low-resolution image, which is then converted into a high-resolution image using a super-resolution (SR) network. It is observed that the proposed model shows an improvement of 50% and 42.5% as compared to the ChannelNet and super-resolution convolutional neural network, respectively, in the case of 16 pilots. The novelty of the proposed SR model is its low complexity, as it uses one model instead of two for channel estimation. Besides, the proposed SR model uses fewer pilots for channel estimation, making it bandwidth-efficient and fast. Furthermore, the proposed model is compared using extensive simulations for benchmarking.
