BITS Faculty Publications

Permanent URI for this community: http://localhost:4000/handle/123456789/1867

Search Results

Now showing 1 - 4 of 4
  • Item
    An attention-based deep network for plant disease classification
    (2024) Bera, Asish
    Plant disease classification using machine learning in a real agricultural field environment is a difficult task. An automated plant disease diagnosis method may often fail to capture and interpret discriminatory information because the variations among leaf sub-categories are small. Yet, modern Convolutional Neural Networks (CNNs) have achieved decent success in discriminating various plant diseases using leaf images. A few existing methods have applied additional pre-processing modules or sub-networks to tackle this challenge. Sometimes, the feature maps ignore partial information for holistic description by part-mining. A deep CNN that emphasizes integration of partial descriptiveness of leaf regions is proposed in this work. An attention mechanism is integrated with the high-level feature map of a base CNN to enhance feature representation. The proposed method focuses on important diseased areas in leaves, and employs an attention weighting scheme for utilizing useful neighborhood information. The proposed Attention-based network for Plant Disease Classification (APDC) method has achieved state-of-the-art performance on four public plant datasets containing visual/thermal images. The best top-1 accuracies attained by the proposed APDC are: PlantPathology 97.74%, PaddyCrop 99.62%, PaddyDoctor 99.65%, and PlantVillage 99.97%. These results justify the suitability of the proposed method.
  • Item
    Poa-net: dance poses and activity classification using convolutional neural networks
    (IEEE, 2024) Bera, Asish
    Dance poses represent complex human body-part movements, and express emotions and gestures. Dance pose classification is a challenging problem in computer vision. Convolutional Neural Networks (CNNs) have achieved significant performance improvements in recognizing dance poses from images and videos. Most of the dance datasets in existing works are video-based and are not publicly available. This work contributes an image dataset, called Dance-8, representing 8 new dance styles that blend Indian and international dance themes. These 8 unique dance styles are combined with the public Dance-12 dataset to improve posture diversity and dataset size. This extended dataset is called Dance-20. A custom CNN is developed for dance POses and Activity classification, named POA-Net. All three dance datasets have been evaluated using standard base CNNs and POA-Net. POA-Net has attained an accuracy of 73.27% on Dance-8, 82.10% on Dance-12, and 73.10% on Dance-20. These performances are better than those of standard backbones, such as VGG16 and Inception-V3. The best accuracies of 81.57%, 85.08%, and 76.73% have been achieved by MobileNet-v2 on the Dance-8, Dance-12, and Dance-20 datasets, respectively. Moreover, POA-Net has achieved a state-of-the-art accuracy of 99.74% on DIAT, which is a radar-based human action image dataset.
  • Item
    A Graph Convolutional Network for Visual Categorization
    (Springer, 2024-10) Bera, Asish; Hazra, Arnab
    Convolutional Neural Networks (CNNs) have attained enhanced performance over conventional feature descriptors for image classification. Recently, Graph Convolutional Networks (GCNs) have also achieved improved performance for visual classification in various domains. A typical GCN propagates deep features using graph-based message passing methods. There are several domains, such as the disease diagnosis of humans and plants, where GCNs could be explored for further performance enhancement. Thus, ample research attention is essential for solving different kinds of visual classification problems. In this direction, this work integrates the benefits of CNN and GCN for improving the feature representation by building a spatial relation using a GCN. A simple deep learning model is proposed that extracts high-level deep features using a backbone CNN. Then, a GCN is applied to further enhance the feature representation for image classification. The proposed method has achieved improved performance on seven public benchmark datasets representing dance postures, hand shapes, agriculture, medical imaging, and aerial scene classification. The proposed method is developed using four different CNN backbones. In particular, the proposed method based on the ResNet-50 backbone has attained 89.98% accuracy on Dance-12, 90.34% accuracy on REST hand shape, 94.06% accuracy on Kvasir, 75.89% accuracy on ISIC skin cancer, 91.73% accuracy on AID aerial scene classification, and 95.24% accuracy on PlantPathology datasets.
  • Item
    Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis
    (IEEE, 2023-07) Bera, Asish
    Human body-pose estimation is a complex problem in computer vision. Recent research interest has widened specifically to sports, yoga, and dance (SYD) postures for maintaining health conditions. The classification of SYD pose categories is regarded as a fine-grained image classification (FGIC) task due to the complex movement of body parts. Deep convolutional neural networks (CNNs) have attained significantly improved performance in solving various human body-pose estimation problems. Though decent progress has been achieved in yoga posture recognition using deep-learning techniques, fine-grained sports and dance recognition necessitates ample research attention. However, no benchmark public image dataset with sufficient interclass and intraclass variations is available yet to address sports and dance posture classification. To address this limitation, we have proposed two image datasets, one for 102 sport categories and another for 12 dance styles. Two public datasets, Yoga-82 that contains 82 classes and Yoga-107 that represents 107 classes, are collected for yoga postures. These four SYD datasets are evaluated with the proposed deep model, SYD-Net, which integrates a patch-based attention (PbA) mechanism on top of standard backbone CNNs. The PbA module leverages the self-attention mechanism that learns contextual information from a set of uniform and multiscale patches and emphasizes discriminative features to understand the semantic correlation among patches. Moreover, random erasing data augmentation is applied to improve performance. The proposed SYD-Net has achieved state-of-the-art accuracy on Yoga-82 using five base CNNs. SYD-Net’s accuracy on other datasets is remarkable, implying its efficiency. Our Sports-102 and Dance-12 datasets are publicly available at https://sites.google.com/view/syd-net/home
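
The last abstract above mentions random erasing data augmentation. As a rough illustration of the general idea only (not the authors' implementation), the sketch below overwrites one randomly placed rectangle of an image with a constant value. The function name `random_erase` and its parameters `area_frac`, `aspect`, and `fill` are hypothetical. The sketch also assumes a fixed erased-area fraction and aspect ratio, whereas random erasing as usually described samples both at random and applies the erase only with some probability.

```python
import random


def random_erase(image, area_frac=0.2, aspect=1.0, fill=0, rng=None):
    """Return a copy of `image` (a list of rows) with one rectangle set to `fill`.

    Hypothetical helper: `area_frac` is the fraction of the image area to erase,
    and `aspect` is the height/width ratio of the erased rectangle.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    target_area = area_frac * height * width

    # Derive the rectangle sides from the target area and aspect ratio,
    # clamped so the rectangle always fits inside the image.
    erase_h = max(1, min(height, round((target_area * aspect) ** 0.5)))
    erase_w = max(1, min(width, round((target_area / aspect) ** 0.5)))

    # Pick a random top-left corner that keeps the rectangle in bounds.
    top = rng.randint(0, height - erase_h)
    left = rng.randint(0, width - erase_w)

    # Copy the rows so the input image is left untouched.
    erased = [row[:] for row in image]
    for r in range(top, top + erase_h):
        for c in range(left, left + erase_w):
            erased[r][c] = fill
    return erased


if __name__ == "__main__":
    demo = [[1] * 8 for _ in range(6)]
    for row in random_erase(demo, area_frac=0.25, rng=random.Random(0)):
        print(row)
```

Passing a seeded `random.Random` makes the erased region reproducible, which is convenient when debugging an augmentation pipeline. In training, a fresh region would be drawn for each sample.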