Department of Computer Science and Information Systems

Permanent URI for this collection: http://localhost:4000/handle/123456789/1928

Women sport actions dataset for visual classification using small-scale training data
(Sage, 2025-07) Bera, Asish
Sports action classification, which involves complex body postures and player-object interactions, is an emerging area in image-based sports analysis. Several works have contributed to automated sports action recognition using machine learning techniques over the past decades. However, image datasets of women’s sports actions with sufficient intra- and inter-class variations are not available to researchers. To overcome this limitation, this work presents a new dataset named WomenSports for women’s sports classification using small-scale training data. The dataset covers a variety of sports activities with wide variations in movements, environments, and interactions among players. In addition, this study proposes a convolutional neural network (CNN) for deep feature extraction. A channel attention scheme applied to local contextual regions refines and enhances the feature representation. Experiments are carried out on three different sports datasets and one dance dataset to assess how well the proposed algorithm generalizes, and it performs well on all of them. The method achieves 89.15% top-1 classification accuracy with a ResNet-50 backbone on the proposed WomenSports dataset, which is publicly available for research at Mendeley Data.
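For illustration only, the sketch below shows one common form of channel attention, squeeze-and-excitation style gating, applied to a set of local region descriptors. It is not the authors' implementation. The function names, the R x C region layout and the weights `w1`/`w2` are hypothetical; in a real network the weights would be learned.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(regions: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Squeeze-and-excitation style channel attention over region descriptors.

    regions: R x C matrix, one C-dimensional descriptor per local region.
    w1: C x H reduction weights, w2: H x C expansion weights (hypothetical
    values here; normally learned during training).
    """
    num_regions, channels = len(regions), len(regions[0])
    hidden_size = len(w1[0])
    # Squeeze: average each channel over all regions.
    squeezed = [sum(r[c] for r in regions) / num_regions for c in range(channels)]
    # Excite: C -> H with ReLU, then H -> C with a sigmoid gate per channel.
    hidden = [max(0.0, sum(squeezed[c] * w1[c][h] for c in range(channels)))
              for h in range(hidden_size)]
    weights = [sigmoid(sum(hidden[h] * w2[h][c] for h in range(hidden_size)))
               for c in range(channels)]
    # Rescale: multiply channel c of every region by its attention weight.
    return [[r[c] * weights[c] for c in range(channels)] for r in regions]
```

With all-zero weights every gate equals sigmoid(0) = 0.5, so each descriptor is simply halved. Learned weights instead boost informative channels and suppress the others.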
An attention-based deep network for plant disease classification
(2024) Bera, Asish
Plant disease classification using machine learning in a real agricultural field environment is a difficult task. An automated plant disease diagnosis method may fail to capture and interpret discriminative information because of the small variations among leaf sub-categories. Nevertheless, modern Convolutional Neural Networks (CNNs) have achieved decent success in discriminating various plant diseases from leaf images. A few existing methods have added pre-processing modules or sub-networks to tackle this challenge. However, part-mining approaches sometimes produce feature maps that omit partial information needed for a holistic description. This work proposes a deep CNN that emphasizes integrating the partial descriptiveness of leaf regions. An effective attention mechanism is integrated with the high-level feature map of a base CNN to enhance feature representation. The proposed method focuses on important diseased areas in leaves and employs an attention weighting scheme to exploit useful neighborhood information. The proposed Attention-based network for Plant Disease Classification (APDC) has achieved state-of-the-art performance on four public plant datasets containing visual/thermal images. The best top-1 accuracies attained by APDC are: PlantPathology 97.74%, PaddyCrop 99.62%, PaddyDoctor 99.65%, and PlantVillage 99.97%. These results support the suitability of the proposed method.
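As a hedged illustration of an attention weighting scheme that uses neighborhood information, the sketch below refines each region descriptor in a grid. It takes a softmax-weighted sum of the region and its 4-connected neighbors, with dot-product similarity as the score. This is an assumed, simplified formulation and not the APDC module itself. The grid layout and function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighborhood_attention(grid: Grid) -> Grid:
    """Replace each region descriptor by an attention-weighted sum of itself and
    its 4-connected neighbours; weights are a softmax over dot-product scores."""
    rows, cols = len(grid), len(grid[0])
    out: Grid = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            coords = [(i, j)] + [
                (i + di, j + dj)
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= i + di < rows and 0 <= j + dj < cols
            ]
            query = grid[i][j]
            keys = [grid[r][c] for r, c in coords]
            weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
            out_row.append([sum(w * key[d] for w, key in zip(weights, keys))
                            for d in range(len(query))])
        out.append(out_row)
    return out
```

Each output is a convex combination of nearby descriptors. A region that resembles its neighbors therefore keeps its descriptor, while the neighbors' context is blended in according to similarity.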
POA-Net: dance poses and activity classification using convolutional neural networks
(IEEE, 2024) Bera, Asish
Dance poses represent complex human body-part movements and express emotions and gestures. Dance pose classification is a challenging problem in computer vision. Convolutional Neural Networks (CNNs) have brought significant performance improvements in recognizing dance poses from images and videos. Most dance datasets in existing works are video-based and are not publicly available. This work contributes an image dataset, called Dance-8, representing 8 new dance styles that blend Indian and international dance themes. These 8 unique dance styles are combined with the public Dance-12 dataset to improve posture diversity and dataset size; the extended dataset is called Dance-20. A custom CNN for dance POses and Activity classification, named POA-Net, is developed. All three dance datasets have been evaluated using standard base CNNs and POA-Net. POA-Net attains an accuracy of 73.27% on Dance-8, 82.10% on Dance-12, and 73.10% on Dance-20, outperforming standard backbones such as VGG16 and Inception-V3. The best accuracies of 81.57%, 85.08%, and 76.73% on Dance-8, Dance-12, and Dance-20, respectively, were achieved by MobileNet-v2. Moreover, POA-Net has achieved a state-of-the-art accuracy of 99.74% on DIAT, a radar-based human action image dataset.
RAFA-Net: region attention network for food items and agricultural stress recognition
(IEEE, 2024-10) Bera, Asish
Deep convolutional neural networks (CNNs) have achieved remarkable success in recognizing various food items and agricultural stress. A decent performance boost has been observed on agro-food challenges by mining and analyzing region-based partial feature descriptors. Computationally expensive ensemble learning schemes that fuse multiple CNNs have also been studied in earlier works. This work proposes a region attention scheme that models long-range dependencies by building correlations among different regions within an input image. The attention method enhances feature representation by learning the usefulness of context information from complementary regions. Pairs of spatial pyramid pooling and average pooling aggregate the partial descriptors into a holistic representation. Both pooling methods establish spatial and channel-wise relationships without incurring extra parameters. A context gating scheme is applied to refine the weighted attentional features so that they are more descriptive for classification. The proposed region attention network for food items and agricultural stress recognition, dubbed RAFA-Net, has been evaluated on three public food datasets and has achieved state-of-the-art performance with distinct margins. The highest top-1 accuracies of RAFA-Net are 91.69%, 91.56%, and 96.97% on the UECFood-100, UECFood-256, and MAFood-121 datasets, respectively. In addition, better accuracies have been achieved on two benchmark agricultural stress datasets: the best top-1 accuracies on the Insect Pest (IP-102) and PlantDoc-27 plant disease datasets are 92.36% and 85.54%, respectively, implying RAFA-Net's generalization capability.
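The sketch below is a minimal, assumption-laden illustration of two of the steps named above. First, region descriptors are averaged into one holistic vector. Second, a context gate of the commonly used form y = x ⊙ sigmoid(Wx + b) is applied. It is not the RAFA-Net code, and it shows only the average-pooling branch. The weight matrix `w` and bias `b` are hypothetical placeholders for learned parameters.

```python
import math
from typing import List

Vector = List[float]


def average_pool(regions: List[Vector]) -> Vector:
    """Aggregate R region descriptors (each D-dim) into one D-dim vector."""
    return [sum(col) / len(regions) for col in zip(*regions)]


def context_gating(x: Vector, w: List[Vector], b: Vector) -> Vector:
    """Context gating: y = x * sigmoid(W x + b), applied element-wise.

    w is a D x D weight matrix and b a length-D bias (hypothetical values;
    in a network both are learned)."""
    gates = []
    for row, bias in zip(w, b):
        z = sum(wi * xi for wi, xi in zip(row, x)) + bias
        gates.append(1.0 / (1.0 + math.exp(-z)))
    return [xi * g for xi, g in zip(x, gates)]
```

Each gate lies in (0, 1), so the gate can only re-weight features, never amplify them. A strongly positive pre-activation lets a feature pass almost unchanged, and a strongly negative one suppresses it.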
Fluorescence microscopy and histopathology image based cancer classification using graph convolutional network with channel splitting
(Elsevier, 2025-05) Bera, Asish
Since the proliferation of deep learning, several convolutional neural networks (CNNs) have been developed that achieve significant breakthroughs in automated cancer classification from histopathology and fluorescence microscopy images. This work further improves the classification of human breast and lung-colon cancers by applying a two-layer graph convolutional network (GCN) on top of a proposed lightweight deep convolutional backbone or an existing pre-trained CNN. The first graph convolution layer treats local regions as the graph nodes, with channel information as node features. For the second layer, the output feature map of the first layer is pooled and split into low-dimensional feature vectors that serve as node features. The proposed method, named Channel-Splitting Graph Convolutional Network (CS-GCN), enhances the holistic feature representation of spatial structural information. It exploits region-aware distinctness by building correlations among neighboring regions through node-level mixed feature propagation over the graph. Experiments are carried out on three public datasets representing breast cancer (the actin-labeled fluorescence microscopy image dataset (FMID) and the BreakHis dataset at four magnifications) and lung-colon cancer (the LC25000 dataset). The top-1 classification accuracies attained by CS-GCN with a ResNet-50 backbone are: FMID 99.30%, BreakHis 40x 98.0%, BreakHis 100x 97.81%, BreakHis 200x 97.33%, BreakHis 400x 96.85%, and LC25000 100.0%. Performance improves on these datasets whether CS-GCN is built upon the proposed convolutional stem or on pre-trained ResNet-50 and DenseNet-201 backbones, implying the effectiveness of the proposed CS-GCN.
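To make the graph construction concrete, here is a small sketch of a generic two-layer GCN over a grid of image regions. Each region is a node, its channel vector is the node feature, and 4-connected regions share an edge. It uses the standard renormalized propagation rule H' = ReLU(Â H W) with Â = D^-1/2 (A + I) D^-1/2. The grid adjacency, the function names and the weights are assumptions for illustration. The channel-splitting step of CS-GCN is not modeled.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def grid_adjacency(rows: int, cols: int) -> Matrix:
    """Adjacency of a rows x cols grid of regions (4-connected neighbours)."""
    n = rows * cols
    adj = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                adj[i][i + 1] = adj[i + 1][i] = 1.0
            if r + 1 < rows:
                adj[i][i + cols] = adj[i + cols][i] = 1.0
    return adj


def normalize_adjacency(adj: Matrix) -> Matrix:
    """A_hat = D^-1/2 (A + I) D^-1/2, with D the degree matrix of A + I."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def two_layer_gcn(a_hat: Matrix, h: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Stack two graph convolutions (hypothetical weights w1, w2)."""
    return gcn_layer(a_hat, gcn_layer(a_hat, h, w1), w2)
```

For two adjacent regions with features 2.0 and 4.0 and identity weights, each layer averages the pair, so both nodes end up at 3.0. This is the neighbor mixing that graph propagation provides.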
Deep Neural Networks Fused with Textures for Image Classification
(Springer, 2023-08) Bera, Asish
Fine-grained image classification (FGIC) is a challenging task due to small visual differences between subcategories but large intra-class variations. In this paper, we propose a fusion approach to FGIC that combines global texture with local patch-based information. The first pipeline extracts deep features from fixed-size non-overlapping patches and encodes them by sequential modeling using long short-term memory (LSTM). The second path computes image-level textures at multiple scales using local binary patterns (LBP). The two streams are integrated into an efficient feature vector for classification. The method is tested on six datasets (e.g., human faces, food dishes) using four backbone CNNs. Our method attains better classification accuracy than existing methods, with notable margins.
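The texture stream can be illustrated with the basic 8-neighbor local binary pattern. Each neighbor that is at least as bright as the center pixel sets one bit of an 8-bit code, and the codes are summarized as a normalized 256-bin histogram per scale. In the sketch below, neighbors are sampled at integer offsets (a square ring of radius r) rather than interpolated points on a circle. That simplification is an assumption, and the paper's exact LBP variant and radii are not stated. The function names and the default radii (1, 2) are hypothetical.

```python
from typing import List, Sequence

Image = List[List[int]]

# The 8 neighbours, visited clockwise from the top-left; each one sets one bit.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img: Image, radius: int = 1) -> List[List[int]]:
    """Basic 8-neighbour LBP code for every pixel at least `radius` from the border.

    Simplification: neighbours are taken at integer offsets (a square ring)
    instead of interpolated points on a circle."""
    h, w = len(img), len(img[0])
    codes = []
    for y in range(radius, h - radius):
        row = []
        for x in range(radius, w - radius):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy * radius][x + dx * radius] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(img: Image, radius: int = 1) -> List[float]:
    """Normalised 256-bin histogram of the LBP codes at one scale."""
    hist = [0] * 256
    count = 0
    for row in lbp_codes(img, radius):
        for code in row:
            hist[code] += 1
            count += 1
    return [v / max(count, 1) for v in hist]


def multiscale_lbp(img: Image, radii: Sequence[int] = (1, 2)) -> List[float]:
    """Concatenate the per-scale histograms into one texture descriptor."""
    feature: List[float] = []
    for r in radii:
        feature.extend(lbp_histogram(img, r))
    return feature
```

A flat image region gives code 255 everywhere, because every neighbor is greater than or equal to the center. Edges and spots produce other codes, which is what makes the histogram a texture signature. The resulting vector could then be concatenated with the LSTM-encoded patch features. How the two streams are actually fused is not specified here.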
WSports-50: An Image Dataset for Women’s Sport Action Classification
(Springer, 2024-07) Bera, Asish
Sport action recognition is an interesting area in computer vision. Categorizing sport actions, which involve difficult and complex body postures, is regarded as a fine-grained visual classification problem. Convolutional Neural Networks (CNNs) have outperformed conventional feature descriptors in recognizing various sport activities. However, although deep learning has brought decent improvements in sport action recognition, recognition of women’s sport activities has not been widely explored. Moreover, no benchmark dataset depicting women’s sport actions with sufficient variations is yet available for study. Hence, fine-grained image classification of diverse sport categories involving female athletes requires immediate research attention. To overcome this limitation, this paper proposes an image dataset comprising 50 globally popular sport categories featuring women players only. A simple deep learning model is proposed that extracts high-level deep features using a backbone CNN. These features are then pooled from a collection of regular regions representing local discriminative information. Spatial pyramid pooling is applied to mine semantic information and enhance feature aggregation for classification. The proposed method achieves satisfactory performance on the Women Sports dataset using four standard backbone CNNs. Moreover, it achieves better accuracy on the Yoga-82 pose recognition dataset by a significant margin, e.g., an 11.6% gain with a ResNet-50 base CNN.
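Pooling over regular regions with a spatial pyramid can be sketched for a single feature-map channel. At each pyramid level n, the map is divided into an n x n grid of roughly equal bins. Each bin is reduced to one value, and all values are concatenated into a fixed-length vector regardless of the input size. The levels (1, 2, 4), max pooling as the default, and the function names are illustrative assumptions, not the paper's settings. A full model would repeat this per channel.

```python
from typing import Callable, Iterable, List, Sequence

FeatureMap = List[List[float]]


def mean(values: Iterable[float]) -> float:
    vals = list(values)
    return sum(vals) / len(vals)


def spatial_pyramid_pool(fmap: FeatureMap, levels: Sequence[int] = (1, 2, 4),
                         pool: Callable[[Iterable[float]], float] = max) -> List[float]:
    """Pool one H x W channel over an n x n grid of regular regions for each
    pyramid level n, and concatenate the results (1 + 4 + 16 = 21 values
    for the default levels)."""
    h, w = len(fmap), len(fmap[0])
    out: List[float] = []
    for n in levels:
        for i in range(n):
            # Bin edges use floor/ceil so every bin is non-empty, even if n > h.
            y0, y1 = (i * h) // n, -((-(i + 1) * h) // n)
            for j in range(n):
                x0, x1 = (j * w) // n, -((-(j + 1) * w) // n)
                out.append(pool(fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)))
    return out
```

On a 4 x 4 map holding the values 0 to 15 row by row, levels (1, 2) with max pooling give [15, 5, 7, 13, 15]: the global maximum, then the maximum of each quadrant. Passing `pool=mean` gives the average-pooled variant instead.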
Deep Ear Biometrics for Gender Classification
(Springer, 2023-07) Bera, Asish
Human gender classification based on biometric features is a major concern in computer vision due to its wide variety of applications. The human ear is popular among researchers as a soft biometric trait because it is non-intrusive and less affected by age or changing circumstances. In this study, we develop a deep convolutional neural network (CNN) model for automatic gender classification from ear images. Performance is evaluated using four cutting-edge pre-trained CNN models. In terms of trainable parameters, the proposed technique has significantly lower computational complexity. The proposed model achieves 93% accuracy on the EarVN1.0 ear dataset.
Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis
(IEEE, 2023-07) Bera, Asish
Human body-pose estimation is a complex problem in computer vision. Recent research interest has expanded to sports, yoga, and dance (SYD) postures because of their role in maintaining health. Classifying SYD poses is regarded as a fine-grained image classification (FGIC) task due to the complex movement of body parts. Deep convolutional neural networks (CNNs) have significantly improved performance on various human body-pose estimation problems. Though decent progress has been achieved in yoga posture recognition using deep-learning techniques, fine-grained sports and dance recognition still needs more research attention. Moreover, no benchmark public image dataset with sufficient inter-class and intra-class variations is yet available for sports and dance posture classification. To address this limitation, we propose two image datasets, one with 102 sport categories and another with 12 dance styles. Two public datasets, Yoga-82 with 82 classes and Yoga-107 with 107 classes, are collected for yoga postures. These four SYD datasets are evaluated with the proposed deep model, SYD-Net, which integrates a patch-based attention (PbA) mechanism on top of standard backbone CNNs. The PbA module uses self-attention to learn contextual information from a set of uniform and multiscale patches and emphasizes discriminative features to capture the semantic correlation among patches. Random erasing data augmentation is also applied to improve performance. SYD-Net achieves state-of-the-art accuracy on Yoga-82 using five base CNNs, and its accuracy on the other datasets is also strong, indicating its effectiveness. Our Sports-102 and Dance-12 datasets are publicly available at https://sites.google.com/view/syd-net/home
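The core of a patch-based self-attention step can be sketched as scaled dot-product attention over a set of patch descriptors, softmax(X Xᵀ / √d) X. Every patch is updated as a weighted mix of all patches, weighted by semantic similarity. To keep the sketch short, it assumes the patch descriptors are already computed (e.g., pooled from uniform and multiscale patches). It also uses identity query/key/value projections. A trained PbA module would learn those projections, so this is not the SYD-Net implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def self_attention(patches: Matrix) -> Matrix:
    """Scaled dot-product self-attention: softmax(X X^T / sqrt(d)) X.

    patches: P x d matrix, one descriptor per patch. Identity Q/K/V
    projections are an assumption made for brevity."""
    d = len(patches[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in patches:
        # Attention weights of this patch over all patches (including itself).
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in patches])
        out.append([sum(w * v[j] for w, v in zip(weights, patches)) for j in range(d)])
    return out
```

Identical patches attend to each other equally and come out unchanged. Strongly dissimilar patches attend mostly to themselves, so distinctive regions keep their identity while similar ones share context.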
Style matching CAPTCHA: match neural transferred styles to thwart intelligent attacks
(Springer, 2023-03) Bera, Asish
Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is widely used to prevent malicious automated attacks on various online services. Text- and image-CAPTCHAs have gained broad acceptance due to usability and security factors. However, recent progress in deep learning means that text-CAPTCHAs can easily be exposed to various fraudulent attacks. Thus, image-CAPTCHAs are attracting research attention as a way to enhance usability and security. In this work, neural style transfer (NST) is adapted to design an image-CAPTCHA algorithm that enhances security while maintaining human performance. In NST-rendered image-CAPTCHAs, existing methods ask a user to identify or localize the salient object (e.g., the content), which off-the-shelf intelligent tools can solve effortlessly. In contrast, we propose a Style Matching CAPTCHA (SMC) that asks a user to select the style image that was applied in the NST method. A user can solve a random SMC challenge by using the semantic correlation between the content and the stylized output as a cue. Performance in solving SMC is evaluated on 1368 responses collected from 152 participants through a web application. The average solving accuracy over three sessions is 95.61%, and the average response time per challenge per user is 6.52 s. A smartphone application (SMC-App) is also devised using the proposed method; its average solving accuracy is 96.33%, and its average solving time is 5.13 s. To evaluate the vulnerability of SMC, deep learning-based attack schemes using Convolutional Neural Networks (CNNs), such as ResNet-50 and Inception-v3, are simulated. The average accuracy of these attacks on SMC across the various experiments is 37%, an improvement in attack resistance over existing methods. Moreover, in-depth security analysis, experimental insights, and comparative studies support the suitability of the proposed SMC.
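The challenge-response flow of a style-matching CAPTCHA can be sketched as follows. The server picks a set of candidate style images and secretly selects one to stylize the content image. It then shows the stylized output together with the candidates, and accepts the response only if the user picks the style actually used. The class and function names, the number of candidates (k = 4) and the opaque image token are hypothetical. The NST rendering step is only a placeholder comment, and this is not the SMC implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StyleMatchChallenge:
    stylized_image: str    # opaque id of the NST output shown to the user
    candidates: List[str]  # style images offered as possible answers
    answer: str            # style actually applied; kept server-side only


def make_challenge(styles: List[str], k: int = 4,
                   rng: Optional[random.Random] = None) -> StyleMatchChallenge:
    """Build one challenge: k distinct candidate styles, one secretly applied."""
    rng = rng or random.Random()
    candidates = rng.sample(styles, k)   # k distinct styles, in random order
    answer = rng.choice(candidates)      # the one used for stylization
    # A real system would render content + answer with an NST model here;
    # the sketch only issues a random token so the id does not leak the answer.
    stylized_image = f"stylized-{rng.getrandbits(32):08x}"
    return StyleMatchChallenge(stylized_image, candidates, answer)


def verify(challenge: StyleMatchChallenge, selected: str) -> bool:
    """Accept the response only if the user picked the applied style."""
    return selected == challenge.answer


def solving_accuracy(outcomes: List[bool]) -> float:
    """Percentage of challenges answered correctly."""
    return 100.0 * sum(outcomes) / len(outcomes)
```

With k candidates, blind guessing succeeds with probability 1/k per challenge (25% for the assumed k = 4). An attacker therefore has to exploit the content-style correlation to do substantially better than chance.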