Browsing by Author "Bera, Asish"

An attention-based deep network for plant disease classification
(2024) Bera, Asish
Plant disease classification using machine learning in a real agricultural field environment is a difficult task. Often, an automated plant disease diagnosis method might fail to capture and interpret discriminatory information due to small variations among leaf sub-categories. Yet, modern Convolutional Neural Networks (CNNs) have achieved decent success in discriminating various plant diseases using leaf images. A few existing methods have applied additional pre-processing modules or sub-networks to tackle this challenge. Sometimes, the feature maps ignore partial information for holistic description by part-mining. A deep CNN that emphasizes integration of partial descriptiveness of leaf regions is proposed in this work. The efficacious attention mechanism is integrated with the high-level feature map of a base CNN for enhancing feature representation. The proposed method focuses on important diseased areas in leaves, and employs an attention weighting scheme for utilizing useful neighborhood information. The proposed Attention-based network for Plant Disease Classification (APDC) method has achieved state-of-the-art performances on four public plant datasets containing visual/thermal images. The best top-1 accuracies attained by the proposed APDC are: PlantPathology 97.74%, PaddyCrop 99.62%, PaddyDoctor 99.65%, and PlantVillage 99.97%. These results justify the suitability of the proposed method.
An attention-driven hierarchical multi-scale representation for visual recognition
(ARXIV, 2021) Bera, Asish
Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content. This is mainly due to their ability to break down an image into smaller pieces, extract multi-scale localized features and compose them to construct highly expressive representations for decision making. However, the convolution operation is unable to capture long-range dependencies such as arbitrary relations between pixels since it operates on a fixed-size window. Therefore, it may not be suitable for discriminating subtle changes (e.g. fine-grained visual recognition). To this end, our proposed method captures the high-level long-range dependencies by exploring Graph Convolutional Networks (GCNs), which aggregate information by establishing relationships among multi-scale hierarchical regions. These regions range from smaller (closer look) to larger (far look), and the dependency between regions is modeled by an innovative attention-driven message propagation, guided by the graph structure to emphasize the neighborhoods of a given region. Our approach is simple yet extremely effective in solving both the fine-grained and generic visual classification problems. It outperforms the state of the art by a significant margin on three datasets and is very competitive on the other two.
Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification
(Association for the Advancement of Artificial Intelligence, 2021) Bera, Asish
Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition. For fine-grained recognition, context-aware rich feature representation of object/scene plays a key role since it exhibits a significant variance in the same subcategory and subtle variance among different subcategories. Finding the subtle variance that fully characterizes the object/scene is not straightforward. To address this, we propose a novel context-aware attentional pooling (CAP) that effectively captures subtle changes via sub-pixel gradients, and learns to attend informative integral regions and their importance in discriminating different subcategories without requiring the bounding-box and/or distinguishable part annotations. We also introduce a novel feature encoding by considering the intrinsic consistency between the informativeness of the integral regions and their spatial structures to capture the semantic correlation among them. Our approach is simple yet extremely effective and can be easily applied on top of a standard classification backbone network. We evaluate our approach using six state-of-the-art (SotA) backbone networks and eight benchmark datasets. Our method significantly outperforms the SotA approaches on six datasets and is very competitive with the remaining two.
Deep Ear Biometrics for Gender Classification
(Springer, 2023-07) Bera, Asish
Human gender classification based on biometric features is a major concern for computer vision due to its vast variety of applications. The human ear is popular among researchers as a soft biometric trait, because it is less affected by age or changing circumstances and is non-intrusive. In this study, we have developed a deep convolutional neural network (CNN) model for automatic gender classification using the samples of ear images. The performance is evaluated using four cutting-edge pre-trained CNN models. In terms of trainable parameters, the proposed technique requires significantly less computational complexity. The proposed model has achieved 93% accuracy on the EarVN1.0 ear dataset.
Deep Neural Networks Fused with Textures for Image Classification
(Springer, 2023-08) Bera, Asish
Fine-grained image classification (FGIC) is a challenging task due to small visual differences among subcategories, but large intra-class variations. In this paper, we propose a fusion approach to address FGIC by combining global texture with local patch-based information. The first pipeline extracts deep features from various fixed-size non-overlapping patches and encodes features by sequential modeling using the long short-term memory (LSTM). Another path computes image-level textures at multiple scales using the local binary patterns (LBP). The advantages of both streams are integrated to represent an efficient feature vector for classification. The method is tested on six datasets (e.g., human faces, food-dishes, etc.) using four backbone CNNs. Our method has attained better classification accuracy over existing methods with notable margins.
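Illustrative sketch (not taken from the paper): the texture stream above relies on local binary patterns. The following minimal, single-scale version assumes the basic 8-neighbour operator on a grayscale image stored as a list of rows; the function names and the clockwise bit order are choices made here for illustration, and the paper's multi-scale variant may well differ.

```python
def lbp_code(patch):
    """Return the basic 8-neighbour LBP code of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left corner
    # (an arbitrary but fixed order, assumed for this sketch).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2-D image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

The histogram is the kind of image-level texture descriptor that could be concatenated with deep features; how the two streams are actually fused is not specified in the abstract.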
Driver Distraction Recognition-driven Collision Avoidance Algorithm for Active Vehicle Safety
(IEEE, 2021) Bera, Asish
This paper integrates human driver factors with a model-based Collision Avoidance System (CAS) to enhance the safety of semi-autonomous vehicles. Driver Activity Recognition (DAR) through Driver Distraction States (DDS) has been used as the key component to trigger the CAS so that collisions can be averted. DDS has been generated using realistic normal driving scenarios and suitably integrated with a Full State Feedback (FSF) controller-based CAS. The integrated algorithm has been tested using a Hardware in Loop (HiL) setup, which is interfaced with the vehicle dynamics software IPG TruckMaker®. The performance of the algorithm has been evaluated for various on-road scenarios and found to be effective in avoiding rear-end collisions.
Error Detecting Dual Basis Bit Parallel Systolic Multiplication Architecture over GF(2^m)
(IEEE, 2009) Bera, Asish
This paper presents an error-tolerant, hardware-efficient VLSI architecture for bit-parallel systolic multiplication over the dual basis, which can be pipelined. This error-tolerant architecture is well suited to VLSI implementation because of its regularity, modular structure, and unidirectional data flow. The length of the longest delay path and the area of this architecture are smaller than those of the bit-parallel systolic multiplication architectures reported earlier. The architecture is implemented using Austria Micro System's 0.35 µm CMOS technology. This architecture can also operate over both the dual basis and the polynomial basis.
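Illustrative sketch (not taken from the paper): a software reference for multiplication in GF(2^m) can help when reading about hardware multipliers. The version below works in the polynomial basis only, using shift-and-XOR with reduction; the field GF(2^4) and the irreducible polynomial x^4 + x + 1 are assumptions picked for the example, and nothing here models the dual-basis systolic array or its error detection.

```python
def gf2m_mul(a, b, m=4, poly=0b10011):
    """Multiply a and b in GF(2^m), polynomial basis, modulo `poly`.

    Defaults (assumed for this sketch): GF(2^4) with x^4 + x + 1.
    Both operands must be smaller than 2**m.
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a          # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & (1 << m):         # degree reached m: reduce modulo poly
            a ^= poly
    return result
```

For example, `gf2m_mul(2, 8)` computes x * x^3 = x^4, which reduces to x + 1, i.e. the integer 3.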
Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis
(IEEE, 2023-07) Bera, Asish
Human body-pose estimation is a complex problem in computer vision. Recent research interests have been widened specifically on the sports, yoga, and dance (SYD) postures for maintaining health conditions. The SYD pose categories are regarded as a fine-grained image classification (FGIC) task due to the complex movement of body parts. Deep convolutional neural networks (CNNs) have attained significantly improved performance in solving various human body-pose estimation problems. Though decent progress has been achieved in yoga postures recognition using deep-learning techniques, fine-grained sports and dance recognition necessitates ample research attention. However, no benchmark public image dataset with sufficient interclass and intraclass variations is available yet to address sports and dance postures classification. To solve this limitation, we have proposed two image datasets, one for 102 sport categories and another for 12 dance styles. Two public datasets, Yoga-82 that contains 82 classes and Yoga-107 that represents 107 classes, are collected for yoga postures. These four SYD datasets are experimented with the proposed deep model, SYD-Net, which integrates a patch-based attention (PbA) mechanism on top of standard backbone CNNs. The PbA module leverages the self-attention mechanism that learns contextual information from a set of uniform and multiscale patches and emphasizes discriminative features to understand the semantic correlation among patches. Moreover, random erasing data augmentation is applied to improve performance. The proposed SYD-Net has achieved state-of-the-art accuracy on Yoga-82 using five base CNNs. SYD-Net’s accuracy on other datasets is remarkable, implying its efficiency. Our Sports-102 and Dance-12 datasets are publicly available at https://sites.google.com/view/syd-net/home
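Illustrative sketch (not taken from the paper): random erasing augmentation overwrites one randomly placed rectangle of a training image. The version below operates on a 2-D list of pixel values; the parameter names, default ranges, and constant fill value are assumptions for this example, and the settings used for SYD-Net are not given in the abstract.

```python
import math
import random


def random_erase(image, p=0.5, area_range=(0.02, 0.4), min_aspect=0.3,
                 fill=0, attempts=10, rng=random):
    """Return a copy of a 2-D image (list of rows) with one random rectangle overwritten."""
    out = [row[:] for row in image]
    if rng.random() >= p:
        return out                      # leave the image untouched
    height, width = len(out), len(out[0])
    for _ in range(attempts):
        # Sample the rectangle's area and aspect ratio, then derive its sides.
        target_area = rng.uniform(*area_range) * height * width
        aspect = rng.uniform(min_aspect, 1.0 / min_aspect)
        h = int(round(math.sqrt(target_area * aspect)))
        w = int(round(math.sqrt(target_area / aspect)))
        if 0 < h < height and 0 < w < width:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            for r in range(top, top + h):
                for c in range(left, left + w):
                    out[r][c] = fill
            break
    return out
```

The input image is never modified in place, so the same sample can be augmented differently in each epoch.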
Finger Biometric Recognition with Feature Selection
(CRC, 2019) Bera, Asish
Biometrics is indispensable in this modern digital era for the secure automated human authentication in various fields of machine learning and pattern recognition. Hand geometry is a promising physiological biometric trait with ample deployed application areas for identity verification. Due to the intricate anatomic foundation of the thumb and substantial interfinger posture variation, satisfactory performances cannot be achieved when the thumb is included in the contact-free environment. To overcome the hindrance associated with the thumb, four-finger-based (excluding the thumb) biometric approaches have been devised. In this chapter, a four-finger-based biometric method has been presented. Again, the selection of salient features is essential to reduce the feature dimensionality by eliminating insignificant features. Weights are assigned according to the discriminative efficiency of the features to emphasize the essential features. Two different strategies, namely, the global and local feature selection methods are adopted based on the adaptive forward-selection and backward-elimination (FoBa) algorithm. The identification performance is evaluated using the weighted k-nearest neighbor and random forest classifiers. The experiments are conducted using the selected feature subsets over the 300 subjects of the Bosphorus hand database. The best identification accuracy of 98.67% and equal error rate of 4.6% have been achieved by using the subset of 25 features that are selected by the rank-based local FoBa algorithm.
Finger contour profile based hand biometric recognition
(Springer, 2016-10) Bera, Asish
This paper presents a contactless hand biometric system for an unrestricted hand-pose environment. A new preprocessing technique is proposed for defining the finger contour profiles (FCP). It mainly consists of simple grayscale image transformation, subtraction, and logical XOR operation. This hand prototyping method logically decomposes global hand contour into the left and right contour profiles of each finger. A set of twenty pose-invariant geometric features is extracted from the FCP and normalized global hand shape. Experiments are conducted on two publicly available hand databases, namely the Bosphorus and IIT Delhi (IITD) databases, to validate the system using the kNN, minimum distance, and random forest (RF) classifiers. Satisfactory identification accuracy of 97.82% using the RF classifier has been achieved for the Bosphorus database with 320 subjects; and in verification, 3.28% equal error rate (EER) is reported. The kNN classifier has been found to produce good identification success of 95.22% for the IITD database of 230 subjects; and 4.76% EER is obtained in verification. The average execution time of this approach is less than 2 s, which implies its suitability in real-world applications.
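Illustrative sketch (not taken from the paper): the equal error rate quoted above is the operating point where the false acceptance rate and the false rejection rate coincide. The function below approximates it from two lists of distance scores by sweeping every observed score as a threshold; the function name and the convention that a lower distance means a better match are assumptions for this example.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from genuine and impostor distances (lower = better match)."""
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(d <= t for d in impostor) / len(impostor)   # impostors accepted
        frr = sum(d > t for d in genuine) / len(genuine)      # genuine users rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With real score distributions the two rates rarely cross exactly at an observed score, so reporting the midpoint at the closest threshold is a common approximation rather than an exact value.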
Fluorescence microscopy and histopathology image based cancer classification using graph convolutional network with channel splitting
(Elsevier, 2025-05) Bera, Asish
Since the proliferation of deep learning, several convolutional neural networks (CNNs) have been developed to attain significant breakthroughs for automated cancer classification using histopathology and fluorescence microscopy images. This work enhances the classification performances of human breast and lung-colon cancers further by exploring a two-layer graph convolutional network (GCN) upon a proposed lightweight deep convolutional backbone or existing pre-trained CNN. The first graph convolution layer considers local regions as the graph nodes with channel information as node features. The second layer is rendered by pooling and splitting the output feature map of the former layer into a low dimensional feature vector that serves as node features. The proposed method, named Channel-Splitting Graph Convolutional Network (CS-GCN), enhances holistic feature representation of spatial structural information. The significance of region-aware distinctness is explored for building a correlation among neighboring regions through node-level mixed feature propagation of a graph. The experiments are carried out on three public datasets, representing the breast cancer (actin-labeled fluorescence microscopy image dataset (FMID), and BreakHis dataset with four magnifications), and lung-colon cancer (LC25000 dataset). The top-1 classification accuracies attained by CS-GCN using ResNet-50 backbone are FMID: 99.30%, BreakHis 40x: 98.0%, BreakHis 100x: 97.81%, BreakHis 200x: 97.33%, BreakHis 400x: 96.85%, and LC25000: 100.0%. The performances are improved on these datasets when built upon a proposed convolutional stem as well as pre-trained ResNet-50 and DenseNet-201 backbones, implying the effectiveness of the proposed CS-GCN.
Fusion-Based Hand Geometry Recognition Using Dempster–Shafer Theory
(World Scientific, 2015) Bera, Asish
This paper presents a new technique for user identification and recognition based on the fusion of hand geometric features of both hands without any pose restrictions. All the features are extracted from normalized left and right hand images. Fusion is applied at feature and also at decision level. Two probability-based algorithms are proposed for classification. The first algorithm computes the maximum probability for nearest three neighbors. The second algorithm determines the maximum probability of the number of matched features with respect to a thresholding on distances. Based on these two highest probabilities initial decisions are made. The final decision is considered according to the highest probability as calculated by the Dempster–Shafer theory of evidence. Depending on the various combinations of the initial decisions, three schemes are experimented with 201 subjects for identification and verification. The correct identification rate is found to be 99.5%, and the false acceptance rate (FAR) of 0.625% has been found during verification.
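Illustrative sketch (not taken from the paper): Dempster's rule of combination fuses two bodies of evidence by multiplying the masses of intersecting hypotheses and renormalising by the non-conflicting mass. In the example below, the two mass functions stand in for the initial decisions of two classifiers; the subject names, the mass values, and the function name are hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of hypotheses -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b     # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    return {h: v / (1.0 - conflict) for h, v in combined.items()}


# Hypothetical evidence from two classifiers about two enrolled subjects.
A, B = frozenset({"alice"}), frozenset({"bob"})
EITHER = A | B
m_first = {A: 0.6, EITHER: 0.4}
m_second = {B: 0.5, EITHER: 0.5}
fused = dempster_combine(m_first, m_second)
decision = max(fused, key=fused.get)    # hypothesis with the highest fused mass
```

Here the conflicting mass is 0.3, so the fused masses are 3/7 for the first subject and 2/7 each for the second subject and for the undecided set, and the final decision favours the first subject.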
A Graph Convolutional Network for Visual Categorization
(Springer, 2024-10) Bera, Asish; Hazra, Arnab
Convolutional Neural Networks (CNNs) have attained enhanced performance over conventional feature descriptors for image classification. Recently, Graph Convolutional Networks (GCNs) have also achieved improved performances for visual classification in various domains. A typical GCN is pertinent for propagating deep features using graph-based message passing methods. There are several domains such as the disease diagnosis of humans and plants where GCN could be explored for further performance enhancement. Thus, ample research attention is essential for solving different kinds of visual classification problems. In this direction, this work integrates the benefits of CNN and GCN for improving the feature representation by building a spatial relation using a GCN. In this work, a simple deep learning model is proposed that extracts the high-level deep features using a backbone CNN. Then, a GCN is applied for enhancing feature representation capabilities further for image classification. The proposed method has achieved improved performances on seven benchmark public datasets representing dance postures, hand shapes, agriculture, medical imaging, and aerial scene classification. The proposed method is developed using four different CNN backbones. Particularly, the proposed method based on ResNet-50 backbone has attained 89.98% accuracy on Dance-12, 90.34% accuracy on REST hand shape, 94.06% accuracy on Kvasir, 75.89% accuracy on ISIC skin cancer, 91.73% accuracy on AID aerial scene classification, and 95.24% accuracy on PlantPathology datasets.
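Illustrative sketch (not taken from the paper): a single graph-convolution layer propagates node features along the edges of a graph and then applies a learned linear map. The version below assumes the widely used symmetric normalisation with self-loops, ReLU(D^-1/2 (A + I) D^-1/2 X W); the abstract does not say which propagation rule the authors use, and the helper names are chosen here for illustration.

```python
import math


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adjacency)
    # Add self-loops so that each node keeps part of its own feature vector.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]
```

In a CNN plus GCN pipeline of the kind described above, the rows of `features` would be region descriptors taken from the backbone's feature map and `adjacency` would encode which regions are treated as neighbours; both choices are left open by the abstract.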
Hand Biometric Verification with Hand Image-Based CAPTCHA
(Springer, 2018-05) Bera, Asish
An approach for hand biometric recognition with the hand image-based CAPTCHA verification is presented in this paper. A new method for CAPTCHA generation is implemented based on the genuine and fake hand images which are embedded in a complex textured color background image. The HandCaptcha is a useful application to differentiate between the human and automated scripts. The first level of security is achieved by the HandCaptcha against the malicious threats and attacks. After solving the HandCaptcha correctly, the identity of a person is authenticated based on the contact-less hand geometric verification approach in the second level. A set of 300 unique HandCaptcha is created randomly and solved by at least 100 persons with the accuracy of 98.34%. Next, the left-hand images of the legitimate users are normalized, and sixteen geometric features are computed from every normalized hand. Experiments are conducted on the 200 subjects of the Bosphorus left-hand database. Classification accuracy of 99.5% has been achieved using the kNN classifier, and the equal error rate is 3.93%.
Hand Biometrics in Digital Forensics
(Springer, 2014) Bera, Asish
Digital forensics is now an unavoidable part of securing the digital world from identity theft. Dealing with higher-order crimes and a massive database is a very challenging problem for any intelligent system. Biometrics is a better solution for overcoming the problems encountered by digital forensics. Many biometric characteristics have played significant roles in forensics over the decades. The potential benefits and scope of hand-based modes in forensics have been investigated with an illustration of a hand geometry verification method. It can be applied when effective biometric evidence is not properly available; gloves are damaged, and dirt or any kind of liquid can reduce the accessibility and reliability of the fingerprint or palmprint. Because hand features are not strictly unique across a very large database, it may be relevant for verification only. Some unimodal and multimodal hand-based biometrics (e.g. hand geometry, palmprint and hand vein) with several feature extraction, database and verification methods have been discussed with 2D, 3D and infrared images.
Human Gender Classification Based on Hand Images Using Deep Learning
(Springer, 2023-01) Bera, Asish
Soft biometric traits (e.g., gender, age, etc.) can characterize very relevant personal information. The hand-based traits are studied for traditional/hard biometric recognition for diverse applications. However, little attention has been paid to soft biometrics using hand images. In this paper, human gender classification is addressed using the frontal and dorsal hand images of a human. A new hand dataset, denoted as JU-HD, is created at the Jadavpur University, India for experiments. It represents significant posture variations in an uncontrolled laboratory environment. Sample hand images of 57 persons are collected to incorporate more user-flexibility in posing the hands that incur additional challenges to discriminate the person’s gender. Five backbone CNNs are used to develop a deep model for gender classification. The method achieves 90.49% accuracy on JU-HD using Inception-v3.
Human Identification Using Selected Features From Finger Geometric Profiles
(IEEE, 2020-03) Bera, Asish
A finger biometric system for an unconstrained environment is presented in this paper. A technique for hand image normalization is implemented at the preprocessing stage that decomposes the main hand contour into finger-level shape representation. This normalization technique follows subtraction of transformed binary image from binary hand contour image to generate the left-side of finger profiles (LSFPs). Then, XOR is applied to LSFP image and hand contour image to produce the right side of finger profiles. During feature extraction, initially, 30 geometric features are computed from every normalized finger. The rank-based forward-backward greedy algorithm is followed to select relevant features and to enhance classification accuracy. Two different subsets of features containing 9 and 12 discriminative features per finger are selected for two separate experiments that use the k-nearest neighbor and the random forest (RF) for classification on the Bosphorus hand database. The experiments with the selected features of four fingers except the thumb have obtained improved performances compared to features extracted from five fingers and also other existing methods evaluated on the Bosphorus database. The best identification accuracies of 96.56% and 95.92% using the RF classifier have been achieved for the right- and left-hand images of 638 subjects, respectively. An equal error rate of 0.078 is obtained for both types of the hand images.
Person recognition using alternative hand geometry
(Inderscience, 2014-08) Bera, Asish
In this paper, a new approach for user recognition is presented, which is based on the geometric features from either left or right hand images. The hand images are collected in an unconstrained-pose environment. Image normalisation is applied at the preprocessing stage. Features are extracted from the normalised images, which mainly comprise lengths and widths at different positions of the fingers. A simple classification algorithm has been implemented that is primarily dependent on the ratio of modified minimum distance and number of features, which are matched within a distance threshold. Experimental results of identification and verification are quite acceptable, producing 98.8% identification and 99.6% verification (at 0.55% FAR) of 253 standard subjects which are a blend of both left and right hand images.
PND-Net: plant nutrition deficiency and disease classification using graph convolutional network
(Springer Nature, 2024-07) Bera, Asish
Crop yield production could be enhanced for agricultural growth if various plant nutrition deficiencies and diseases are identified and detected at early stages. Hence, continuous health monitoring of plants is crucial for handling plant stress. Deep learning methods have proven their superior performance in the automated detection of plant diseases and nutrition deficiencies from visual symptoms in leaves. This article proposes a new deep learning method for plant nutrition deficiencies and disease classification using a graph convolutional network (GCN), added upon a base convolutional neural network (CNN). Sometimes, a global feature descriptor might fail to capture the vital region of a diseased leaf, which causes inaccurate classification of disease. To address this issue, regional feature learning is crucial for a holistic feature aggregation. In this work, region-based feature summarization at multi-scales is explored using spatial pyramidal pooling for discriminative feature representation. Furthermore, a GCN is developed to capacitate learning of finer details for classifying plant diseases and insufficiency of nutrients. The proposed method, called Plant Nutrition Deficiency and Disease Network (PND-Net), has been evaluated on two public datasets for nutrition deficiency, and two for disease classification using four backbone CNNs. The best classification performances of the proposed PND-Net are as follows: (a) 90.00% Banana and 90.54% Coffee nutrition deficiency; and (b) 96.18% Potato diseases and 84.30% on PlantDoc datasets using Xception backbone. Furthermore, additional experiments have been carried out for generalization, and the proposed method has achieved state-of-the-art performances on two public datasets, namely the Breast Cancer Histopathology Image Classification (BreakHis 40: 95.50%, and BreakHis 100: 96.79% accuracy) and Single cells in Pap smear images for cervical cancer classification (SIPaKMeD: 99.18% accuracy).
Also, the proposed method has been evaluated using five-fold cross validation and achieved improved performances on these datasets. Clearly, the proposed PND-Net effectively boosts the performances of automated health analysis of various plants in real and intricate field environments, implying PND-Net’s aptness for agricultural growth as well as human cancer classification.
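Illustrative sketch (not taken from the paper): spatial pyramid pooling summarises a feature map over progressively finer grids and concatenates the results, giving a fixed-length region descriptor regardless of input size. The version below max-pools a single-channel 2-D map over n x n grids; the pyramid levels, the use of max rather than average pooling, and the function name are assumptions for this example.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2)):
    """Max-pool a 2-D feature map over n x n grids and concatenate the results.

    Each level n must not exceed the map's height or width.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor/ceil bin edges so every pixel falls in at least one bin.
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1) for c in range(c0, c1)))
    return pooled
```

With levels (1, 2) the output always has 1 + 4 = 5 values per channel; each value could then serve as part of a node feature for a graph built over the regions, although the abstract does not detail how PND-Net constructs its graph.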
POA-Net: dance poses and activity classification using convolutional neural networks
(IEEE, 2024) Bera, Asish
Dance poses represent a complex human body-part movement, and express emotions and gestures. Dance pose classification is a challenging problem in computer vision. Convolutional Neural Networks (CNNs) have witnessed significant performance improvements in recognizing dance poses from images and videos. Most of the dance datasets in existing works are video-based and are not available publicly. This work contributes an image dataset representing 8 new dance styles blended with the Indian and international dance themes, called Dance-8. These unique 8 dance styles are combined with the Dance-12 public dataset for improving the posture diversity and dataset size. This extended dataset is called Dance-20. A custom CNN is developed for dance POses and Activity classification, named POA-Net. All three dance datasets have been evaluated using standard base CNNs and POA-Net. The POA-Net has attained an accuracy of 73.27% on Dance-8, 82.10% on Dance-12, and 73.10% on Dance-20. These performances are better than those of standard backbones, such as VGG16 and Inception-V3. The best accuracies of 81.57%, 85.08% and 76.73% have been achieved by MobileNet-v2 on the Dance-8, Dance-12, and Dance-20 datasets, respectively. Moreover, POA-Net has achieved the state-of-the-art accuracy of 99.74% on the DIAT, which is a radar-based human action image dataset.