BITS Faculty Publications
Permanent URI for this community: http://localhost:4000/handle/123456789/1867
12 results
Search Results
Item: Advancements in Yoga Pose Estimation Using Artificial Intelligence: A Survey (Bentham Science, 2024)
Chamola, Vinay; Rout, Bijay Kumar
Human pose estimation has been a prevalent field of computer vision and sensing study. In recent years, it has made many advances that have helped humanity in the fields of sports, surveillance, healthcare, etc. Yoga is an ancient science intended to improve physical, mental and spiritual wellbeing. It involves many kinds of asanas or postures that a practitioner can perform. Thus, the benefits of pose estimation can also be used for Yoga to help users assume Yoga postures with better accuracy. The Yoga practitioner can detect their own current posture in real time, and the pose estimation method can provide them with corrective feedback if they make mistakes. Yoga pose estimation can also help with remote Yoga instruction by an expert teacher, which can be a boon during a pandemic. This paper reviews various Machine Learning and Artificial Intelligence-enabled techniques available for real-time pose estimation and the research pursued recently. We classify them based on the input they use for estimating the individual's pose. We also discuss multiple Yoga posture estimation systems in detail. We discuss the most commonly used keypoint estimation techniques in the existing literature. In addition, we discuss the real-time performance of the presented works. The paper further discusses the datasets and evaluation metrics available for pose estimation.

Item: Overtaking Mechanisms Based on Augmented Intelligence for Autonomous Driving: Data Sets, Methods, and Challenges (IEEE, 2024-04)
Chamola, Vinay
The field of autonomous driving research has made significant strides toward achieving full automation, endowing vehicles with self-awareness and independent decision making.
However, integrating automation into vehicular operations presents formidable challenges, especially as these vehicles must seamlessly navigate public roads alongside other cars and pedestrians. An intriguing yet relatively underexplored domain within autonomous driving is overtaking. Overtaking involves a dynamic interplay of complex tasks, including precise steering and speed control, rendering it one of the most intricate operations for implementing augmented intelligence driving technologies. Surprisingly, the overtaking of autonomous vehicles (AVs) remains largely uncharted territory in the context of augmented intelligence for autonomous systems. This void in knowledge beckons researchers to embark on explorations and investigations in this nascent field. Our review paper systematically synthesises overtaking methodologies hinging on computer vision techniques tailored for augmented intelligence autonomous driving scenarios in response to this pressing need. Our analysis encompasses an array of domains central to overtaking in augmented intelligence AVs, encompassing Object Detection, Lane/Line Detection, Depth Estimation, Obstacle Detection, Segmentation, and Pedestrian Detection. We meticulously analyze each domain using well-established multimodal data sets. We assess different models’ performance across various parameters by employing graphical structures, enabling visual comparative analyses. In object detection, YOLOv4 achieves a top performance with 0.90 mAP on the BDD100K data set. For lane detection, CLRNET excels with the highest F1 score of around 0.96 on the LLAMAS data set.

Item: Visual Question Answering Analysis: Datasets, Methods, and Image Featurization Techniques (ICPRAM, 2023)
Sharma, Yashvardhan
Holistic scene understanding is a long-standing objective of core tenets of Artificial Intelligence (AI).
Multimodal tasks that aim to synergize capabilities spanning multiple domains, such as visual-linguistic capabilities, into intelligent systems are thus a desideratum for the next step in AI. Visual Question Answering (VQA) systems that integrate Computer Vision and Natural Language Processing tasks into the task of answering natural language questions about an image represent one such domain. There is a need to explore Deep Learning techniques that can help to improve such systems beyond the language biases of real-world priors that presently hinder them from serving as a veritable touchstone for holistic scene understanding. Furthermore, the effectiveness of Transformer architecture for the image featurization pipeline of VQA systems remains untested. Hence, an exhaustive study on the performance of various model architectures with varied training conditions on VQA datasets like VizWiz and VQA v2 is imperative to further this area of research. This study explores architectures that utilize image and question co-attention for the task of VQA and several CNN architectures, including ResNet, VGG, EfficientNet, and DenseNet. Vision Transformer architecture is also explored for image featurization, and a myriad of loss functions such as cross-entropy, focal loss, and UniLoss are employed for training the models. Finally, the trained model is deployed using Flask, and a GUI for the same has been implemented that lets users input an image and accompanying questions about the image to generate an answer in response.

Item: Comparative Study of Convolutional Neural Network Object Detection Algorithms for Image Processing (IEEE, 2023)
Singh, Navin
This paper presents a comparative study on three Convolutional Neural Network (CNN) object detection algorithms to find the best detector based on the combination of speed and accuracy on a personal computer.
The MATLAB® development environment is used to evaluate three different object detector algorithms, namely Faster Region-Based Convolutional Network (R-CNN), Single Shot Detector (SSD) and You Only Look Once (YOLO). These algorithms are trained, and their performance metrics are tested on a small sample dataset. The results show that the SSD object detector algorithm performs best when considering both performance and processing speeds. Faster R-CNN detected objects at an average speed of 4.838 seconds and achieved a mean average precision of 0.76 with an average loss of 0.429. SSD detected objects at an average speed of 0.377 seconds and achieved a mean average precision of 0.92 with an average loss of 1.754. YOLO v3 detected objects at an average speed of 1.004 seconds and achieved a mean average precision of 0.81 with an average loss of 2.739.

Item: Autonomous Classification and Spatial Location of Objects from Stereoscopic Image Sequences for the Visually Impaired (i, 2022)
Singh, Navin
One of the main problems faced by visually impaired individuals is the inability or difficulty to identify objects. A visually impaired person usually wears glasses that help to enlarge or focus on nearby objects, and therefore heavily relies on physical touch to identify an object. There are challenges when walking on the road or navigating to a specific location since vision is lost or reduced, thereby increasing the risk of an accident. This paper proposes a simple portable machine vision system for assisting the visually impaired by providing auditory feedback of nearby objects in real time. The proposed system consists of three main hardware components: a single-board computer, a wireless camera, and an earpiece module. The YOLACT object detection library was used to detect objects from the captured image. The detected objects are converted to an audio signal using the Festival Speech Synthesis System.
Experimental results show that the system is efficient and capable of providing audio feedback of detected objects to the visually impaired person in real time.

Item: Hyper-parameter Optimization on Viola Jones Algorithm for Gesture Recognition (Springer, 2020-07)
Rout, Bijay Kumar
The problem of feature, object, gesture, and face detection has been tackled using numerous vision-based algorithms available in the literature. Each of these algorithms requires a set of hyper-parameters, which need to be set by trial and error so that the results provide the best performance for a given situation. Mostly, researchers use a trial-and-error approach to reach a satisfactory result and solve the above problems. In this work, an approach has been suggested to determine an optimum set of hyper-parameters, which will provide a starting point for anyone using the Viola Jones algorithm for hand gesture recognition or similar endeavors. This will reduce the time spent in searching for the best combination of hyper-parameters.

Item: NDENet: End-to-End Nighttime Dehazing and Enhancement (World Academy of Science, Engineering and Technology, 2007-01)
Rout, Bijay Kumar
In this paper, we present a computer vision task called nighttime dehaze-enhancement. This task aims to jointly perform dehazing and lightness enhancement. Our task fundamentally differs from nighttime dehazing: our goal is to jointly dehaze and enhance scenes, while nighttime dehazing aims to dehaze scenes under a nighttime setting. In order to facilitate further research on this task, we release a benchmark dataset called Reside-β Night dataset, consisting of 4122 nighttime hazed images from 2061 scenes and 2061 ground truth images. Moreover, we also propose a network called NDENet (Nighttime Dehaze-Enhancement Network), which jointly performs dehazing and low-light enhancement in an end-to-end manner.
We evaluate our method on the proposed benchmark and achieve a Structural Similarity Index (SSIM) of 0.8962 and a Peak Signal to Noise Ratio (PSNR) of 26.25. We also compare our network with other baseline networks on our benchmark to demonstrate the effectiveness of our approach. We believe that nighttime dehaze-enhancement is an essential task, particularly for autonomous navigation applications, and hope that our work will open up new frontiers in research. The code for our network is made publicly available.

Item: A Computer Vision Assisted Yoga Trainer for a Naive Performer by Using Human Joint Detection (Springer, 2023-05)
Sangwan, Kuldip Singh
It is a well-known proverb that a healthy mind lives in a healthy body, and yoga is one such means for connecting the body to the mind. However, yoga should be performed under professional supervision and in a regulated manner, as it can be harmful to one's health if done incorrectly. Moreover, it is difficult for beginners to identify the incorrect portions of their yoga postures on their own. In this research article, we present a user-friendly Python Flask-based web application that assists its registered users in performing every pose accurately. We have used computer vision techniques as they can perform various operations on visual data frames in real time. Our method consists of two main components: a hand gesture component that records video using hand gestures and a pose estimation component that detects body joint coordinates. The system then compares the angles obtained from the instructor's pose and the user's pose for feedback generation and provides correction if the difference is larger than a certain threshold. With this inherent capability of pose feedback generation, the proposed system thus enables naive performers to evaluate their poses and correct them when they deviate from the correct pose sequence.
The method was evaluated in real time on people of varied age groups and gender for four different asanas, and it was shown to recognize incorrect portions of the performed asanas for all the test cases. The experimental findings in terms of feedback generated using the user videos gave a functional validation of the proposed procedure and its usability in modern-day human life.

Item: Applications of fractional calculus in computer vision: A survey (Elsevier, 2022-06)
Agarwal, Shivi; Mathur, Trilok
Fractional calculus is an abstract idea exploring interpretations of differentiation having non-integer order. For a very long time, it was considered a topic of mere theoretical interest. However, the introduction of several useful definitions of fractional derivatives has extended its domain to applications. Supported by computational power and algorithmic representations, fractional calculus has emerged as a multifarious domain. It has been found that fractional derivatives are capable of incorporating memory into the system and are thus suitable for improving the performance of locality-aware tasks such as image processing and computer vision in general. This article presents an extensive survey of fractional-order derivative-based techniques that are used in computer vision. It briefly introduces the basics and presents applications of fractional calculus in six different domains, viz. edge detection, optical flow, image segmentation, image de-noising, image recognition, and object detection. The fractional derivatives ensure noise resilience and can preserve both high and low-frequency components of an image. The relative similarity of neighboring pixels can be affected by error, noise, or non-homogeneous illumination in an image. In that case, fractional differentiation can model special similarities and help compensate for the issue suitably.
The fractional derivatives can be evaluated for discontinuous functions, which helps estimate discontinuous optical flow. The order of the differentiation also provides an additional degree of freedom in the optimization process. This study shows the successful implementations of fractional calculus in computer vision and contributes to bringing out challenges and future scope.

Item: Cherry Plucking Strategies for Coffee Harvester (IEEE, 2021)
Shenoy, Meetha V.
Coffee is one of the major agricultural products popular worldwide. Coffee harvesting is performed in two ways: (a) selective harvesting, in which only ripe coffee cherries are picked, leaving the unripe coffee cherries intact, and (b) strip harvesting, in which the cherries are stripped off without separating ripe and unripe ones. Although strip harvesting can be completed quickly, it results in a higher percentage of unripe cherries, which reduces quality and sale value, resulting in less profit for producers. A selective coffee cherry harvester should identify and distinguish ripe and unripe cherries, and hence a fully automated harvesting system should be vision-guided. Designing a vision-based harvesting system for coffee cherries is particularly difficult due to the size of the coffee cherries, the clustered arrangement of the coffee cherries, and the height of the coffee plant. Currently available harvesters are based on strip harvesting, and hence there is a need to develop harvesters for selective harvesting of coffee cherries. In this work, we present cherry plucking strategies for a selective coffee harvester robot. This analysis is one of the key pieces of work required towards the implementation of the vision-guided selective harvester. The proposed work is tested in simulation as well as on hardware consisting of an Interbotix ReactorX150 robot arm and an Intel Realsense 435i camera.