<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel rdf:about="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/1928">
<title>Department of Computer Science and Information Systems</title>
<link>http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/1928</link>
<description/>
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19232"/>
<rdf:li rdf:resource="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19231"/>
<rdf:li rdf:resource="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19230"/>
<rdf:li rdf:resource="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19229"/>
</rdf:Seq>
</items>
<dc:date>2026-04-21T23:31:53Z</dc:date>
</channel>
<item rdf:about="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19232">
<title>A multi-modal attentive framework that can interpret text (MMAT)</title>
<link>http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19232</link>
<description>A multi-modal attentive framework that can interpret text (MMAT)
Sharma, Yashvardhan
Deep learning algorithms have demonstrated exceptional performance on various computer vision and natural language processing tasks. However, for machines to learn information signals, they must understand and have enough reasoning power to respond to general questions based on the linguistic features present in images. Questions such as “What temperature is my oven set to?” require models to visually understand objects in the images and then spatially identify the text associated with them. Existing Visual Question Answering models fail to recognize linguistic features present in images, which is crucial for assisting the visually impaired. This paper addresses the task of a visual question answering system that can reason over text, optical character recognition (OCR), and visual modalities. The proposed Visual Question Answering model focuses on the image’s most relevant part by using an attention mechanism and passing all the features to the fusion encoder after computing pairwise attention, where the model is inclined toward the OCR-linguistic features. The proposed model uses a dynamic pointer network instead of classification for iterative answer prediction, with a focal loss function to overcome the class imbalance problem. The proposed model obtains an accuracy of 46.8% on the TextVQA dataset and an average of 55.21% on the STVQA dataset. The results indicate the effectiveness of the proposed approach and suggest a Multi-Modal Attentive Framework that can learn individual text, object, and OCR features and then predict answers based on the text in the image.
</description>
<dc:date>2025-07-01T00:00:00Z</dc:date>
</item>
<item rdf:about="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19231">
<title>Control-data plane intelligence trade-off in SDN</title>
<link>http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19231</link>
<description>Control-data plane intelligence trade-off in SDN
Sinha, Yash
With the decoupling of the network control and data planes, the emerging Software Defined Networking (SDN) paradigm advocates better network control and manageability. It introduces logically centralized control, network programmability, and abstraction of the underlying infrastructure from network services and applications. With global visibility of the network state and central control that eases real-time monitoring, policy alterations, and the like, it inherently enhances network security. However, the separation of planes opens up new challenges such as denial-of-service (DoS) attacks, saturation attacks, and man-in-the-middle attacks.
Many of the issues of controller availability, controller-switch communication delay, and scalability can be solved separately by distributed controllers, out-of-band communication links, and parallelization, respectively. A control-data plane intelligence trade-off has the potential to solve all of these: it increases controller availability, reduces latency for traffic engineering and decision making, and improves controller scalability. Moreover, the control-data plane intelligence trade-off makes control-data plane communication more secure and substantially offloads processing from the controller. We present how to realize the control-data plane intelligence trade-off by extending OpenFlow.
</description>
<dc:date>2025-01-01T00:00:00Z</dc:date>
</item>
<item rdf:about="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19230">
<title>Comparative study of preprocessing and classification methods in character recognition of natural scene images</title>
<link>http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19230</link>
<description>Comparative study of preprocessing and classification methods in character recognition of natural scene images
Sinha, Yash
This paper presents an approach to character recognition in natural scene images. Recognizing such text is a challenging problem in the field of Computer Vision, more so than recognizing scanned documents, for several reasons. We propose a classification technique for classifying characters based on a pipeline of image processing operations and ensemble machine learning techniques. This pipeline tackles problems where Optical Character Recognition (OCR) fails. We present a framework comprising a sequence of operations, such as resizing, greyscaling, thresholding, morphological opening, and median filtering, applied to the images to handle background clutter, noise, multi-sized and multi-oriented characters, and variance in illumination. We used image pixels and HOG (Histogram of Oriented Gradients) as features to train three different models based on Nearest-Neighbour, Random Forest, and Extra Tree classifiers. When the input images were pre-processed and HOG features were extracted and fed into the Extra Tree classifier, the model classified the characters with the highest accuracy among the models we tested. The proposed steps have been experimentally shown to yield better accuracy than present state-of-the-art classification techniques on the Chars74k dataset. In addition, the paper includes a comparative study elaborating on various image processing operations, feature extraction methods, and classification techniques.
</description>
<dc:date>2025-01-01T00:00:00Z</dc:date>
</item>
<item rdf:about="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19229">
<title>Studying the role of Kinect as a multi-sensory learning platform for children</title>
<link>http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19229</link>
<description>Studying the role of Kinect as a multi-sensory learning platform for children
Sinha, Yash
According to the theory of Embodied Cognition, our behavior is a result of real-time interaction with our surroundings, our cognitive skills, and the nervous system. From this perspective, researchers are considering learning environments that promote physical activity to achieve cognitive tasks. Such Natural User Interfaces (NUI) make use of gesture-based sensors like the Microsoft Kinect, yet we lack in-depth studies of how they improve the learning process. In this paper, we present observations from two deployment studies that focus on the different roles NUI can play as part of learning activities. We deploy two Kinect-based applications, Yoga Soft (a digital yoga instructor) and Mudra (a Kinect-based learning system), in real-life scenarios. The first study is conducted at the residences of preadolescent children in Gurgaon, India. The second study is conducted at an education center specializing in the care of kindergarten children in Pilani, India.
</description>
<dc:date>2018-01-01T00:00:00Z</dc:date>
</item>
</rdf:RDF>
