<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel rdf:about="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/1928">
<title>Department of Computer Science and Information Systems</title>
<link>http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/1928</link>
<description/>
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19232"/>
<rdf:li rdf:resource="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19231"/>
<rdf:li rdf:resource="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19230"/>
<rdf:li rdf:resource="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19229"/>
</rdf:Seq>
</items>
<dc:date>2026-04-21T23:31:53Z</dc:date>
</channel>
<item rdf:about="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19232">
<title>A multi-modal attentive framework that can interpret text (MMAT)</title>
<link>http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19232</link>
<description>A multi-modal attentive framework that can interpret text (MMAT)
Sharma, Yashvardhan
Deep learning algorithms have demonstrated exceptional performance on various computer vision and natural language processing tasks. However, for machines to learn information signals, they must understand and have enough reasoning power to respond to general questions based on the linguistic features present in images. Questions such as “What temperature is my oven set to?” require models to visually understand objects in the images and then spatially identify the text associated with them. Existing Visual Question Answering models fail to recognize linguistic features present in images, which is crucial for assisting the visually impaired. This paper addresses the task of a visual question answering system that can reason over text, optical character recognition (OCR), and visual modalities. The proposed Visual Question Answering model focuses on the image’s most relevant part by using an attention mechanism and passing all the features to the fusion encoder after computing pairwise attention, where the model is inclined toward the OCR-linguistic features. The proposed model uses a dynamic pointer network instead of classification for iterative answer prediction, with a focal loss function to overcome the class imbalance problem. The proposed model obtains an accuracy of 46.8% on the TextVQA dataset and an average of 55.21% on the STVQA dataset. The results indicate the effectiveness of the proposed approach and suggest a Multi-Modal Attentive Framework that can learn individual text, object, and OCR features and then predict answers based on the text in the image.
</description>
<dc:date>2025-07-01T00:00:00Z</dc:date>
</item>
<item rdf:about="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19231">
<title>Control-data plane intelligence trade-off in SDN</title>
<link>http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19231</link>
<description>Control-data plane intelligence trade-off in SDN
Sinha, Yash
With the decoupling of the network control and data planes, the emerging Software Defined Networking (SDN) paradigm advocates better network control and manageability. It introduces logically centralized control, network programmability, and abstraction of the underlying infrastructure from network services and applications. With global visibility of the network state and central control that eases real-time monitoring, policy alterations, and the like, it inherently enhances network security. However, the separation of planes opens up new challenges such as denial-of-service (DoS) attacks, saturation attacks, and man-in-the-middle attacks.
Many of the issues of controller availability, controller-switch communication delay, and scalability can be solved separately by distributed controllers, out-of-band communication links, and parallelization, respectively. A control-data plane intelligence trade-off has the potential to solve all of these: it increases controller availability, reduces latency for traffic engineering and decision making, and improves controller scalability. Moreover, the control-data plane intelligence trade-off makes control-data plane communication more secure and substantially offloads processing from the controller. We present how to realize the control-data plane intelligence trade-off by extending OpenFlow.
</description>
<dc:date>2025-01-01T00:00:00Z</dc:date>
</item>
<item rdf:about="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19230">
<title>Comparative study of preprocessing and classification methods in character recognition of natural scene images</title>
<link>http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19230</link>
<description>Comparative study of preprocessing and classification methods in character recognition of natural scene images
Sinha, Yash
This paper presents an approach to character recognition in natural scene images. Recognizing such text is a challenging problem in the field of Computer Vision, more so than recognizing scanned documents, for several reasons. We propose a classification technique for classifying characters based on a pipeline of image processing operations and ensemble machine learning techniques. This pipeline tackles problems where Optical Character Recognition (OCR) fails. We present a framework comprising a sequence of operations, such as resizing, greyscaling, thresholding, morphological opening, and median filtering, applied to the images to handle background clutter, noise, multi-sized and multi-oriented characters, and variance in illumination. We used image pixels and HOG (Histogram of Oriented Gradients) as features to train three different models based on Nearest-Neighbour, Random Forest, and Extra Tree classifiers. When the input images were pre-processed and HOG features were extracted and fed into the Extra Tree classifier, the model classified the characters with the highest accuracy among the models we tested. The proposed steps have been experimentally shown to yield better accuracy than present state-of-the-art classification techniques on the Chars74k dataset. In addition, the paper includes a comparative study elaborating on various image processing operations, feature extraction methods, and classification techniques.
</description>
<dc:date>2025-01-01T00:00:00Z</dc:date>
</item>
<item rdf:about="http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19229">
<title>Studying the role of Kinect as a multi-sensory learning platform for children</title>
<link>http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/19229</link>
<description>Studying the role of Kinect as a multi-sensory learning platform for children
Sinha, Yash
According to the theory of Embodied Cognition, our behavior is a result of real-time interaction with our surroundings, our cognitive skills, and the nervous system. From this perspective, researchers are considering learning environments that promote physical activity to achieve cognitive tasks. Such Natural User Interfaces (NUI) make use of gesture-based sensors like the Microsoft Kinect, yet we lack in-depth studies of how they improve the learning process. In this paper, we present observations from two deployment studies that focus on the different roles NUI can play as part of learning activities. We deploy two Kinect-based applications, Yoga Soft (a digital yoga instructor) and Mudra (a Kinect-based learning system), in real-life scenarios. The first study is conducted at the residences of preadolescent children in Gurgaon, India. The second study is conducted at an education center specializing in the care of kindergarten children in Pilani, India.
</description>
<dc:date>2018-01-01T00:00:00Z</dc:date>
</item>
</rdf:RDF>
