Please use this identifier to cite or link to this item:
http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16373
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Sharma, Yashvardhan | - |
dc.date.accessioned | 2024-11-14T09:22:46Z | - |
dc.date.available | 2024-11-14T09:22:46Z | - |
dc.date.issued | 2021 | - |
dc.identifier.uri | https://www.taylorfrancis.com/chapters/edit/10.1201/9781003102380-9/visual-question-answering-system-using-integrated-models-image-captioning-bert-lavika-goel-mohit-dhawan-rachit-rathore-satyansh-rai-aaryan-kapoor-yashvardhan-sharma | - |
dc.identifier.uri | http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16373 | - |
dc.description.abstract | Visual question answering (VQA) is the task of taking an image and a natural-language question about it as input and generating an answer to that question as output. It is a multidisciplinary problem, combining challenges from computer vision and natural language processing. This chapter combines a question-answering network (BERT) with four image-captioning models (BUTD; show-and-tell; CaptionBot; and show, attend, and tell) for the VQA task, and compares the four resulting VQA models. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Taylor & Francis | en_US |
dc.subject | Computer Science | en_US |
dc.subject | Visual question answering (VQA) | en_US |
dc.subject | BERT | en_US |
dc.subject | BUTD | en_US |
dc.title | Visual Question-Answering System Using Integrated Models of Image Captioning and BERT | en_US |
dc.type | Article | en_US |
Appears in Collections: | Department of Computer Science and Information Systems |
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
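The abstract describes a caption-then-answer pipeline: an image-captioning model turns the image into a sentence, and a question-answering model (BERT in the chapter) extracts an answer to the question from that sentence. The sketch below illustrates only this data flow; both stages are hypothetical stand-ins written for this record, not the authors' models, with the captioner returning a fixed sentence and the answerer using a naive keyword heuristic in place of BERT.

```python
# Illustrative sketch of the caption-then-answer VQA pipeline from the
# abstract. Both stages are trivial stand-ins (NOT the chapter's trained
# BUTD / show-and-tell / CaptionBot / show-attend-and-tell or BERT models).

def caption_image(image_path: str) -> str:
    """Stand-in for an image-captioning model: maps an image to a sentence."""
    # A real system would run a trained captioning network here.
    return "a brown dog is playing with a red ball on the grass"

def answer_question(context: str, question: str) -> str:
    """Stand-in for BERT extractive QA: finds a question keyword in the
    caption and returns it (or, for colour questions, the word before it)."""
    stop = {"what", "which", "is", "are", "the", "a", "an",
            "of", "on", "color", "colour"}
    words = context.split()
    keywords = [w for w in question.lower().strip("?").split() if w not in stop]
    for noun in keywords:
        if noun in words:
            idx = words.index(noun)
            if "color" in question.lower() or "colour" in question.lower():
                # Return the adjective preceding the queried noun.
                return words[idx - 1] if idx > 0 else noun
            return noun
    return "unknown"

def vqa(image_path: str, question: str) -> str:
    """Full pipeline: image -> caption -> answer extracted from the caption."""
    caption = caption_image(image_path)
    return answer_question(caption, question)

print(vqa("dog.jpg", "What color is the ball?"))  # prints "red"
```

The chapter's actual systems replace each stand-in with a trained network; the comparison among the four VQA models comes from swapping the captioning stage while keeping the BERT answering stage fixed.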