Please use this identifier to cite or link to this item:
http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16373
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Sharma, Yashvardhan | - |
dc.date.accessioned | 2024-11-14T09:22:46Z | - |
dc.date.available | 2024-11-14T09:22:46Z | - |
dc.date.issued | 2021 | - |
dc.identifier.uri | https://www.taylorfrancis.com/chapters/edit/10.1201/9781003102380-9/visual-question-answering-system-using-integrated-models-image-captioning-bert-lavika-goel-mohit-dhawan-rachit-rathore-satyansh-rai-aaryan-kapoor-yashvardhan-sharma | - |
dc.identifier.uri | http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16373 | - |
dc.description.abstract | Visual question answering (VQA) is the task of taking an image and a natural-language question about it as input and generating an answer to that question as output. It is a multidisciplinary problem, combining challenges from computer vision and natural language processing. This chapter combines a question-answering network (BERT) with four image-captioning models (BUTD; show-and-tell; CaptionBot; and show, attend, and tell) for the VQA task, and compares the four resulting VQA models. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Taylor & Francis | en_US |
dc.subject | Computer Science | en_US |
dc.subject | Visual question answering (VQA) | en_US |
dc.subject | BERT | en_US |
dc.subject | BUTD | en_US |
dc.title | Visual Question-Answering System Using Integrated Models of Image Captioning and BERT | en_US |
dc.type | Article | en_US |
Appears in Collections: | Department of Computer Science and Information Systems |
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
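The abstract describes a caption-then-answer pipeline: an image-captioning model turns the image into a sentence, and a question-answering model (BERT in the chapter) extracts an answer to the question from that sentence. The sketch below illustrates only this data flow; both stages are hypothetical stand-ins written for this record, not the authors' models, with the captioner returning a fixed sentence and the answerer using a naive keyword heuristic in place of BERT.

```python
# Illustrative sketch of the caption-then-answer VQA pipeline from the
# abstract. Both stages are trivial stand-ins (NOT the chapter's trained
# BUTD / show-and-tell / CaptionBot / show-attend-and-tell or BERT models).

def caption_image(image_path: str) -> str:
    """Stand-in for an image-captioning model: maps an image to a sentence."""
    # A real system would run a trained captioning network here.
    return "a brown dog is playing with a red ball on the grass"

def answer_question(context: str, question: str) -> str:
    """Stand-in for BERT extractive QA: finds a question keyword in the
    caption and returns it (or, for colour questions, the word before it)."""
    stop = {"what", "which", "is", "are", "the", "a", "an",
            "of", "on", "color", "colour"}
    words = context.split()
    keywords = [w for w in question.lower().strip("?").split() if w not in stop]
    for noun in keywords:
        if noun in words:
            idx = words.index(noun)
            if "color" in question.lower() or "colour" in question.lower():
                # Return the adjective preceding the queried noun.
                return words[idx - 1] if idx > 0 else noun
            return noun
    return "unknown"

def vqa(image_path: str, question: str) -> str:
    """Full pipeline: image -> caption -> answer extracted from the caption."""
    caption = caption_image(image_path)
    return answer_question(caption, question)

print(vqa("dog.jpg", "What color is the ball?"))  # prints "red"
```

The chapter's actual systems replace each stand-in with a trained network; the comparison among the four VQA models comes from swapping the captioning stage while keeping the BERT answering stage fixed.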