Visual Question-Answering System Using Integrated Models of Image Captioning and BERT

Sharma, Yashvardhan

Date

2021

Authors

Sharma, Yashvardhan

Publisher

Taylor & Francis

Abstract

Visual question and answering (VQA) is a task that takes an image and a natural-language question about that image as input and produces an answer to the question as output. It is a multidisciplinary problem that combines challenges from computer vision and natural language processing. This chapter combines a question-answering architecture (BERT) with four image-captioning models (BUTD; Show and Tell; CaptionBot; and Show, Attend and Tell) to build VQA systems, and compares the four resulting VQA models.
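
One plausible reading of the combination described above is a two-stage pipeline: a captioning model turns the image into text, and a text question-answering model then reads that caption to answer the question. The abstract does not spell out the integration, so the following is only a minimal sketch under that assumption. The names `answer_visual_question`, `toy_captioner`, and `toy_reader` are hypothetical, and the two toy functions are stand-ins for a trained captioning model and a BERT reader, not the chapter's implementation.

```python
from typing import Callable

# Hypothetical interfaces, assumed for illustration only.
Captioner = Callable[[bytes], str]   # image bytes -> caption text
Reader = Callable[[str, str], str]   # (question, context) -> answer text


def answer_visual_question(image: bytes, question: str,
                           captioner: Captioner, reader: Reader) -> str:
    """Caption the image, then let a text QA model read the caption."""
    caption = captioner(image)
    return reader(question, caption)


def toy_captioner(image: bytes) -> str:
    # Stand-in for a captioning model: ignores the image and
    # always returns the same fixed caption.
    return "a brown dog catching a red frisbee in a park"


def toy_reader(question: str, context: str) -> str:
    # Stand-in for an extractive reader: returns up to three context words
    # that follow the last context word also found in the question.
    q_words = set(question.lower().rstrip("?").split())
    c_words = context.split()
    matches = [i for i, w in enumerate(c_words) if w.lower() in q_words]
    if not matches:
        return ""
    start = matches[-1] + 1
    return " ".join(c_words[start:start + 3])


if __name__ == "__main__":
    print(answer_visual_question(b"", "What is the dog catching?",
                                 toy_captioner, toy_reader))
```

Because the captioner and reader are passed in as plain callables, the same function could in principle be reused to compare different captioning models behind one fixed reader, which is the kind of comparison the abstract mentions.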

Keywords

Computer Science, Visual question and answering (VQA), BERT, BUTD

URI

https://www.taylorfrancis.com/chapters/edit/10.1201/9781003102380-9/visual-question-answering-system-using-integrated-models-image-captioning-bert-lavika-goel-mohit-dhawan-rachit-rathore-satyansh-rai-aaryan-kapoor-yashvardhan-sharma
https://dspace.bits-pilani.ac.in/handle/123456789/16373

Collections

Department of Computer Science and Information Systems
