Visual Question-Answering System Using Integrated Models of Image Captioning and BERT
Date
2021
Publisher
Taylor & Francis
Abstract
Visual question and answering (VQA) is the task of taking an image and a natural-language question about it as input and generating an answer to that question as output. It is a multidisciplinary problem that draws on challenges from both computer vision and natural language processing. This chapter combines a question-answering architecture (BERT) with image-captioning architectures (BUTD, the show-and-tell model, CaptionBot, and the show, attend, and tell model) for VQA tasks. The chapter also compares the four resulting VQA models.
Keywords
Computer Science, Visual question and answering (VQA), BERT, BUTD