Visual Question-Answering System Using Integrated Models of Image Captioning and BERT

Date

2021

Publisher

Taylor & Francis

Abstract

Visual question answering (VQA) is the task of taking an image and a natural-language question about it as input and producing an answer to that question as output. It is a multidisciplinary problem that combines challenges from computer vision and natural language processing. This chapter builds VQA systems by combining the network architecture of a question-answering model (BERT) with those of image-captioning models (BUTD, the show-and-tell model, CaptionBot, and the show, attend, and tell model). The chapter also compares the four resulting VQA models.
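
The abstract does not say how the captioning and question-answering models are connected. One plausible reading, offered here only as a minimal sketch, is a two-stage pipeline: a captioning model turns the image into text, and a reader model answers the question using that caption as its context. Every name below (`CaptionThenReadVQA`, `toy_captioner`, `toy_reader`) is hypothetical and not taken from the chapter. The two toy functions are stand-ins so that the example runs without any trained model; in a real system they would be replaced by one of the captioning models listed above and by a BERT-based reader.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed interfaces (not from the chapter):
#   a captioner maps raw image bytes to a caption string;
#   a reader maps (question, context) to an answer string.
Captioner = Callable[[bytes], str]
Reader = Callable[[str, str], str]


@dataclass
class CaptionThenReadVQA:
    """Hypothetical two-stage VQA pipeline: caption the image, then read the caption."""

    captioner: Captioner
    reader: Reader

    def answer(self, image: bytes, question: str) -> str:
        # Stage 1: describe the image in text.
        caption = self.captioner(image)
        # Stage 2: answer the question using the caption as the only context.
        return self.reader(question, caption)


def toy_captioner(image: bytes) -> str:
    # Stand-in for a trained captioning model; ignores the image entirely.
    return "a brown dog catches a red frisbee in a park"


def toy_reader(question: str, context: str) -> str:
    # Stand-in for a BERT-style reader. It only handles questions that end
    # with a noun found in the context, and returns the word before that noun.
    q_tokens = question.lower().rstrip("?").split()
    c_tokens = context.lower().split()
    target = q_tokens[-1]
    if target in c_tokens and c_tokens.index(target) > 0:
        return c_tokens[c_tokens.index(target) - 1]
    return ""  # no answer found in the caption


if __name__ == "__main__":
    vqa = CaptionThenReadVQA(captioner=toy_captioner, reader=toy_reader)
    print(vqa.answer(b"", "What color is the frisbee?"))
```

Under this reading, the reader never sees the image, so the answer can only be as good as the caption. That would make the choice of captioning model the main variable when comparing such systems.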

Keywords

Computer Science, Visual question and answering (VQA), BERT, BUTD
