Visual Question-Answering System Using Integrated Models of Image Captioning and BERT

dc.contributor.author: Sharma, Yashvardhan
dc.date.accessioned: 2024-11-14T09:22:46Z
dc.date.available: 2024-11-14T09:22:46Z
dc.date.issued: 2021
dc.description.abstract: Visual question and answering (VQA) is a task that takes an image and a natural-language question about it as input and generates an answer to that question as output. It is a multidisciplinary problem that combines challenges from computer vision and natural language processing. This chapter combines a question-answering network architecture (BERT) with image-captioning architectures (BUTD, the show-and-tell model, CaptionBot, and the show, attend, and tell model) for VQA tasks. The chapter also compares the four resulting VQA models. [en_US]
dc.identifier.uri: https://www.taylorfrancis.com/chapters/edit/10.1201/9781003102380-9/visual-question-answering-system-using-integrated-models-image-captioning-bert-lavika-goel-mohit-dhawan-rachit-rathore-satyansh-rai-aaryan-kapoor-yashvardhan-sharma
dc.identifier.uri: https://dspace.bits-pilani.ac.in/handle/123456789/16373
dc.language.iso: en [en_US]
dc.publisher: Taylor & Francis [en_US]
dc.subject: Computer Science [en_US]
dc.subject: Visual question and answering (VQA) [en_US]
dc.subject: BERT [en_US]
dc.subject: BUTD [en_US]
dc.title: Visual Question-Answering System Using Integrated Models of Image Captioning and BERT [en_US]
dc.type: Article [en_US]
