
A Comparative Analysis of Transformer-Based Models for Document Visual Question Answering


dc.contributor.author Sharma, Yashvardhan
dc.date.accessioned 2024-11-13T08:55:55Z
dc.date.available 2024-11-13T08:55:55Z
dc.date.issued 2023-06
dc.identifier.uri https://link.springer.com/chapter/10.1007/978-981-99-0609-3_16
dc.identifier.uri http://dspace.bits-pilani.ac.in:8080/jspui/handle/123456789/16357
dc.description.abstract Visual question answering (VQA) is one of the most exciting problems at the intersection of computer vision and natural language processing. It requires understanding of and reasoning about an image to answer a human query. Text Visual Question Answering (Text-VQA) and Document Visual Question Answering (DocVQA) are two subproblems of VQA, which require extracting text from natural scene images and document images, respectively. Since answering questions about documents requires an understanding of layout and writing patterns, models that perform well on the Text-VQA task perform poorly on the DocVQA task. As transformer-based models achieve state-of-the-art results across deep learning, we train and fine-tune various transformer-based models (such as BERT, ALBERT, RoBERTa, ELECTRA, and DistilBERT) and examine their validation accuracy. This paper provides a detailed analysis of these transformer models and compares their accuracies on the DocVQA task. en_US
dc.language.iso en en_US
dc.publisher Springer en_US
dc.subject Computer Science en_US
dc.subject Visual Question Answering (VQA) en_US
dc.subject Text Visual Question Answering (Text-VQA) en_US
dc.subject Document Visual Question Answering (DocVQA) en_US
dc.title A Comparative Analysis of Transformer-Based Models for Document Visual Question Answering en_US
dc.type Article en_US
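
The abstract describes fine-tuning text-only transformer encoders (BERT, ALBERT, RoBERTa, ELECTRA, DistilBERT) for the DocVQA task. The record does not include the paper's code; the following is a minimal sketch of one common way to set up such fine-tuning, assuming the Hugging Face transformers library, an extractive span-prediction formulation, and a hypothetical invoice example with OCR'd text as the reading-comprehension context. It is an illustration of the general technique, not the authors' actual pipeline.

    import torch
    from transformers import AutoTokenizer, AutoModelForQuestionAnswering

    # Any of the compared checkpoints could be swapped in here, e.g.
    # "albert-base-v2", "roberta-base", "google/electra-base-discriminator",
    # or "distilbert-base-uncased".
    model_name = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # Hypothetical training example: OCR text from a document image serves
    # as the context over which the answer span is predicted.
    context = "Invoice No. 4021 was issued on 12 March 2020 for a total of $560."
    question = "What is the invoice number?"
    answer_start_char, answer_end_char = 12, 16  # the span "4021" in context

    inputs = tokenizer(question, context, return_tensors="pt",
                       return_offsets_mapping=True, truncation=True)
    offsets = inputs.pop("offset_mapping")[0]  # the model does not accept this key
    seq_ids = inputs.sequence_ids(0)           # 0 = question tokens, 1 = context tokens

    # Map the character-level answer span onto token positions.
    start_tok = end_tok = None
    for i, (s, e) in enumerate(offsets.tolist()):
        if seq_ids[i] != 1:
            continue
        if s <= answer_start_char < e:
            start_tok = i
        if s < answer_end_char <= e:
            end_tok = i

    # One optimization step on the span-prediction (start/end) loss.
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
    outputs = model(**inputs,
                    start_positions=torch.tensor([start_tok]),
                    end_positions=torch.tensor([end_tok]))
    outputs.loss.backward()
    optimizer.step()
    print(f"loss: {outputs.loss.item():.4f}")

In practice this step would run over a full DocVQA training set, and validation accuracy would be measured by comparing predicted spans against reference answers; the single example above only shows the mechanics of the loss computation.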


