dc.description.abstract |
Bleurt is a recently introduced metric that employs Bert, a potent pre-trained language model, to assess how well candidate translations compare to a reference translation in the context of machine translation output. While traditional metrics such as Bleu rely on lexical similarity, Bleurt leverages Bert's semantic and syntactic capabilities to provide a more robust evaluation through complex text representations. However, studies have shown that Bert, despite its impressive performance on natural language processing tasks, can deviate from human judgment in specific syntactic and semantic scenarios. Through systematic experimental analysis at the word level, including the categorization of errors such as lexical mismatches, untranslated terms, and structural inconsistencies, we investigate how Bleurt handles various translation challenges. Our study addresses three central questions: What are the strengths and weaknesses of Bleurt, how do they align with Bert's known limitations, and how does Bleurt compare with BERTScore, a similar neural metric for automatic machine translation evaluation? Using manually annotated datasets that emphasize different error types and linguistic phenomena, we find that Bleurt excels at identifying nuanced differences between sentences with high lexical overlap, an area where BERTScore shows limitations. Our systematic experiments provide insights into the effective application of both metrics in machine translation evaluation. |
en_US |