Abstract:
This paper describes our system for multimodal machine translation. We participated
in the multimodal translation tasks from English into three Indic languages: Hindi, Bengali,
and Malayalam. We leverage the richness of multimodal data to reduce ambiguity
in translation. We fine-tuned the ‘No Language Left Behind’ (NLLB) machine
translation model for multimodal translation and further improved its accuracy through image
data augmentation using latent diffusion. Our submission achieves the best BLEU scores for
the English-Hindi, English-Bengali, and English-Malayalam language pairs on both the Evaluation
and Challenge test sets.