Multimodal Neural Machine Translation Using Synthetic Images Transformed by Latent Diffusion Model

Ryoya Yuasa, Akihiro Tamura, Tomoyuki Kajiwara, Takashi Ninomiya, Tsuneo Kato


Abstract
This study proposes a new multimodal neural machine translation (MNMT) model using synthetic images transformed by a latent diffusion model. MNMT translates a source language sentence based on its related image, but the image usually contains noisy information that is not relevant to the source language sentence. Our proposed method first generates a synthetic image corresponding to the content of the source language sentence using a latent diffusion model and then performs translation based on the synthetic image. Experiments on English-German translation tasks using the Multi30k dataset demonstrate the effectiveness of the proposed method.
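As a rough illustration of the pipeline described in the abstract (not the authors' released code), the sketch below generates a synthetic image from the English source sentence with a latent diffusion model and encodes it into visual features for an MNMT model. It assumes Stable Diffusion via the Hugging Face diffusers library and CLIP image features as stand-ins for whatever generator and visual encoder the paper actually uses.

```python
# Hedged sketch: synthetic-image generation for MNMT.
# Assumptions: Stable Diffusion stands in for "a latent diffusion model";
# CLIP features stand in for the MNMT model's visual encoder.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Generate a synthetic image conditioned on the English source sentence.
sd = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)
source_sentence = "Two men are playing guitars on a stage."  # hypothetical example
synthetic_image = sd(source_sentence).images[0]

# 2) Encode the synthetic image; these features would replace the original
#    (possibly noisy) Multi30k image features fed to the translation model.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
inputs = processor(images=synthetic_image, return_tensors="pt").to(device)
with torch.no_grad():
    image_features = clip.get_image_features(**inputs)  # shape: (1, 512)

# 3) image_features would then be passed, together with the source sentence,
#    to a multimodal NMT model (e.g., a Transformer with visual attention)
#    to produce the German translation.
print(image_features.shape)
```

In this setup, translation quality hinges on the synthetic image reflecting only the content of the source sentence, which is the motivation given in the abstract for replacing the original, potentially noisy image.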
Anthology ID:
2023.acl-srw.12
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Vishakh Padmakumar, Gisela Vallejo, Yao Fu
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
76–82
URL:
https://aclanthology.org/2023.acl-srw.12
DOI:
10.18653/v1/2023.acl-srw.12
Cite (ACL):
Ryoya Yuasa, Akihiro Tamura, Tomoyuki Kajiwara, Takashi Ninomiya, and Tsuneo Kato. 2023. Multimodal Neural Machine Translation Using Synthetic Images Transformed by Latent Diffusion Model. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 76–82, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Multimodal Neural Machine Translation Using Synthetic Images Transformed by Latent Diffusion Model (Yuasa et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-srw.12.pdf