Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data

Wei-Jen Ko; Ahmed El-Kishky; Adithya Renduchintala; Vishrav Chaudhary; Naman Goyal; Francisco Guzmán; Pascale Fung; Philipp Koehn; Mona Diab

doi:10.18653/v1/2021.acl-long.66

Abstract

The scarcity of parallel data is a major obstacle to training high-quality machine translation systems for low-resource languages. Fortunately, some low-resource languages are linguistically related or similar to high-resource languages; these related languages may share many lexical or syntactic structures. In this work, we exploit this linguistic overlap to facilitate translating to and from a low-resource language with only monolingual data, in addition to any parallel data in the related high-resource language. Our method, NMT-Adapt, combines denoising autoencoding, back-translation and adversarial objectives to utilize monolingual data for low-resource adaptation. We experiment on seven languages from three different language families and show that our technique significantly improves translation into low-resource languages compared to other translation baselines.

Anthology ID: 2021.acl-long.66
Volume: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month: August
Year: 2021
Address: Online
Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues: ACL | IJCNLP
Publisher: Association for Computational Linguistics
Pages: 802–812
URL: https://aclanthology.org/2021.acl-long.66
DOI: 10.18653/v1/2021.acl-long.66
Cite (ACL): Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, and Mona Diab. 2021. Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 802–812, Online. Association for Computational Linguistics.
Cite (Informal): Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data (Ko et al., ACL-IJCNLP 2021)
PDF: https://aclanthology.org/2021.acl-long.66.pdf
Video: https://aclanthology.org/2021.acl-long.66.mp4
Code: wjko2/NMT-Adapt
Data: CC100, FLoRes
