Towards Multi-Sense Cross-Lingual Alignment of Contextual Embeddings

Linlin Liu, Thien Hai Nguyen, Shafiq Joty, Lidong Bing, Luo Si


Abstract
Cross-lingual word embeddings (CLWE) have proven useful in many cross-lingual tasks. However, most existing approaches to learning CLWE, including those based on contextual embeddings, are sense agnostic. In this work, we propose a novel framework to align contextual embeddings at the sense level, leveraging cross-lingual signal from bilingual dictionaries only. We operationalize our framework by first proposing a novel sense-aware cross entropy loss to model word senses explicitly. Monolingual ELMo and BERT models pretrained with our sense-aware cross entropy loss demonstrate significant performance improvements on word sense disambiguation tasks. We then propose a sense alignment objective on top of the sense-aware cross entropy loss for cross-lingual model pretraining, and pretrain cross-lingual models for several language pairs (English to German/Spanish/Japanese/Chinese). Compared with the best baseline results, our cross-lingual models achieve average performance improvements of 0.52%, 2.09% and 1.29% on zero-shot cross-lingual NER, sentiment classification and XNLI tasks, respectively.
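To give a rough intuition for the loss named in the abstract, the sketch below shows a generic sense-level cross entropy: instead of scoring a contextual hidden state against word-level output embeddings, it is scored against one output embedding per word sense, and the negative log-likelihood of the gold sense is minimized. This is a minimal illustrative sketch assuming random inputs; all shapes, names, and the scoring function (a dot product) are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def sense_aware_cross_entropy(hidden, sense_embeddings, target_senses):
    """hidden: (batch, dim) contextual states, e.g. from ELMo/BERT.
    sense_embeddings: (num_senses, dim), one output embedding per sense.
    target_senses: (batch,) gold sense ids.
    """
    # Logits over all senses: (batch, num_senses)
    logits = hidden @ sense_embeddings.T
    # Numerically stable log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the gold sense, averaged over the batch
    return -log_probs[np.arange(len(target_senses)), target_senses].mean()

rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 16))      # 4 positions, 16-dim states
sense_emb = rng.standard_normal((100, 16)) # 100 hypothetical senses
targets = rng.integers(0, 100, size=4)     # gold sense per position
loss = sense_aware_cross_entropy(hidden, sense_emb, targets)
```

The cross-lingual sense alignment objective described in the abstract would sit on top of such a loss, pulling dictionary-linked senses across languages together; that part is not sketched here.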
Anthology ID:
2022.coling-1.386
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
4381–4396
URL:
https://aclanthology.org/2022.coling-1.386
Cite (ACL):
Linlin Liu, Thien Hai Nguyen, Shafiq Joty, Lidong Bing, and Luo Si. 2022. Towards Multi-Sense Cross-Lingual Alignment of Contextual Embeddings. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4381–4396, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Towards Multi-Sense Cross-Lingual Alignment of Contextual Embeddings (Liu et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.386.pdf
Code
 ntunlp/multisense_embedding_alignment
Data
Billion Word Benchmark
MultiNLI