A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification

Nishikawa, Sosuke; Yamada, Ikuya; Tsuruoka, Yoshimasa; Echizen, Isao


arXiv:2110.07792 (cs)

[Submitted on 15 Oct 2021 (v1), last revised 11 Oct 2022 (this version, v2)]



Abstract: We present a multilingual bag-of-entities model that effectively boosts the performance of zero-shot cross-lingual text classification by extending a multilingual pre-trained language model (e.g., M-BERT). It leverages the multilingual nature of Wikidata: entities in multiple languages representing the same concept are defined with a unique identifier. This enables entities described in multiple languages to be represented using shared embeddings. A model trained on entity features in a resource-rich language can thus be directly applied to other languages. Our experimental results on cross-lingual topic classification (using the MLDoc and TED-CLDC datasets) and entity typing (using the SHINRA2020-ML dataset) show that the proposed model consistently outperforms state-of-the-art models.
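
The shared-identifier idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the mention dictionary, the substring-based entity detection, the random embeddings, and the plain averaging are all assumptions made for illustration, and the abstract does not specify how entity features are detected or combined with the pre-trained language model. The names MENTION_TO_QID, detect_entities, and bag_of_entities are hypothetical.

```python
import random

# Hypothetical toy dictionary mapping (language, mention) to a Wikidata ID.
# Mentions in different languages that denote the same concept share one ID.
MENTION_TO_QID = {
    ("en", "Tokyo"): "Q1490",
    ("ja", "東京"): "Q1490",
    ("en", "Japan"): "Q17",
    ("ja", "日本"): "Q17",
}

DIM = 4  # toy embedding size, chosen only for this sketch

# One embedding per Wikidata ID, not per language. Random values stand in
# for vectors that a real model would learn during training.
_rng = random.Random(0)
ENTITY_EMBEDDINGS = {
    qid: [_rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    for qid in sorted(set(MENTION_TO_QID.values()))
}


def detect_entities(text, lang):
    """Return sorted Wikidata IDs whose mention in `lang` occurs in `text`.

    Naive substring matching is an assumption of this sketch.
    """
    return sorted(
        qid
        for (mention_lang, mention), qid in MENTION_TO_QID.items()
        if mention_lang == lang and mention in text
    )


def bag_of_entities(text, lang):
    """Average the shared embeddings of the entities detected in `text`."""
    qids = detect_entities(text, lang)
    if not qids:
        return [0.0] * DIM  # no entity found: fall back to a zero vector
    vectors = [ENTITY_EMBEDDINGS[qid] for qid in qids]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


english_features = bag_of_entities("Tokyo is the capital of Japan.", "en")
japanese_features = bag_of_entities("東京は日本の首都です。", "ja")
```

Because both sentences resolve to the same set of Wikidata IDs, they receive identical entity features in this toy setup. That property is what would let a classifier trained on entity features from one language be applied to another without target-language training data.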

Comments:	Accepted to CoNLL 2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2110.07792 [cs.CL]
	(or arXiv:2110.07792v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2110.07792

Submission history

From: Sosuke Nishikawa
[v1] Fri, 15 Oct 2021 01:10:50 UTC (915 KB)
[v2] Tue, 11 Oct 2022 10:46:13 UTC (737 KB)
