Why Attention? Analyze BiLSTM Deficiency and Its Remedies in the Case of NER

Li, Peng-Hsuan; Fu, Tsu-Jui; Ma, Wei-Yun

arXiv:1908.11046 (cs)

[Submitted on 29 Aug 2019 (v1), last revised 3 Jul 2020 (this version, v3)]

Abstract: BiLSTM has been prevalently used as a core module for NER in a sequence-labeling setup. State-of-the-art approaches use BiLSTM with additional resources such as gazetteers, language modeling, or multi-task supervision to further improve NER. This paper instead takes a step back and focuses on analyzing problems of BiLSTM itself and how exactly self-attention can bring improvements. We formally show the limitation of (CRF-)BiLSTM in modeling cross-context patterns for each word -- the XOR limitation. Then, we show that two types of simple cross-structures -- self-attention and Cross-BiLSTM -- can effectively remedy the problem. We test the practical impacts of the deficiency on real-world NER datasets, OntoNotes 5.0 and WNUT 2017, with clear and consistent improvements over the baseline, up to 8.7% on some of the multi-token entity mentions. We give in-depth analyses of the improvements across several aspects of NER, especially the identification of multi-token mentions. This study should lay a sound foundation for future improvements on sequence-labeling NER. (Source code: this https URL)

Comments:	In proceedings of AAAI 2020
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1908.11046 [cs.CL]
	(or arXiv:1908.11046v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1908.11046

Submission history

From: Peng-Hsuan Li
[v1] Thu, 29 Aug 2019 04:36:30 UTC (1,518 KB)
[v2] Tue, 12 Nov 2019 10:08:43 UTC (1,726 KB)
[v3] Fri, 3 Jul 2020 08:27:20 UTC (1,528 KB)
