On the Sensitivity and Stability of Model Interpretations in NLP

Yin, Fan; Shi, Zhouxing; Hsieh, Cho-Jui; Chang, Kai-Wei

arXiv:2104.08782 (cs)

[Submitted on 18 Apr 2021 (v1), last revised 31 Mar 2022 (this version, v2)]

Abstract: Recent years have witnessed the emergence of a variety of post-hoc interpretations that aim to uncover how natural language processing (NLP) models make predictions. Despite the surge of new interpretation methods, it remains an open problem how to define and quantitatively measure the faithfulness of interpretations, i.e., to what extent interpretations reflect the reasoning process of a model. We propose two new criteria, sensitivity and stability, that provide notions of faithfulness complementary to the existing removal-based criteria. Our results show that conclusions about how faithful interpretations are can vary substantially depending on the notion of faithfulness used. Motivated by the desiderata of sensitivity and stability, we introduce a new class of interpretation methods that adopt techniques from adversarial robustness. Empirical results show that our proposed methods are effective under the new criteria and overcome limitations of gradient-based methods on removal-based criteria. Besides text classification, we also apply interpretation methods and metrics to dependency parsing. Our results shed light on understanding the diverse set of interpretations.

Comments:	ACL 2022, long paper, main conference
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2104.08782 [cs.CL]
	(or arXiv:2104.08782v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2104.08782

Submission history

From: Fan Yin
[v1] Sun, 18 Apr 2021 09:19:44 UTC (4,603 KB)
[v2] Thu, 31 Mar 2022 19:43:18 UTC (4,561 KB)

