The Origin and Value of Disagreement Among Data Labelers: A Case Study of the Individual Difference in Hate Speech Annotation

Sang, Yisi; Stanton, Jeffrey

Computer Science > Human-Computer Interaction

arXiv:2112.04030 (cs)

[Submitted on 7 Dec 2021]

Title:The Origin and Value of Disagreement Among Data Labelers: A Case Study of the Individual Difference in Hate Speech Annotation

Authors:Yisi Sang, Jeffrey Stanton

View PDF

Abstract:Human annotated data is the cornerstone of today's artificial intelligence efforts, yet data labeling processes can be complicated and expensive, especially when human labelers disagree with each other. The current work practice is to use majority-voted labels to overrule the disagreement. However, in the subjective data labeling tasks such as hate speech annotation, disagreement among individual labelers can be difficult to resolve. In this paper, we explored why such disagreements occur using a mixed-method approach - including interviews with experts, concept mapping exercises, and self-reporting items - to develop a multidimensional scale for distilling the process of how annotators label a hate speech corpus. We tested this scale with 170 annotators in a hate speech annotation task. Results showed that our scale can reveal facets of individual differences among annotators (e.g., age, personality, etc.), and these facets' relationships to an annotator's final label decision of an instance. We suggest that this work contributes to the understanding of how humans annotate data. The proposed scale can potentially improve the value of the currently discarded minority-vote labels.

Subjects:	Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2112.04030 [cs.HC]
	(or arXiv:2112.04030v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2112.04030

Submission history

From: Yisi Sang [view email]
[v1] Tue, 7 Dec 2021 22:54:49 UTC (367 KB)

Computer Science > Human-Computer Interaction

Title:The Origin and Value of Disagreement Among Data Labelers: A Case Study of the Individual Difference in Hate Speech Annotation

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Human-Computer Interaction

Title:The Origin and Value of Disagreement Among Data Labelers: A Case Study of the Individual Difference in Hate Speech Annotation

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators