Hierarchical Evaluation Framework: Best Practices for Human Evaluation

Bojic, Iva; Chen, Jessica; Chang, Si Yuan; Ong, Qi Chwen; Joty, Shafiq; Car, Josip

arXiv:2310.01917 (cs)

[Submitted on 3 Oct 2023 (v1), last revised 12 Oct 2023 (this version, v2)]

Abstract: Human evaluation plays a crucial role in Natural Language Processing (NLP) as it assesses the quality and relevance of developed systems, thereby facilitating their enhancement. However, the absence of widely accepted human evaluation metrics in NLP hampers fair comparisons among different systems and the establishment of universal assessment standards. Through an extensive analysis of existing literature on human evaluation metrics, we identified several gaps in NLP evaluation methodologies. These gaps served as motivation for developing our own hierarchical evaluation framework. The proposed framework offers notable advantages, particularly in providing a more comprehensive representation of the NLP system's performance. We applied this framework to evaluate the developed Machine Reading Comprehension system, which was utilized within a human-AI symbiosis model. The results highlighted the associations between the quality of inputs and outputs, underscoring the necessity to evaluate both components rather than solely focusing on outputs. In future work, we will investigate the potential time-saving benefits of our proposed framework for evaluators assessing NLP systems.

Subjects:	Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2310.01917 [cs.CL]
	(or arXiv:2310.01917v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.01917

Submission history

From: Qi Chwen Ong
[v1] Tue, 3 Oct 2023 09:46:02 UTC (6,405 KB)
[v2] Thu, 12 Oct 2023 07:59:56 UTC (6,405 KB)
