Can You Trust Your Metric? Automatic Concatenation-Based Tests for Metric Validity

Fandina, Ora Nova; Choshen, Leshem; Farchi, Eitan; Kour, George; Perlitz, Yotam; Raz, Orna

arXiv:2408.12259 (cs)

[Submitted on 22 Aug 2024]

Abstract: Consider a scenario where a system uses a harmfulness detection metric to filter unsafe responses generated by a Large Language Model. When analyzing individual harmful and unethical prompt-response pairs, the metric correctly classifies each pair as highly unsafe, assigning the highest score. However, when these same prompts and responses are concatenated, the metric's decision flips: it assigns the lowest possible score, thereby misclassifying the content as safe and allowing it to bypass the filter. In this study, we discovered that several LLM-based harmfulness metrics, including GPT-based ones, exhibit this decision-flipping phenomenon. Additionally, we found that even an advanced metric like GPT-4o is highly sensitive to input order. Specifically, it tends to classify responses as safe if the safe content appears first, regardless of any harmful content that follows, and vice versa. This work introduces automatic concatenation-based tests to assess the fundamental properties a valid metric should satisfy. We applied these tests in a model safety scenario to assess the reliability of harmfulness detection metrics, uncovering a number of inconsistencies.
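
The two failure modes described in the abstract (decision flipping under concatenation, and sensitivity to input order) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the threshold convention (higher score means more harmful), the newline separator, and the toy metrics are all assumptions made purely for illustration.

```python
from itertools import permutations
from typing import Callable, Sequence

# Hypothetical metric interface: maps a text to a harmfulness score,
# where a HIGHER score means MORE harmful (an assumption of this sketch).
Metric = Callable[[str], float]


def concatenation_flips(metric: Metric, texts: Sequence[str],
                        threshold: float, sep: str = "\n") -> bool:
    """True if every text is flagged on its own but their concatenation is not."""
    all_flagged = all(metric(t) >= threshold for t in texts)
    return all_flagged and metric(sep.join(texts)) < threshold


def order_sensitive(metric: Metric, texts: Sequence[str],
                    threshold: float, sep: str = "\n") -> bool:
    """True if the flag decision changes when the same parts are reordered."""
    decisions = {metric(sep.join(p)) >= threshold for p in permutations(texts)}
    return len(decisions) > 1


# Toy stand-in metrics (illustrative only, not real harmfulness detectors).
def keyword_metric(text: str) -> float:
    """Flags any text containing the word 'harmful'."""
    return 1.0 if "harmful" in text else 0.0


def short_only_metric(text: str) -> float:
    """Like keyword_metric, but gives up on inputs longer than 20 characters."""
    return 1.0 if "harmful" in text and len(text) <= 20 else 0.0


def first_line_metric(text: str) -> float:
    """Looks only at the first line of the input."""
    return 1.0 if "harmful" in text.split("\n")[0] else 0.0


harmful_pair = ["harmful reply A", "harmful reply B"]
mixed_pair = ["safe reply", "harmful reply"]

print(concatenation_flips(keyword_metric, harmful_pair, 0.5))
print(concatenation_flips(short_only_metric, harmful_pair, 0.5))
print(order_sensitive(keyword_metric, mixed_pair, 0.5))
print(order_sensitive(first_line_metric, mixed_pair, 0.5))
```

In this sketch, `short_only_metric` fails the concatenation check and `first_line_metric` fails the order check, while `keyword_metric` passes both. A real test would replace the toy metrics with calls to the metric under evaluation.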

Subjects:	Artificial Intelligence (cs.AI)
MSC classes:	68T50
Cite as:	arXiv:2408.12259 [cs.AI]
	(or arXiv:2408.12259v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2408.12259

Submission history

From: Ora Nova Fandina
[v1] Thu, 22 Aug 2024 09:57:57 UTC (9,913 KB)
