Replication-Robust Payoff-Allocation for Machine Learning Data Markets

Han, Dongge; Wooldridge, Michael; Rogers, Alex; Ohrimenko, Olga; Tschiatschek, Sebastian

doi:10.1109/TAI.2022.3195686

arXiv:2006.14583 (cs)

[Submitted on 25 Jun 2020 (v1), last revised 15 Nov 2022 (this version, v6)]

Abstract: Submodular functions have been a powerful mathematical model for a wide range of real-world applications. Recently, they have become increasingly important in machine learning (ML) for modelling notions such as information and redundancy among entities such as data and features. Among these applications, a key question is payoff allocation, i.e., how to evaluate the importance of each entity towards the collective objective. To this end, classic solution concepts from cooperative game theory offer principled approaches to payoff allocation. However, despite the extensive body of game-theoretic literature, payoff allocation in submodular games is relatively under-researched. In particular, an important notion that arises in the emerging submodular applications is redundancy, which may stem from various sources such as abundant data or malicious manipulations in which a player replicates its resource and acts under multiple identities. Though many game-theoretic solution concepts can be directly used in submodular games, naively applying them for payoff allocation in these settings may incur robustness issues against replication. In this paper, we systematically study the replication manipulation in submodular games and investigate replication robustness, a metric that quantitatively measures the robustness of solution concepts against replication. Using this metric, we present conditions which theoretically characterise the robustness of semivalues, a wide family of solution concepts including the Shapley and Banzhaf values. Moreover, we empirically validate our theoretical results on an emerging submodular ML application, i.e., the ML data market.
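The replication manipulation described in the abstract can be illustrated with a minimal sketch. This is a toy example and not the paper's experimental setup: the coverage game, the player names, and the helper functions `coverage_value` and `semivalue` are assumptions made for illustration. It uses the standard textbook form of a semivalue, a weighted sum of a player's marginal contributions in which the weight depends only on coalition size, and models replication as one player submitting the same data under two identities.

```python
from itertools import combinations
from math import factorial


def coverage_value(holdings, coalition):
    """Toy submodular value: number of distinct items the coalition covers."""
    covered = set()
    for player in coalition:
        covered |= holdings[player]
    return len(covered)


def semivalue(holdings, weight):
    """Exact semivalue by enumerating every coalition (small games only).

    weight(s, n) is the weight given to coalitions of size s in an n-player game.
    """
    players = list(holdings)
    n = len(players)
    payoff = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for s in range(n):
            for coalition in combinations(others, s):
                with_i = coverage_value(holdings, coalition + (i,))
                without_i = coverage_value(holdings, coalition)
                total += weight(s, n) * (with_i - without_i)
        payoff[i] = total
    return payoff


def shapley_weight(s, n):
    # Shapley value: s! (n - 1 - s)! / n!
    return factorial(s) * factorial(n - 1 - s) / factorial(n)


def banzhaf_weight(s, n):
    # Banzhaf value: uniform weight over the 2^(n-1) coalitions of the others
    return 1.0 / 2 ** (n - 1)


# Hypothetical data holdings: each player owns a set of item ids.
honest = {"A": {1, 2}, "B": {2, 3}, "C": {4}}
# Player A replicates its data and acts as two identities, A1 and A2.
replicated = {"A1": {1, 2}, "A2": {1, 2}, "B": {2, 3}, "C": {4}}

for name, weight in [("Shapley", shapley_weight), ("Banzhaf", banzhaf_weight)]:
    before = semivalue(honest, weight)["A"]
    after = semivalue(replicated, weight)
    print(name, "honest:", before, "replicated total:", after["A1"] + after["A2"])
```

In this particular toy game, A's Shapley payoff is 1.5 when acting honestly and about 1.67 in total across A1 and A2 after replication, while the Banzhaf total stays at 1.5. A single example of this kind only illustrates the manipulation; it says nothing about the general conditions the paper derives.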

Comments:	Published in IEEE Transactions on Artificial Intelligence
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2006.14583 [cs.LG]
	(or arXiv:2006.14583v6 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2006.14583
Related DOI:	https://doi.org/10.1109/TAI.2022.3195686

Submission history

From: Dongge Han
[v1] Thu, 25 Jun 2020 17:30:12 UTC (5,588 KB)
[v2] Thu, 15 Apr 2021 14:29:08 UTC (4,651 KB)
[v3] Thu, 22 Apr 2021 21:03:42 UTC (4,651 KB)
[v4] Sun, 15 May 2022 15:49:37 UTC (1,861 KB)
[v5] Fri, 26 Aug 2022 15:04:44 UTC (1,861 KB)
[v6] Tue, 15 Nov 2022 22:34:28 UTC (1,861 KB)
