A Ratio-Based Shapley Value for Collaborative Machine Learning - Extended Version
- URL: http://arxiv.org/abs/2510.13261v1
- Date: Wed, 15 Oct 2025 08:08:18 GMT
- Authors: Björn Filter, Ralf Möller, Özgür Lütfü Özçep
- Abstract summary: Collaborative machine learning enables multiple data owners to jointly train models for improved predictive performance. Ensuring incentive compatibility and fair contribution-based rewards remains a critical challenge. We introduce a ratio-based Shapley value that replaces the standard additive formulation with a relative contribution measure.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collaborative machine learning enables multiple data owners to jointly train models for improved predictive performance. However, ensuring incentive compatibility and fair contribution-based rewards remains a critical challenge. Prior work by Sim and colleagues (Rachel Hwee Ling Sim et al: Collaborative machine learning with incentive-aware model rewards. In: International conference on machine learning. PMLR. 2020, pp. 8927-8963) addressed this by allocating model rewards, which are non-monetary and freely replicable, based on the Shapley value of each party's data contribution, measured via information gain. In this paper, we introduce a ratio-based Shapley value that replaces the standard additive formulation with a relative contribution measure. While our overall reward framework, including the incentive definitions and model-reward setting, remains aligned with that of Sim and colleagues, the underlying value function is fundamentally different. Our alternative valuation induces a different distribution of model rewards and offers a new lens through which to analyze incentive properties. We formally define the ratio-based value and prove that it satisfies the same set of incentive conditions as the additive formulation, including adapted versions of fairness, individual rationality, and stability. Like the original approach, our method faces the same fundamental trade-offs between these incentives. Our contribution is a mathematically grounded alternative to the additive Shapley framework, potentially better suited to contexts where proportionality among contributors is more meaningful than additive differences.
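For orientation, the standard additive Shapley value that the abstract contrasts against can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `shapley_values`, the party names, and the toy coalition values are all hypothetical, and the toy numbers stand in for the information-gain valuation used in the paper. The ratio-based definition is not spelled out in the abstract, so it is not reproduced here.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact additive Shapley values.

    Averages each player's marginal contribution v(S ∪ {i}) - v(S)
    over every ordering in which the players could join the coalition.
    Cost grows factorially, so this is only suitable for a handful of players.
    """
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Additive marginal contribution of p to the current coalition.
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical toy valuation for two parties (assumed numbers, not from the paper).
TOY_VALUES = {
    frozenset(): 0,
    frozenset({"A"}): 1,
    frozenset({"B"}): 2,
    frozenset({"A", "B"}): 4,
}


def toy_value(coalition):
    return TOY_VALUES[frozenset(coalition)]


phi = shapley_values(["A", "B"], toy_value)
print(phi)  # A receives 3/2, B receives 5/2
```

In this toy game the two shares sum to the grand-coalition value of 4, which is the efficiency property of the additive formulation.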
Related papers
- Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling [49.41422138354821]
We propose a principled reward modeling framework that integrates non-negative factor analysis into the Bradley-Terry preference model. BNRM represents rewards through a sparse, non-negative latent factor generative process. We show that BNRM substantially mitigates reward over-optimization, improves robustness under distribution shifts, and yields more interpretable reward decompositions than strong baselines.
arXiv Detail & Related papers (2026-02-11T08:14:11Z)
- SurrogateSHAP: Training-Free Contributor Attribution for Text-to-Image (T2I) Models [24.06687457570142]
SurrogateSHAP is a retraining-free framework that approximates the expensive retraining game through inference from a pretrained model. We evaluate SurrogateSHAP across three diverse attribution tasks: (i) image quality for DDPM-CFG on CIFAR-20, (ii) aesthetics for Stable Diffusion on Post-Impressionist artworks, and (iii) product diversity for FLUX.1 on Fashion-Product data.
arXiv Detail & Related papers (2026-01-29T19:48:19Z)
- Social Comparison without Explicit Inference of Others' Reward Values: A Constructive Approach Using a Probabilistic Generative Model [1.9732490977700972]
Social comparison relies on objective reward differences rather than inferences about subjective states. We train models on a dataset containing a pair of monkeys, their rewards, and stimuli. We evaluate the models' ability to classify subjective values across pre-defined experimental conditions.
arXiv Detail & Related papers (2025-12-21T10:48:40Z)
- Confidence as a Reward: Transforming LLMs into Reward Models [54.98336080630691]
Confidence-as-a-Reward (CRew) is a training-free method that utilizes token-level confidence in the model's final answers as a proxy for reward. We show that CRew outperforms existing training-free reward approaches on the MATH500 and RewardMATH benchmarks. We propose CRew-DPO, a training strategy that constructs preference data from confidence scores combined with correctness signals.
arXiv Detail & Related papers (2025-10-15T12:51:47Z)
- Evaluating Robustness of Reward Models for Mathematical Reasoning [14.97819343313859]
We introduce a new design for reliable evaluation of reward models, and to validate this, we construct RewardMATH.
We demonstrate that the scores on RewardMATH strongly correlate with the results of optimized policy and effectively estimate reward overoptimization.
arXiv Detail & Related papers (2024-10-02T16:39:58Z)
- On the Volatility of Shapley-Based Contribution Metrics in Federated Learning [1.827018440608344]
Federated learning (FL) is a collaborative and privacy-preserving Machine Learning paradigm. Inaccurate allocation of contributions can undermine trust, lead to unfair compensation, and thus participants may lack the incentive to join or actively contribute to the federation. We provide an extensive analysis of the discrepancies of Shapley values across a set of aggregation strategies and examine them on an overall and a per-client level.
arXiv Detail & Related papers (2024-05-13T13:55:34Z)
- RewardBench: Evaluating Reward Models for Language Modeling [100.28366840977966]
We present RewardBench, a benchmark dataset and code-base for evaluation of reward models.
The dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety.
On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods.
arXiv Detail & Related papers (2024-03-20T17:49:54Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most of the existing value-based multi-agent reinforcement learning methods only model the expectations of individual Q-values and global Q-value.
arXiv Detail & Related papers (2022-02-21T11:28:00Z)
- Model-Augmented Q-learning [112.86795579978802]
We propose an MFRL framework that is augmented with the components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution which is identical to the solution obtained by learning with true reward.
arXiv Detail & Related papers (2021-02-07T17:56:50Z)
- Collaborative Machine Learning with Incentive-Aware Model Rewards [32.43927226170119]
Collaborative machine learning (ML) is an appealing paradigm to build high-quality ML models by training on the aggregated data from many parties.
These parties are only willing to share their data when given enough incentives, such as a guaranteed fair reward based on their contributions.
This paper proposes to value a party's reward based on Shapley value and information gain on model parameters given its data.
arXiv Detail & Related papers (2020-10-24T06:20:55Z)