Communication-Corruption Coupling and Verification in Cooperative Multi-Objective Bandits
- URL: http://arxiv.org/abs/2601.11924v1
- Date: Sat, 17 Jan 2026 06:13:52 GMT
- Title: Communication-Corruption Coupling and Verification in Cooperative Multi-Objective Bandits
- Authors: Ming Shi
- Abstract summary: We study cooperative multi-armed bandits with vector-valued rewards under adversarial corruption and limited verification. We show that a fixed environment-side budget $Γ$ can translate into an effective corruption level ranging from $Γ$ to $NΓ$. We further establish information-theoretic limits, including an unavoidable additive $Ω(Γ)$ penalty and a high-corruption regime $Γ=Θ(NT)$ where sublinear regret is impossible without clean information.
- Score: 2.1772197319352498
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study cooperative stochastic multi-armed bandits with vector-valued rewards under adversarial corruption and limited verification. In each of $T$ rounds, each of $N$ agents selects an arm, the environment generates a clean reward vector, and an adversary perturbs the observed feedback subject to a global corruption budget $Γ$. Performance is measured by team regret under a coordinate-wise nondecreasing, $L$-Lipschitz scalarization $φ$, covering linear, Chebyshev, and smooth monotone utilities. Our main contribution is a communication-corruption coupling: we show that a fixed environment-side budget $Γ$ can translate into an effective corruption level ranging from $Γ$ to $NΓ$, depending on whether agents share raw samples, sufficient statistics, or only arm recommendations. We formalize this via a protocol-induced multiplicity functional and prove regret bounds parameterized by the resulting effective corruption. As corollaries, raw-sample sharing can suffer an $N$-fold larger additive corruption penalty, whereas summary sharing and recommendation-only sharing preserve an unamplified $O(Γ)$ term and achieve centralized-rate team regret. We further establish information-theoretic limits, including an unavoidable additive $Ω(Γ)$ penalty and a high-corruption regime $Γ=Θ(NT)$ where sublinear regret is impossible without clean information. Finally, we characterize how a global budget $ν$ of verified observations restores learnability. That is, verification is necessary in the high-corruption regime, and sufficient once it crosses the identification threshold, with certified sharing enabling the team's regret to become independent of $Γ$.
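The communication-corruption coupling can be illustrated with a toy accounting sketch in Python. The function name, protocol labels, and the multiplicity table below are illustrative assumptions, not the paper's formal multiplicity functional: under raw-sample sharing a single corrupted observation enters every agent's local estimator, while summary or recommendation sharing counts it once.

```python
# Toy accounting of protocol-induced corruption multiplicity.
# Hypothetical sketch: names and multiplicities are illustrative,
# not the paper's exact definitions.

def effective_corruption(budget, n_agents, protocol):
    """Return the effective corruption level seen by the team.

    Under raw-sample sharing, each corrupted observation is replicated
    into all N agents' local estimators, amplifying the budget by N.
    Summary or recommendation sharing counts each corruption once.
    """
    multiplicity = {
        "raw_samples": n_agents,       # every agent ingests the corrupted sample
        "sufficient_statistics": 1,    # corruption enters one shared summary
        "recommendations_only": 1,     # only arm indices are exchanged
    }[protocol]
    return multiplicity * budget

N, gamma = 8, 100
print(effective_corruption(gamma, N, "raw_samples"))           # N * Gamma = 800
print(effective_corruption(gamma, N, "recommendations_only"))  # Gamma = 100
```

This mirrors the corollaries above: the unamplified $O(Γ)$ term corresponds to multiplicity 1, the $N$-fold penalty to multiplicity $N$.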
Related papers
- Online Learning to Rank under Corruption: A Robust Cascading Bandits Approach [15.847551488328866]
Online learning to rank studies how to recommend a short ranked list of items from a large pool and improves future rankings based on user clicks. This setting is commonly modeled as cascading bandits, where the objective is to maximize the likelihood that the user clicks on at least one of the presented items. We propose MSUCB, a robust algorithm that incorporates a novel mean-of-medians estimator, applied to the bandits-with-corruption setting.
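The mean-of-medians idea can be sketched as follows. This is a minimal illustrative implementation assuming a deterministic round-robin partition into groups; the paper's actual estimator and its use inside MSUCB may differ. An adversarial outlier lands in few groups, those groups' medians barely move, and the average of the medians stays near the clean value.

```python
def mean_of_medians(samples, n_groups):
    """Partition samples round-robin into groups, take each group's
    median, and average the medians. Outliers are confined to a few
    groups, so the final average moves only slightly."""
    groups = [samples[i::n_groups] for i in range(n_groups)]

    def median(xs):
        xs = sorted(xs)
        m = len(xs) // 2
        return xs[m] if len(xs) % 2 else 0.5 * (xs[m - 1] + xs[m])

    return sum(median(g) for g in groups) / n_groups

clean = [1.0] * 20
corrupted = clean[:2] + [1000.0] + clean[3:]  # one adversarial observation
print(mean_of_medians(corrupted, 5))          # stays at 1.0
print(sum(corrupted) / len(corrupted))        # plain mean is dragged to 50.95
```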
arXiv Detail & Related papers (2025-11-04T23:39:37Z) - Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions [6.234292942334148]
We propose a multi-agent cooperative learning algorithm that is robust to adversarial corruptions.
As a side-product, our algorithm also improves the state-of-the-art regret bounds when reducing to both the single-agent and homogeneous multi-agent scenarios.
arXiv Detail & Related papers (2024-11-12T20:20:26Z) - Corruption-Robust Offline Reinforcement Learning with General Function Approximation [60.91257031278004]
We investigate the problem of corruption in offline reinforcement learning (RL) with general function approximation.
Our goal is to find a policy that is robust to such corruption and minimizes the suboptimality gap with respect to the optimal policy for the uncorrupted Markov decision processes (MDPs).
arXiv Detail & Related papers (2023-10-23T04:07:26Z) - CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption [21.709840199725015]
We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions.
We introduce CRIMED, an algorithm that achieves the exact lower bound on regret for bandits with known variance.
Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$.
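Why $\frac{1}{2}$ is the natural barrier can be illustrated with the simplest robust statistic, the sample median. This is an illustrative sketch only; CRIMED's actual estimator is more refined. As long as the contamination fraction stays below one half, a majority of samples are clean and the median sits among them.

```python
def median(xs):
    """Sample median: unmoved by any contamination fraction below 1/2."""
    xs = sorted(xs)
    m = len(xs) // 2
    return xs[m] if len(xs) % 2 else 0.5 * (xs[m - 1] + xs[m])

# Huber-style contamination: 40% of samples (epsilon = 0.4 < 1/2)
# are replaced by an adversarial value.
samples = [5.0] * 6 + [1e6] * 4
print(median(samples))              # stays at the clean value 5.0
print(sum(samples) / len(samples))  # the plain mean is dragged to 400003.0
```

At $\varepsilon = \frac{1}{2}$ the adversary controls half the mass and the median, too, can be placed anywhere, which is why estimators break down there.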
arXiv Detail & Related papers (2023-09-28T16:19:53Z) - Robust Lipschitz Bandits to Adversarial Corruptions [61.85150061213987]
Lipschitz bandit is a variant of bandits that deals with a continuous arm set defined on a metric space.
In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions.
Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary.
arXiv Detail & Related papers (2023-05-29T18:16:59Z) - Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm to achieve a regret of $\tilde{O}(\sqrt{T}+\zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandit.
We generalize our algorithm to the episodic MDP setting and are the first to achieve an additive dependence on the corruption level $\zeta$.
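The uncertainty-weighting idea can be sketched in one dimension. This is an illustrative scalar version: the actual algorithm uses uncertainty-weighted ridge regression over feature vectors, and the weights here are hand-picked rather than derived from confidence estimates.

```python
def weighted_ridge_1d(xs, ys, weights, lam=1.0):
    """Scalar uncertainty-weighted ridge:
    theta = (lam + sum w*x^2)^-1 * sum w*x*y.
    Observations with low weight (high uncertainty, suspected
    corruption) barely move the estimate."""
    num = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    den = lam + sum(w * x * x for w, x in zip(weights, xs))
    return num / den

xs = [1.0, 1.0, 1.0, 1.0]
ys = [2.0, 2.0, 2.0, 100.0]  # true theta = 2; last observation corrupted
naive = weighted_ridge_1d(xs, ys, [1.0] * 4)               # 106/5 = 21.2
robust = weighted_ridge_1d(xs, ys, [1.0, 1.0, 1.0, 0.01])  # 7/4.01, about 1.75
print(naive, robust)
```

Down-weighting the suspect point pulls the estimate back near the true parameter, which is the mechanism behind the additive (rather than multiplicative) dependence on the corruption level.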
arXiv Detail & Related papers (2022-12-12T15:04:56Z) - Soft Diffusion: Score Matching for General Corruptions [84.26037497404195]
We propose a new objective called Soft Score Matching that provably learns the score function for any linear corruption process.
We show that our objective learns the gradient of the likelihood under suitable regularity conditions for the family of corruption processes.
Our method achieves state-of-the-art FID score $1.85$ on CelebA-64, outperforming all previous linear diffusion models.
arXiv Detail & Related papers (2022-09-12T17:45:03Z) - A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits [118.22458816174144]
We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants.
Our algorithm, GP Robust Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation.
We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
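A generic phased-elimination loop of this flavor can be sketched as follows. This is a plain multi-armed-bandit toy, not RGP-PE itself; the doubling epoch schedule, confidence radius, and Gaussian noise model are assumptions made for illustration.

```python
import math
import random

def phased_elimination(means, horizon, seed=0):
    """Epoch-based elimination sketch: in each epoch every surviving arm
    is played in one long block (infrequent switching), then arms whose
    upper confidence bound falls below the best lower bound are dropped."""
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    counts = [0] * k
    sums = [0.0] * k
    t, epoch = 0, 1
    while t < horizon and len(active) > 1:
        for arm in active:                    # batched pulls per arm
            for _ in range(2 ** epoch):
                sums[arm] += means[arm] + rng.gauss(0.0, 0.1)
                counts[arm] += 1
                t += 1
        # All active arms share the same pull count, so one radius suffices.
        radius = math.sqrt(2.0 * math.log(horizon) / counts[active[0]])
        est = {a: sums[a] / counts[a] for a in active}
        best_lcb = max(est[a] - radius for a in active)
        active = [a for a in active if est[a] + radius >= best_lcb]
        epoch += 1
    return active

print(phased_elimination([0.2, 0.5, 0.9], 5000))  # the best arm survives
```

Batching pulls into long blocks is what limits switching, and widening the radius slightly (not shown) is the standard lever for absorbing a bounded corruption budget.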
arXiv Detail & Related papers (2022-02-03T21:19:36Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions [10.261123419337316]
We study the problem of bandits with adversarial corruptions in the cooperative multi-agent setting.
In the problem, the rewards are independently sampled from distributions across all agents and rounds, but they may be corrupted by an adversary.
Our goal is to minimize both the overall regret and communication cost across all agents.
arXiv Detail & Related papers (2021-06-08T09:32:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.