On the Robustness of Epoch-Greedy in Multi-Agent Contextual Bandit
Mechanisms
- URL: http://arxiv.org/abs/2307.07675v1
- Date: Sat, 15 Jul 2023 01:20:31 GMT
- Title: On the Robustness of Epoch-Greedy in Multi-Agent Contextual Bandit
Mechanisms
- Authors: Yinglun Xu, Bhuvesh Kumar, Jacob Abernethy
- Abstract summary: We show that the most prominent contextual bandit algorithm, $\epsilon$-greedy, can be extended to handle the challenges introduced by strategic arms.
We also show that $\epsilon$-greedy is inherently robust to adversarial data corruption attacks and achieves performance that degrades linearly with the amount of corruption.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient learning in multi-armed bandit mechanisms such as pay-per-click
(PPC) auctions typically involves three challenges: 1) inducing truthful
bidding behavior (incentives), 2) using personalization for users (context),
and 3) circumventing manipulations in click patterns (corruptions). Each of
these challenges has been studied orthogonally in the literature; incentives
have been addressed by a line of work on truthful multi-armed bandit
mechanisms, context has been extensively tackled by contextual bandit
algorithms, while corruptions have been discussed via a recent line of work on
bandits with adversarial corruptions. Since these challenges co-exist, it is
important to understand the robustness of each of these approaches in
addressing the other challenges, provide algorithms that can handle all
simultaneously, and highlight inherent limitations in this combination. In this
work, we show that the most prominent contextual bandit algorithm,
$\epsilon$-greedy, can be extended to handle the challenges introduced by
strategic arms in the contextual multi-armed bandit mechanism setting. We further
show that $\epsilon$-greedy is inherently robust to adversarial data corruption
attacks and achieves performance that degrades linearly with the amount of
corruption.
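As a loose illustration of the base algorithm the abstract builds on, below is a minimal contextual $\epsilon$-greedy sketch. The linear reward model, the decaying exploration schedule, and all variable names are assumptions for illustration only; the paper's actual mechanism additionally handles strategic bidding and corruptions, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

K, d, T = 4, 5, 2000               # arms, context dimension, rounds
theta = rng.normal(size=(K, d))    # unknown per-arm reward parameters

# Per-arm regularized least-squares statistics
A = np.stack([np.eye(d) for _ in range(K)])  # Gram matrices
b = np.zeros((K, d))

total_reward = 0.0
for t in range(1, T + 1):
    x = rng.normal(size=d)                 # observed context
    eps = min(1.0, K / np.sqrt(t))         # decaying exploration rate
    if rng.random() < eps:
        arm = int(rng.integers(K))         # explore: uniform random arm
    else:
        # exploit: pick the arm with the highest estimated reward for x
        est = np.array([np.linalg.solve(A[k], b[k]) @ x for k in range(K)])
        arm = int(np.argmax(est))
    r = theta[arm] @ x + rng.normal(scale=0.1)  # noisy realized reward
    A[arm] += np.outer(x, x)               # update that arm's statistics
    b[arm] += r * x
    total_reward += r
```

The uniform-exploration rounds are what make this family of algorithms amenable to the robustness analysis above: exploration data is collected independently of the arms' (possibly strategic) behavior.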
Related papers
- Multi-Player Approaches for Dueling Bandits
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z)
- Multi-granular Adversarial Attacks against Black-box Neural Ranking Models
We create high-quality adversarial examples by incorporating multi-granular perturbations.
We transform the multi-granular attack into a sequential decision-making process.
Our attack method surpasses prevailing baselines in both attack effectiveness and imperceptibility.
arXiv Detail & Related papers (2024-04-02T02:08:29Z)
- Few-Shot Adversarial Prompt Learning on Vision-Language Models
The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention.
Previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision.
We propose a few-shot adversarial prompt framework where adapting input sequences with limited data makes significant adversarial robustness improvement.
arXiv Detail & Related papers (2024-03-21T18:28:43Z)
- Robust Lipschitz Bandits to Adversarial Corruptions
Lipschitz bandit is a variant of bandits that deals with a continuous arm set defined on a metric space.
In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions.
Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary.
arXiv Detail & Related papers (2023-05-29T18:16:59Z)
- AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry
Insurance claims typically come with a plethora of data from different modalities.
Despite recent advances in multimodal learning, these frameworks still suffer from challenges of joint-training.
We introduce a multimodal reasoning framework, AutoFraudNet, for detecting fraudulent auto-insurance claims.
arXiv Detail & Related papers (2023-01-15T13:50:32Z)
- Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
We study the problem of $K$-armed dueling bandits in both stochastic and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O\big(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\big)$ regret bound against the Condorcet-winner benchmark.
arXiv Detail & Related papers (2022-02-14T13:37:23Z)
- On Optimal Robustness to Adversarial Corruption in Online Decision Problems
We show that optimal robustness can be expressed by a square-root dependency on the amount of corruption.
For the multi-armed bandit problem, we also provide a nearly tight lower bound up to a logarithmic factor.
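The square-root dependency here contrasts with the linear degradation shown for $\epsilon$-greedy in the main paper above. A toy numeric sketch of that gap, with constant factors and the exact bound shapes deliberately omitted (so this is not either paper's actual bound):

```python
import math

# Illustrative only: compare a corruption-dependent regret term that grows
# linearly in the corruption budget C, O(C), with one that grows as the
# square root of C, O(sqrt(C)). Constants and T-dependent terms are dropped.
for C in (1, 100, 10_000):
    print(f"C={C:>6}: linear term ~ {C:>6}, square-root term ~ {math.sqrt(C):.0f}")
```

For large corruption budgets the square-root rate loses far less, which is why it is the benchmark for optimal robustness in this line of work.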
arXiv Detail & Related papers (2021-09-22T18:26:45Z)
- Differentially-Private Federated Linear Bandits
FedUCB is a multi-agent private algorithm for both centralized and decentralized (peer-to-peer) federated learning.
We provide a rigorous technical analysis of its utility in terms of regret, improving several results in cooperative bandit learning, and provide rigorous privacy guarantees as well.
arXiv Detail & Related papers (2020-10-22T03:58:39Z)
- Unifying Clustered and Non-stationary Bandits
Non-stationary bandits and online clustering of bandits lift the restrictive assumptions in contextual bandits.
We propose a test of homogeneity, which seamlessly addresses change detection for non-stationary bandits and cluster identification for online clustering of bandits.
Rigorous regret analysis and extensive empirical evaluations demonstrate the value of our proposed solution.
arXiv Detail & Related papers (2020-09-05T04:58:06Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.