On the Robustness of Epoch-Greedy in Multi-Agent Contextual Bandit
Mechanisms
- URL: http://arxiv.org/abs/2307.07675v1
- Date: Sat, 15 Jul 2023 01:20:31 GMT
- Title: On the Robustness of Epoch-Greedy in Multi-Agent Contextual Bandit
Mechanisms
- Authors: Yinglun Xu, Bhuvesh Kumar, Jacob Abernethy
- Abstract summary: We show that the most prominent contextual bandit algorithm, $\epsilon$-greedy, can be extended to handle the challenges introduced by strategic arms.
We also show that $\epsilon$-greedy is inherently robust to adversarial data corruption attacks and achieves performance that degrades linearly with the amount of corruption.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient learning in multi-armed bandit mechanisms such as pay-per-click
(PPC) auctions typically involves three challenges: 1) inducing truthful
bidding behavior (incentives), 2) using personalization for users (context),
and 3) circumventing manipulations in click patterns (corruptions). Each of
these challenges has been studied orthogonally in the literature; incentives
have been addressed by a line of work on truthful multi-armed bandit
mechanisms, context has been extensively tackled by contextual bandit
algorithms, while corruptions have been discussed via a recent line of work on
bandits with adversarial corruptions. Since these challenges co-exist, it is
important to understand the robustness of each of these approaches in
addressing the other challenges, provide algorithms that can handle all
simultaneously, and highlight inherent limitations in this combination. In this
work, we show that the most prominent contextual bandit algorithm,
$\epsilon$-greedy, can be extended to handle the challenges introduced by
strategic arms in the contextual multi-armed bandit mechanism setting. We further
show that $\epsilon$-greedy is inherently robust to adversarial data corruption
attacks and achieves performance that degrades linearly with the amount of
corruption.
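As a loose illustration of the base algorithm the abstract builds on, below is a minimal contextual $\epsilon$-greedy sketch. The linear reward model, the decaying exploration schedule, and all variable names are assumptions for illustration only; the paper's actual mechanism additionally handles strategic bidding and corruptions, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

K, d, T = 4, 5, 2000               # arms, context dimension, rounds
theta = rng.normal(size=(K, d))    # unknown per-arm reward parameters

# Per-arm regularized least-squares statistics
A = np.stack([np.eye(d) for _ in range(K)])  # Gram matrices
b = np.zeros((K, d))

total_reward = 0.0
for t in range(1, T + 1):
    x = rng.normal(size=d)                 # observed context
    eps = min(1.0, K / np.sqrt(t))         # decaying exploration rate
    if rng.random() < eps:
        arm = int(rng.integers(K))         # explore: uniform random arm
    else:
        # exploit: pick the arm with the highest estimated reward for x
        est = np.array([np.linalg.solve(A[k], b[k]) @ x for k in range(K)])
        arm = int(np.argmax(est))
    r = theta[arm] @ x + rng.normal(scale=0.1)  # noisy realized reward
    A[arm] += np.outer(x, x)               # update that arm's statistics
    b[arm] += r * x
    total_reward += r
```

The uniform-exploration rounds are what make this family of algorithms amenable to the robustness analysis above: exploration data is collected independently of the arms' (possibly strategic) behavior.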
Related papers
- Multi-Player Approaches for Dueling Bandits
We show that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting.
We also analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol.
arXiv Detail & Related papers (2024-05-25T10:25:48Z)
- Multi-granular Adversarial Attacks against Black-box Neural Ranking Models
We create high-quality adversarial examples by incorporating multi-granular perturbations.
We transform the multi-granular attack into a sequential decision-making process.
Our attack method surpasses prevailing baselines in both attack effectiveness and imperceptibility.
arXiv Detail & Related papers (2024-04-02T02:08:29Z)
- Few-Shot Adversarial Prompt Learning on Vision-Language Models
The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention.
Previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision.
We propose a few-shot adversarial prompt framework where adapting input sequences with limited data makes significant adversarial robustness improvement.
arXiv Detail & Related papers (2024-03-21T18:28:43Z)
- Robust Lipschitz Bandits to Adversarial Corruptions
Lipschitz bandit is a variant of bandits that deals with a continuous arm set defined on a metric space.
In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions.
Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary.
arXiv Detail & Related papers (2023-05-29T18:16:59Z)
- AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry
Insurance claims typically come with a plethora of data from different modalities.
Despite recent advances in multimodal learning, these frameworks still suffer from challenges of joint-training.
We introduce a multimodal reasoning framework, AutoFraudNet, for detecting fraudulent auto-insurance claims.
arXiv Detail & Related papers (2023-01-15T13:50:32Z)
- Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences
We study the problem of $K$-armed dueling bandits in both stochastic and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O\big(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\big)$ regret bound against the Condorcet-winner benchmark.
arXiv Detail & Related papers (2022-02-14T13:37:23Z)
- On Optimal Robustness to Adversarial Corruption in Online Decision Problems
We show that optimal robustness can be expressed by a square-root dependency on the amount of corruption.
For the multi-armed bandit problem, we also provide a nearly tight lower bound up to a logarithmic factor.
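The square-root dependency here contrasts with the linear degradation shown for $\epsilon$-greedy in the main paper above. A toy numeric sketch of that gap, with constant factors and the exact bound shapes deliberately omitted (so this is not either paper's actual bound):

```python
import math

# Illustrative only: compare a corruption-dependent regret term that grows
# linearly in the corruption budget C, O(C), with one that grows as the
# square root of C, O(sqrt(C)). Constants and T-dependent terms are dropped.
for C in (1, 100, 10_000):
    print(f"C={C:>6}: linear term ~ {C:>6}, square-root term ~ {math.sqrt(C):.0f}")
```

For large corruption budgets the square-root rate loses far less, which is why it is the benchmark for optimal robustness in this line of work.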
arXiv Detail & Related papers (2021-09-22T18:26:45Z)
- Differentially-Private Federated Linear Bandits
FedUCB is a multi-agent private algorithm for both centralized and decentralized (peer-to-peer) federated learning.
We provide a rigorous technical analysis of its utility in terms of regret, improving several results in cooperative bandit learning, and provide rigorous privacy guarantees as well.
arXiv Detail & Related papers (2020-10-22T03:58:39Z)
- Unifying Clustered and Non-stationary Bandits
Non-stationary bandits and online clustering of bandits lift the restrictive assumptions in contextual bandits.
We propose a test of homogeneity, which seamlessly addresses change detection for non-stationary bandits and cluster identification for online clustering of bandits.
Rigorous regret analysis and extensive empirical evaluations demonstrate the value of our proposed solution.
arXiv Detail & Related papers (2020-09-05T04:58:06Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.