Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses
- URL: http://arxiv.org/abs/2008.09312v8
- Date: Thu, 14 Mar 2024 21:04:43 GMT
- Title: Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses
- Authors: Shiliang Zuo
- Abstract summary: Two sets of results are presented in this paper.
I design attack strategies against UCB and Thompson Sampling that only spend $\widehat{O}(\sqrt{\log T})$ cost.
Inspired by literature on smoothed analysis and behavioral economics, I present two simple algorithms that achieve a competitive ratio arbitrarily close to 1.
- Score: 2.296475290901356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: I study adversarial attacks against stochastic bandit algorithms. At each round, the learner chooses an arm, and a stochastic reward is generated. The adversary strategically adds corruption to the reward, and the learner is only able to observe the corrupted reward at each round. Two sets of results are presented in this paper. The first set studies the optimal attack strategies for the adversary. The adversary has a target arm he wishes to promote, and his goal is to manipulate the learner into choosing this target arm $T - o(T)$ times. I design attack strategies against UCB and Thompson Sampling that only spend $\widehat{O}(\sqrt{\log T})$ cost. Matching lower bounds are presented, and the vulnerability of UCB, Thompson sampling, and $\varepsilon$-greedy are exactly characterized. The second set studies how the learner can defend against the adversary. Inspired by literature on smoothed analysis and behavioral economics, I present two simple algorithms that achieve a competitive ratio arbitrarily close to 1.
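To make the round-by-round protocol in the abstract concrete, here is a minimal sketch: a UCB1 learner, Gaussian rewards, and a deliberately naive adversary that drags every observed non-target reward below the target arm's running mean. The learner, the Gaussian noise, and the fixed margin of 0.5 are illustrative assumptions; this is not the paper's $\widehat{O}(\sqrt{\log T})$-cost attack strategy.

```python
import numpy as np

def ucb_corrupted_bandit(true_means, target, T, sigma=1.0, seed=0):
    """Toy simulation of the corrupted-reward protocol: a UCB1 learner,
    Gaussian rewards, and a naive adversary that pushes every observed
    non-target reward below the target arm's running mean."""
    rng = np.random.default_rng(seed)
    K = len(true_means)
    counts = np.zeros(K)      # learner's pull counts
    sums = np.zeros(K)        # sums of observed (corrupted) rewards
    attack_cost = 0.0

    for t in range(T):
        # Learner: UCB1 index computed on the corrupted observations.
        if t < K:
            arm = t                                   # initialize: pull each arm once
        else:
            means = sums / counts
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(means + bonus))

        # Environment: stochastic reward for the chosen arm.
        reward = true_means[arm] + sigma * rng.standard_normal()

        # Adversary: corrupt non-target pulls so the target keeps looking best.
        corruption = 0.0
        if arm != target:
            target_mean = sums[target] / max(counts[target], 1)
            desired = target_mean - 0.5               # hypothetical margin, illustration only
            corruption = min(0.0, desired - reward)   # only push rewards downward
        attack_cost += abs(corruption)

        # Learner observes and records only the corrupted reward.
        counts[arm] += 1
        sums[arm] += reward + corruption

    return counts, attack_cost

counts, cost = ucb_corrupted_bandit(true_means=[0.9, 0.5, 0.2], target=2, T=5000)
print("pulls per arm:", counts, "total attack cost:", round(cost, 1))
```

With this naive rule the target arm dominates the pull counts, but the cumulative corruption grows much faster than the $\widehat{O}(\sqrt{\log T})$ budget the paper achieves; the point of the sketch is only the protocol and the cost accounting.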
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose an algorithm, namely robust contextual dueling bandit (algo), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation [50.469872635246176]
We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit.
This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards.
arXiv Detail & Related papers (2023-11-27T09:19:01Z) - Alternating Objectives Generates Stronger PGD-Based Adversarial Attacks [78.2700757742992]
Projected Gradient Descent (PGD) is one of the most effective and conceptually simple algorithms to generate such adversaries.
We experimentally verify this assertion on a synthetic-data example and by evaluating our proposed method across 25 different $\ell_\infty$-robust models and 3 datasets.
Our strongest adversarial attack outperforms all of the white-box components of AutoAttack ensemble.
arXiv Detail & Related papers (2022-12-15T17:44:31Z) - Combinatorial Bandits under Strategic Manipulations [25.882801456026584]
We study the problem of combinatorial multi-armed bandits (CMAB) under strategic manipulations of rewards, where each arm can modify the emitted reward signals for its own interest.
Our setting elaborates a more realistic model of adaptive arms that imposes relaxed assumptions compared to adversarial corruptions and adversarial attacks.
arXiv Detail & Related papers (2021-02-25T07:57:27Z) - Composite Adversarial Attacks [57.293211764569996]
Adversarial attack is a technique for deceiving Machine Learning (ML) models.
In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms.
CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
arXiv Detail & Related papers (2020-12-10T03:21:16Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include the algorithm's regret guarantees, which depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z) - Regret Minimization in Stochastic Contextual Dueling Bandits [40.17224226373741]
We consider the problem of $K$-armed dueling bandit in the contextual setting.
We present two algorithms for the setup with respective regret guarantees.
arXiv Detail & Related papers (2020-02-20T06:36:19Z) - Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack [41.060507338755784]
This paper investigates the attack model where an adversary attacks with a certain probability at each round, and its attack value can be arbitrary and unbounded if it attacks.
We propose a novel sample median-based and exploration-aided UCB algorithm (called med-E-UCB) and a median-based $\epsilon$-greedy algorithm (called med-$\epsilon$-greedy); a rough sketch of the median-based index idea appears after this list.
Both algorithms are provably robust to the aforementioned attack model. More specifically, we show that both algorithms achieve $\mathcal{O}(\log T)$ pseudo-regret.
arXiv Detail & Related papers (2020-02-17T19:21:08Z) - Defensive Few-shot Learning [77.82113573388133]
This paper investigates a new challenging problem called defensive few-shot learning.
It aims to learn a robust few-shot model against adversarial attacks.
The proposed framework can effectively make the existing few-shot models robust against adversarial attacks.
arXiv Detail & Related papers (2019-11-16T05:57:16Z)
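The med-E-UCB entry above replaces empirical means with sample medians so that rare, unbounded corruptions cannot drag an arm's index arbitrarily. Below is a rough sketch of that median-based index idea under simplifying assumptions of my own (the bonus scale `c` and the omission of the exploration aid are not from the cited paper).

```python
import numpy as np

def median_ucb_arm(rewards_per_arm, t, c=2.0):
    """Pick an arm using a sample-median UCB index: the empirical mean is
    replaced by the sample median, which a sparse unbounded attacker cannot
    shift without corrupting a constant fraction of an arm's observations.
    The bonus scale `c` and the omitted exploration aid are simplifications."""
    indices = []
    for obs in rewards_per_arm:        # obs: list of observed rewards for one arm
        n = len(obs)
        if n == 0:
            indices.append(np.inf)     # force at least one pull of every arm
            continue
        median = np.median(obs)        # robust replacement for the empirical mean
        bonus = np.sqrt(c * np.log(t + 1) / n)
        indices.append(median + bonus)
    return int(np.argmax(indices))
```

In each round the learner would call `median_ucb_arm` on its per-arm observation lists, pull the returned arm, and append the observed (possibly corrupted) reward to that arm's list.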
This list is automatically generated from the titles and abstracts of the papers in this site.