Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses
- URL: http://arxiv.org/abs/2008.09312v8
- Date: Thu, 14 Mar 2024 21:04:43 GMT
- Title: Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses
- Authors: Shiliang Zuo
- Abstract summary: Two sets of results are presented in this paper.
I design attack strategies against UCB and Thompson Sampling that only spend $\widehat{O}(\sqrt{\log T})$ cost.
Inspired by literature on smoothed analysis and behavioral economics, I present two simple algorithms that achieve a competitive ratio arbitrarily close to 1.
- Score: 2.296475290901356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: I study adversarial attacks against stochastic bandit algorithms. At each round, the learner chooses an arm, and a stochastic reward is generated. The adversary strategically adds corruption to the reward, and the learner is only able to observe the corrupted reward at each round. Two sets of results are presented in this paper. The first set studies the optimal attack strategies for the adversary. The adversary has a target arm he wishes to promote, and his goal is to manipulate the learner into choosing this target arm $T - o(T)$ times. I design attack strategies against UCB and Thompson Sampling that only spend $\widehat{O}(\sqrt{\log T})$ cost. Matching lower bounds are presented, and the vulnerability of UCB, Thompson sampling, and $\varepsilon$-greedy are exactly characterized. The second set studies how the learner can defend against the adversary. Inspired by literature on smoothed analysis and behavioral economics, I present two simple algorithms that achieve a competitive ratio arbitrarily close to 1.
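To make the round-by-round protocol in the abstract concrete, here is a minimal sketch: a UCB1 learner, Gaussian rewards, and a deliberately naive adversary that drags every observed non-target reward below the target arm's running mean. The learner, the Gaussian noise, and the fixed margin of 0.5 are illustrative assumptions; this is not the paper's $\widehat{O}(\sqrt{\log T})$-cost attack strategy.

```python
import numpy as np

def ucb_corrupted_bandit(true_means, target, T, sigma=1.0, seed=0):
    """Toy simulation of the corrupted-reward protocol: a UCB1 learner,
    Gaussian rewards, and a naive adversary that pushes every observed
    non-target reward below the target arm's running mean."""
    rng = np.random.default_rng(seed)
    K = len(true_means)
    counts = np.zeros(K)      # learner's pull counts
    sums = np.zeros(K)        # sums of observed (corrupted) rewards
    attack_cost = 0.0

    for t in range(T):
        # Learner: UCB1 index computed on the corrupted observations.
        if t < K:
            arm = t                                   # initialize: pull each arm once
        else:
            means = sums / counts
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(means + bonus))

        # Environment: stochastic reward for the chosen arm.
        reward = true_means[arm] + sigma * rng.standard_normal()

        # Adversary: corrupt non-target pulls so the target keeps looking best.
        corruption = 0.0
        if arm != target:
            target_mean = sums[target] / max(counts[target], 1)
            desired = target_mean - 0.5               # hypothetical margin, illustration only
            corruption = min(0.0, desired - reward)   # only push rewards downward
        attack_cost += abs(corruption)

        # Learner observes and records only the corrupted reward.
        counts[arm] += 1
        sums[arm] += reward + corruption

    return counts, attack_cost

counts, cost = ucb_corrupted_bandit(true_means=[0.9, 0.5, 0.2], target=2, T=5000)
print("pulls per arm:", counts, "total attack cost:", round(cost, 1))
```

With this naive rule the target arm dominates the pull counts, but the cumulative corruption grows much faster than the $\widehat{O}(\sqrt{\log T})$ budget the paper achieves; the point of the sketch is only the protocol and the cost accounting.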
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose an algorithm, namely robust contextual dueling bandit (algo), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation [50.469872635246176]
We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit.
This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards.
arXiv Detail & Related papers (2023-11-27T09:19:01Z) - Alternating Objectives Generates Stronger PGD-Based Adversarial Attacks [78.2700757742992]
Projected Gradient Descent (PGD) is one of the most effective and conceptually simple algorithms to generate such adversaries.
We experimentally verify this assertion on a synthetic-data example and by evaluating our proposed method across 25 different $\ell_\infty$-robust models and 3 datasets.
Our strongest adversarial attack outperforms all of the white-box components of AutoAttack ensemble.
arXiv Detail & Related papers (2022-12-15T17:44:31Z) - Combinatorial Bandits under Strategic Manipulations [25.882801456026584]
We study the problem of combinatorial multi-armed bandits (CMAB) under strategic manipulations of rewards, where each arm can modify the emitted reward signals for its own interest.
Our setting elaborates a more realistic model of adaptive arms that imposes relaxed assumptions compared to adversarial corruptions and adversarial attacks.
arXiv Detail & Related papers (2021-02-25T07:57:27Z) - Composite Adversarial Attacks [57.293211764569996]
Adversarial attack is a technique for deceiving Machine Learning (ML) models.
In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms.
CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
arXiv Detail & Related papers (2020-12-10T03:21:16Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include the algorithm's regret guarantees, which depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z) - Regret Minimization in Stochastic Contextual Dueling Bandits [40.17224226373741]
We consider the problem of $K$-armed dueling bandit in the contextual setting.
We present two algorithms for the setup with respective regret guarantees.
arXiv Detail & Related papers (2020-02-20T06:36:19Z) - Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack [41.060507338755784]
This paper investigates the attack model where an adversary attacks with a certain probability at each round, and its attack value can be arbitrary and unbounded if it attacks.
We propose a novel sample median-based and exploration-aided UCB algorithm (called med-E-UCB) and a median-based $\epsilon$-greedy algorithm (called med-$\epsilon$-greedy); a rough sketch of the median-based index idea appears after this list.
Both algorithms are provably robust to the aforementioned attack model. More specifically, we show that both algorithms achieve $\mathcal{O}(\log T)$ pseudo-regret.
arXiv Detail & Related papers (2020-02-17T19:21:08Z) - Defensive Few-shot Learning [77.82113573388133]
This paper investigates a new challenging problem called defensive few-shot learning.
It aims to learn a robust few-shot model against adversarial attacks.
The proposed framework can effectively make the existing few-shot models robust against adversarial attacks.
arXiv Detail & Related papers (2019-11-16T05:57:16Z)
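The med-E-UCB entry above replaces empirical means with sample medians so that rare, unbounded corruptions cannot drag an arm's index arbitrarily. Below is a rough sketch of that median-based index idea under simplifying assumptions of my own (the bonus scale `c` and the omission of the exploration aid are not from the cited paper).

```python
import numpy as np

def median_ucb_arm(rewards_per_arm, t, c=2.0):
    """Pick an arm using a sample-median UCB index: the empirical mean is
    replaced by the sample median, which a sparse unbounded attacker cannot
    shift without corrupting a constant fraction of an arm's observations.
    The bonus scale `c` and the omitted exploration aid are simplifications."""
    indices = []
    for obs in rewards_per_arm:        # obs: list of observed rewards for one arm
        n = len(obs)
        if n == 0:
            indices.append(np.inf)     # force at least one pull of every arm
            continue
        median = np.median(obs)        # robust replacement for the empirical mean
        bonus = np.sqrt(c * np.log(t + 1) / n)
        indices.append(median + bonus)
    return int(np.argmax(indices))
```

In each round the learner would call `median_ucb_arm` on its per-arm observation lists, pull the returned arm, and append the observed (possibly corrupted) reward to that arm's list.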
This list is automatically generated from the titles and abstracts of the papers in this site.