Adversarial Attacks on Linear Contextual Bandits
- URL: http://arxiv.org/abs/2002.03839v3
- Date: Fri, 23 Oct 2020 08:18:10 GMT
- Title: Adversarial Attacks on Linear Contextual Bandits
- Authors: Evrard Garcelon, Baptiste Roziere, Laurent Meunier, Jean Tarbouriech,
Olivier Teytaud, Alessandro Lazaric, Matteo Pirotta
- Abstract summary: Malicious agents may have incentives to attack the bandit algorithm to induce it to perform a desired behavior.
We show that a malicious agent can force a linear contextual bandit algorithm to pull any desired arm $T - o(T)$ times over a horizon of $T$ steps.
We also investigate the case when a malicious agent is interested in affecting the behavior of the bandit algorithm in a single context.
- Score: 87.08004581867537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual bandit algorithms are applied in a wide range of domains, from
advertising to recommender systems, from clinical trials to education. In many
of these domains, malicious agents may have incentives to attack the bandit
algorithm to induce it to perform a desired behavior. For instance, an
unscrupulous ad publisher may try to increase their own revenue at the expense
of the advertisers; a seller may want to increase the exposure of their
products, or thwart a competitor's advertising campaign. In this paper, we
study several attack scenarios and show that a malicious agent can force a
linear contextual bandit algorithm to pull any desired arm $T - o(T)$ times
over a horizon of $T$ steps, while applying adversarial modifications to either
rewards or contexts that only grow logarithmically as $O(\log T)$. We also
investigate the case when a malicious agent is interested in affecting the
behavior of the bandit algorithm in a single context (e.g., a specific user).
We first provide sufficient conditions for the feasibility of the attack and we
then propose an efficient algorithm to perform the attack. We validate our
theoretical results on experiments performed on both synthetic and real-world
datasets.
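As a concrete illustration of the reward-perturbation scenario in the abstract, here is a minimal sketch (not the authors' exact algorithm) of a reward-poisoning attack on a standard LinUCB learner. All names and constants are illustrative assumptions: the contexts carry a constant bias feature, so subtracting a fixed offset `delta` from every non-target reward is exactly realizable by a "worse" linear parameter, and the learner is driven to prefer the target arm in every context.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 5000
target = 0                                   # arm the attacker wants pulled
delta = 10.0                                 # offset; chosen to exceed any reward gap
theta = rng.normal(size=(K, d))              # true (unknown) arm parameters
A = [np.eye(d) for _ in range(K)]            # per-arm ridge Gram matrices
b = [np.zeros(d) for _ in range(K)]          # per-arm reward statistics
alpha = 1.0                                  # LinUCB exploration coefficient
cost, target_pulls = 0.0, 0

for t in range(T):
    z = rng.normal(size=d - 1)
    x = np.concatenate(([1.0], z / np.linalg.norm(z)))  # bias feature + unit part
    ucb = []
    for k in range(K):                       # LinUCB index: estimate + confidence width
        Ainv = np.linalg.inv(A[k])
        ucb.append(x @ (Ainv @ b[k]) + alpha * np.sqrt(x @ Ainv @ x))
    k = int(np.argmax(ucb))
    r = x @ theta[k] + 0.1 * rng.normal()    # true noisy reward
    r_obs = r if k == target else r - delta  # poison non-target rewards only
    cost += r - r_obs                        # attacker pays only on non-target pulls
    target_pulls += (k == target)
    A[k] += np.outer(x, x)                   # learner updates with poisoned data
    b[k] += r_obs * x

print(f"target pulled {target_pulls}/{T} times; total perturbation {cost:.1f}")
```

Because the bias coordinate is always 1, the shifted rewards are exactly those of the parameter $\theta_k - \delta e_1$, so the learner's estimate converges there and every non-target arm appears dominated; the attacker only pays when a non-target arm is pulled, which matches the logarithmic-cost flavor of the paper's result.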
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
- Adversarial Attacks on Adversarial Bandits [10.891819703383408]
We show that the attacker is able to mislead any no-regret adversarial bandit algorithm into selecting a suboptimal target arm.
This result implies a critical security concern in real-world bandit-based systems.
arXiv Detail & Related papers (2023-01-30T00:51:39Z)
- Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z)
- Efficient Action Poisoning Attacks on Linear Contextual Bandits [41.1063033715314]
We propose a new class of attacks: action poisoning attacks.
An adversary can change the action signal selected by the agent.
We show that, in both white-box and black-box settings, the proposed attack schemes can force the LinUCB agent to pull a target arm very frequently.
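A toy sketch of the action-poisoning idea, hedged: the paper targets LinUCB in the linear contextual setting, whereas this sketch uses a plain UCB learner on a 3-armed bandit for brevity; the swap-to-worst-arm rule and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
K, T, target = 3, 2000, 1
means = np.array([0.3, 0.6, 0.9])        # arm 2 is truly optimal
worst = int(np.argmin(means))            # arm the adversary substitutes
counts, sums = np.zeros(K), np.zeros(K)

for a in range(K):                       # initialization: one pull per arm
    executed = a if a == target else worst   # adversary swaps even here
    counts[a] += 1
    sums[a] += means[executed] + 0.1 * rng.normal()

for t in range(K, T):
    ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
    a = int(np.argmax(ucb))              # arm the agent intends to pull
    executed = a if a == target else worst   # adversary swaps the action signal
    r = means[executed] + 0.1 * rng.normal()
    counts[a] += 1                       # agent credits r to its own choice
    sums[a] += r

print("pulls per arm:", counts)          # the target arm ends up dominating
```

Every non-target arm accumulates the worst arm's reward statistics, so the learner comes to prefer the target even though it is suboptimal, while the adversary only intervenes on the (increasingly rare) non-target pulls.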
arXiv Detail & Related papers (2021-12-10T07:39:07Z)
- When Are Linear Stochastic Bandits Attackable? [47.25702824488642]
This paper studies the attackability of a $k$-armed linear bandit environment.
We propose a two-stage attack method against LinUCB and Robust Phase Elimination.
arXiv Detail & Related papers (2021-10-18T04:12:09Z)
- Robust Stochastic Linear Contextual Bandits Under Adversarial Attacks [81.13338949407205]
Recent works show that optimal bandit algorithms are vulnerable to adversarial attacks and can fail completely in the presence of attacks.
Existing robust bandit algorithms only work in the non-contextual setting under reward attacks.
We provide the first robust bandit algorithm for linear contextual bandit setting under a fully adaptive and omniscient attack.
arXiv Detail & Related papers (2021-06-05T22:20:34Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
- Action-Manipulation Attacks Against Stochastic Bandits: Attacks and Defense [45.408568528354216]
We introduce a new class of attacks named action-manipulation attacks.
In this attack, an adversary can change the action signal selected by the user.
To defend against this class of attacks, we introduce a novel algorithm that is robust to action-manipulation attacks.
arXiv Detail & Related papers (2020-02-19T04:09:15Z)
- Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack [41.060507338755784]
This paper investigates the attack model where an adversary attacks with a certain probability at each round, and its attack value can be arbitrary and unbounded when it attacks.
We propose a novel sample-median-based and exploration-aided UCB algorithm (called med-E-UCB) and a median-based $\epsilon$-greedy algorithm (called med-$\epsilon$-greedy).
Both algorithms are provably robust to the aforementioned attack model and achieve $\mathcal{O}(\log T)$ pseudo-regret; a minimal sketch of the median-based index follows this entry.
arXiv Detail & Related papers (2020-02-17T19:21:08Z)
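To make the median-based idea concrete, here is a minimal sketch. It assumes symmetric reward noise (so the median coincides with the mean); the contamination rate, confidence width, and all constants are illustrative, and the paper's actual algorithms add an explicit exploration component and calibrated widths.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T = 3, 3000
means = np.array([0.2, 0.5, 0.8])
attack_prob = 0.05                       # adversary corrupts a round w.p. 0.05
history = [[] for _ in range(K)]

for t in range(T):
    # UCB index built on the sample median, which is robust to a
    # minority of arbitrarily large corrupted rewards.
    idx = []
    for k in range(K):
        n = len(history[k])
        if n == 0:
            idx.append(np.inf)           # pull each arm at least once
        else:
            idx.append(np.median(history[k]) + np.sqrt(2 * np.log(t + 1) / n))
    k = int(np.argmax(idx))
    r = means[k] + 0.1 * rng.normal()    # clean reward
    if rng.random() < attack_prob:
        r = 1e6 * rng.normal()           # arbitrary, unbounded attack value
    history[k].append(r)

print("pulls per arm:", [len(h) for h in history])  # best arm should dominate
```

Because only a small fraction of samples is corrupted, the sample median stays close to each arm's true mean no matter how large the attack values are, whereas an empirical-mean UCB index would be destroyed by a single unbounded corruption.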