Inception: Efficiently Computable Misinformation Attacks on Markov Games
- URL: http://arxiv.org/abs/2406.17114v1
- Date: Mon, 24 Jun 2024 20:01:43 GMT
- Title: Inception: Efficiently Computable Misinformation Attacks on Markov Games
- Authors: Jeremy McMahan, Young Wu, Yudong Chen, Xiaojin Zhu, Qiaomin Xie
- Abstract summary: We study security threats to Markov games due to information asymmetry and misinformation.
We derive the victim's policy under worst-case rationality and present polynomial-time algorithms to compute the attacker's optimal worst-case policy.
Our work exposes a security vulnerability arising from standard game assumptions under misinformation.
- Score: 14.491458698581038
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study security threats to Markov games due to information asymmetry and misinformation. We consider an attacker player who can spread misinformation about its reward function to influence the robust victim player's behavior. Given a fixed fake reward function, we derive the victim's policy under worst-case rationality and present polynomial-time algorithms to compute the attacker's optimal worst-case policy based on linear programming and backward induction. Then, we provide an efficient inception ("planting an idea in someone's mind") attack algorithm to find the optimal fake reward function within a restricted set of reward functions with dominant strategies. Importantly, our methods exploit the universal assumption of rationality to compute attacks efficiently. Thus, our work exposes a security vulnerability arising from standard game assumptions under misinformation.
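The algorithmic core is stage-wise matrix-game solving. Below is a minimal sketch of the backward-induction-plus-LP idea for a finite-horizon, two-player Markov game, under the simplifying assumption that the victim plays a maximin ("worst-case rational") strategy for the fake reward at every stage while the attacker best-responds under its true reward; all names (`P`, `fake_R`, `true_R`, `horizon`) are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch, not the paper's implementation: the victim plays a
# maximin strategy for the *fake* reward at every stage, and the attacker
# best-responds under its *true* reward.
import numpy as np
from scipy.optimize import linprog

def maximin_lp(G):
    """Maximin mixed strategy and value for the row player of payoff matrix G."""
    m, n = G.shape
    c = np.zeros(m + 1)
    c[0] = -1.0                                   # maximize v = minimize -v
    A_ub = np.hstack([np.ones((n, 1)), -G.T])     # v - (x^T G)_j <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.zeros((1, 1)), np.ones((1, m))])  # sum_i x_i = 1
    res = linprog(c, A_ub, b_ub, A_eq, np.ones(1),
                  bounds=[(None, None)] + [(0.0, None)] * m)
    return res.x[1:], res.x[0]                    # (strategy x, game value v)

def attacker_value(P, fake_R, true_R, horizon):
    """Backward induction over a finite-horizon Markov game.

    P[s]      : (m, n, S) next-state distribution for each action pair
    fake_R[s] : (m, n) victim's stage payoff under the fake reward
    true_R[s] : (m, n) attacker's stage payoff under the true reward
    Returns the attacker's value in every state at t = 0.
    """
    S = len(P)
    V_vic = np.zeros(S)   # victim's continuation value (believes fake_R)
    V_att = np.zeros(S)   # attacker's continuation value (knows true_R)
    for _ in range(horizon):
        V_vic_new, V_att_new = np.zeros(S), np.zeros(S)
        for s in range(S):
            Q_fake = fake_R[s] + P[s] @ V_vic     # what the victim believes
            x, V_vic_new[s] = maximin_lp(Q_fake)  # victim's robust play
            Q_true = true_R[s] + P[s] @ V_att
            V_att_new[s] = (x @ Q_true).max()     # attacker's best response
        V_vic, V_att = V_vic_new, V_att_new
    return V_att
```

Restricting the search to fake rewards that admit dominant strategies, as the inception attack does, pins the victim's rational play down to a single action per state, which is presumably what keeps the outer search over fake reward functions tractable.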
Related papers
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
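As a rough, assumed illustration of the idea (not REBEL's actual objective), one can penalize extreme reward-model scores on top of a standard Bradley-Terry preference loss; all names below are hypothetical.

```python
# An assumed illustration of reward regularization in preference-based
# reward learning; this is NOT REBEL's objective.
import torch

def regularized_preference_loss(r_preferred, r_rejected, lam=0.01):
    """r_preferred, r_rejected: reward-model scores for the preferred and
    rejected trajectory in each labeled pair (shape: [batch])."""
    # Bradley-Terry negative log-likelihood of the human preference labels.
    bt_loss = -torch.nn.functional.logsigmoid(r_preferred - r_rejected).mean()
    # Regularizer discouraging extreme reward magnitudes, one simple way to
    # limit reward overoptimization.
    reg = lam * (r_preferred.pow(2) + r_rejected.pow(2)).mean()
    return bt_loss + reg
```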
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Optimal Attack and Defense for Reinforcement Learning [11.36770403327493]
In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment.
We formulate the attacker's problem of designing a stealthy attack that maximizes its own expected reward.
We argue that the optimal defense policy for the victim can be computed as the solution to a Stackelberg game.
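For intuition, a Stackelberg (leader-commitment) equilibrium of a finite matrix game can be computed with one LP per follower action; the generic sketch below illustrates that idea and is not the paper's exact defense formulation.

```python
# A generic sketch of leader commitment in a finite matrix game (one LP per
# follower action), not the paper's exact defense formulation.
import numpy as np
from scipy.optimize import linprog

def stackelberg_commitment(L, F):
    """L, F: (m, n) payoff matrices for the leader (here, the defending
    victim) and the follower (the attacker). Returns the leader's optimal
    mixed strategy to commit to, and its value."""
    m, n = L.shape
    best_val, best_x = -np.inf, None
    for j in range(n):                    # guess the follower's best response
        others = [k for k in range(n) if k != j]
        # Maximize leader payoff x^T L[:, j] subject to j being a follower
        # best response: x^T F[:, k] <= x^T F[:, j] for every other k.
        A_ub = (F[:, others] - F[:, [j]]).T if others else None
        b_ub = np.zeros(len(others)) if others else None
        res = linprog(-L[:, j], A_ub, b_ub,
                      A_eq=np.ones((1, m)), b_eq=np.ones(1),
                      bounds=[(0.0, 1.0)] * m)
        if res.success and -res.fun > best_val:
            best_val, best_x = -res.fun, res.x
    return best_x, best_val
```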
arXiv Detail & Related papers (2023-11-30T21:21:47Z)
- A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation Is the Fixed Point of Adversarial Game [153.74942025516853]
The intrinsic vulnerability of rank aggregation methods is not well studied in the literature.
In this paper, we focus on a purposeful adversary who aims to dictate the aggregated results by modifying the pairwise data.
The effectiveness of the suggested target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments.
arXiv Detail & Related papers (2022-09-13T05:59:02Z)
- On Almost-Sure Intention Deception Planning that Exploits Imperfect Observers [24.11353445650682]
Intention deception involves computing a strategy which deceives the opponent into a wrong belief about the agent's intention or objective.
This paper studies a class of probabilistic planning problems with intention deception and investigates how a defender's limited sensing modality can be exploited.
arXiv Detail & Related papers (2022-09-01T16:38:03Z)
- Understanding the Limits of Poisoning Attacks in Episodic Reinforcement Learning [36.30086280732181]
This paper studies poisoning attacks that manipulate any order-optimal learning algorithm towards a targeted policy in episodic RL.
We find that the effect of attacks crucially depends on whether the rewards are bounded or unbounded.
arXiv Detail & Related papers (2022-08-29T15:10:14Z)
- Adversarial Attacks on Gaussian Process Bandits [47.84198626686564]
We propose various adversarial attack methods with differing assumptions on the attacker's strength and prior information.
Our goal is to understand adversarial attacks on GP bandits from both a theoretical and practical perspective.
We demonstrate that adversarial attacks on GP bandits can succeed in forcing the algorithm towards $\mathcal{R}_{\rm target}$ even with a low attack budget.
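A hedged guess at the flavor of such an attack (not the paper's construction): perturb each observed reward so that points outside $\mathcal{R}_{\rm target}$ look worse than points inside it, spending budget only when necessary.

```python
# A hedged sketch of a reward-perturbation attack on a GP bandit: make every
# queried point outside the target region look worse than points inside it.
# `in_target`, `floor`, and `budget` are illustrative assumptions.
def perturb_reward(x, y, in_target, floor, budget):
    """x: queried point, y: true reward, in_target(x): membership test,
    floor: a value below the rewards inside the target region,
    budget: per-round cap on the perturbation magnitude."""
    if in_target(x):
        return y, 0.0                      # leave target points untouched
    delta = min(max(y - floor, 0.0), budget)
    return y - delta, delta                # push outsiders toward the floor
```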
arXiv Detail & Related papers (2021-10-16T02:39:10Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Robust Stochastic Linear Contextual Bandits Under Adversarial Attacks [81.13338949407205]
Recent works show that optimal bandit algorithms are vulnerable to adversarial attacks and can fail completely in the presence of attacks.
Existing robust bandit algorithms only work for the non-contextual setting under attacks on rewards.
We provide the first robust bandit algorithm for the linear contextual bandit setting under a fully adaptive and omniscient attack.
arXiv Detail & Related papers (2021-06-05T22:20:34Z)
- The Hammer and the Nut: Is Bilevel Optimization Really Needed to Poison Linear Classifiers? [27.701693158702753]
Data poisoning is a particularly worrisome subset of poisoning attacks.
We propose a counter-intuitive but efficient framework to combat data poisoning.
Our framework achieves comparable, or even better, performance in terms of the attacker's objective.
arXiv Detail & Related papers (2021-03-23T09:08:10Z)
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that with low exploration probability values the learned policy is more robust to corrupted rewards.
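As an assumed, concrete example of a smoothly crafted corruption (the environment interface and constants are illustrative, not the paper's setup), a bounded per-step perturbation can steer tabular Q-learning toward a target action; `expl` below is the exploration probability the summary refers to.

```python
# An assumed example of a smooth, bounded reward corruption against tabular
# Q-learning; `env` is a hypothetical minimal interface, not a real library:
# reset() -> s, step(a) -> (s2, r, done), plus n_states and n_actions.
import numpy as np

def corrupted_reward(r, a, target_action, eps=0.2):
    """Add a smooth, bounded perturbation favoring `target_action`."""
    return r + (eps if a == target_action else -eps)

def q_learning_under_attack(env, target_action, episodes=500,
                            alpha=0.1, gamma=0.9, expl=0.1):
    Q = np.zeros((env.n_states, env.n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            greedy = int(Q[s].argmax())
            a = rng.integers(env.n_actions) if rng.random() < expl else greedy
            s2, r, done = env.step(a)
            r = corrupted_reward(r, a, target_action)  # attacker's corruption
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q
```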
arXiv Detail & Related papers (2021-02-12T15:53:48Z)
- Defense Against Reward Poisoning Attacks in Reinforcement Learning [29.431349181232203]
We study defense strategies against reward poisoning attacks in reinforcement learning.
We propose an optimization framework for deriving optimal defense policies.
We show that defense policies that are solutions to the proposed optimization problems have provable performance guarantees.
arXiv Detail & Related papers (2021-02-10T23:31:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.