Defense Against Reward Poisoning Attacks in Reinforcement Learning
- URL: http://arxiv.org/abs/2102.05776v1
- Date: Wed, 10 Feb 2021 23:31:53 GMT
- Title: Defense Against Reward Poisoning Attacks in Reinforcement Learning
- Authors: Kiarash Banihashem, Adish Singla, Goran Radanovic
- Abstract summary: We study defense strategies against reward poisoning attacks in reinforcement learning.
We propose an optimization framework for deriving optimal defense policies.
We show that defense policies that are solutions to the proposed optimization problems have provable performance guarantees.
- Score: 29.431349181232203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study defense strategies against reward poisoning attacks in reinforcement
learning. As a threat model, we consider attacks that minimally alter rewards
to make the attacker's target policy uniquely optimal under the poisoned
rewards, with the optimality gap specified by an attack parameter. Our goal is
to design agents that are robust against such attacks in terms of the
worst-case utility w.r.t. the true, unpoisoned, rewards while computing their
policies under the poisoned rewards. We propose an optimization framework for
deriving optimal defense policies, both when the attack parameter is known and
unknown. Moreover, we show that defense policies that are solutions to the
proposed optimization problems have provable performance guarantees. In
particular, we provide the following bounds with respect to the true,
unpoisoned, rewards: a) lower bounds on the expected return of the defense
policies, and b) upper bounds on how suboptimal these defense policies are
compared to the attacker's target policy. We conclude the paper by illustrating
the intuitions behind our formal results, and showing that the derived bounds
are non-trivial.
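To give a concrete feel for the threat model, the attack can be cast as an optimization problem: minimally perturb the rewards so the target policy is optimal with margin given by the attack parameter. The sketch below is not the paper's algorithm; it is a hypothetical single-state (bandit) special case, where the optimality constraints become linear and the minimal-L1 attack is a small linear program solved with `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog

def minimal_poisoning_bandit(r, target, eps):
    """Minimally perturb bandit rewards r (L1 cost) so that arm `target`
    is uniquely optimal with margin eps:
        r_hat[target] >= r_hat[a] + eps  for all a != target.
    LP variables: [r_hat (n), u (n)] with u >= |r_hat - r|; minimize sum(u)."""
    n = len(r)
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_ub, b_ub = [], []
    # margin constraints: r_hat[a] - r_hat[target] <= -eps
    for a in range(n):
        if a == target:
            continue
        row = np.zeros(2 * n)
        row[a], row[target] = 1.0, -1.0
        A_ub.append(row); b_ub.append(-eps)
    # absolute-value constraints: r_hat - u <= r  and  -r_hat - u <= -r
    for a in range(n):
        row = np.zeros(2 * n); row[a], row[n + a] = 1.0, -1.0
        A_ub.append(row); b_ub.append(r[a])
        row = np.zeros(2 * n); row[a], row[n + a] = -1.0, -1.0
        A_ub.append(row); b_ub.append(-r[a])
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]

r = np.array([1.0, 0.4, 0.9])           # true rewards (hypothetical numbers)
r_hat = minimal_poisoning_bandit(r, target=1, eps=0.1)
# the optimal L1 attack cost here is 0.7 (raising arm 1 past arm 0's margin)
```

In the full MDP setting the same idea applies with Q-value constraints per state-action pair; the defense side of the paper then asks how well an agent can do under the true rewards while only observing `r_hat`.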
Related papers
- Optimal Attack and Defense for Reinforcement Learning [11.36770403327493]
In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment.
We formulate the attacker's problem of designing a stealthy attack that maximizes its own expected reward.
We argue that the optimal defense policy for the victim can be computed as the solution to a Stackelberg game.
arXiv Detail & Related papers (2023-11-30T21:21:47Z)
- IDEA: Invariant Defense for Graph Adversarial Robustness [60.0126873387533]
We propose an Invariant causal DEfense method against adversarial Attacks (IDEA)
We derive node-based and structure-based invariance objectives from an information-theoretic perspective.
Experiments demonstrate that IDEA attains state-of-the-art defense performance under all five attacks on all five datasets.
arXiv Detail & Related papers (2023-05-25T07:16:00Z)
- Planning for Attacker Entrapment in Adversarial Settings [16.085007590604327]
We propose a framework to generate a defense strategy against an attacker who is working in an environment where a defender can operate without the attacker's knowledge.
Our problem formulation allows us to capture it as a much simpler infinite horizon discounted MDP, in which the optimal policy for the MDP gives the defender's strategy against the actions of the attacker.
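An infinite-horizon discounted MDP of the kind mentioned above can be solved by standard value iteration; the sketch below is a generic textbook implementation on a hypothetical toy MDP, not the entrapment paper's specific model.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve an infinite-horizon discounted MDP by value iteration.
    P: transition tensor (S, A, S); R: reward matrix (S, A).
    Returns the optimal values and a greedy (optimal) deterministic policy."""
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * (P @ V)      # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V_new = Q.max(axis=1)        # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new, Q.argmax(axis=1)

# toy 2-state, 2-action MDP (hypothetical numbers):
# in each state, one action earns reward 1 and moves to the other state
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]]])
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])
V, pi = value_iteration(P, R)
# optimal policy alternates states, earning 1 per step: V = 1/(1-gamma) = 10
```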
arXiv Detail & Related papers (2023-03-01T21:08:27Z)
- Implicit Poisoning Attacks in Two-Agent Reinforcement Learning: Adversarial Policies for Training-Time Attacks [21.97069271045167]
In targeted poisoning attacks, an attacker manipulates an agent-environment interaction to force the agent into adopting a policy of interest, called target policy.
We study targeted poisoning attacks in a two-agent setting where an attacker implicitly poisons the effective environment of one of the agents by modifying the policy of its peer.
We develop an optimization framework for designing optimal attacks, where the cost of the attack measures how much the solution deviates from the assumed default policy of the peer agent.
arXiv Detail & Related papers (2023-02-27T14:52:15Z)
- Randomness in ML Defenses Helps Persistent Attackers and Hinders Evaluators [49.52538232104449]
It is becoming increasingly imperative to design robust ML defenses.
Recent work has found that many defenses that initially resist state-of-the-art attacks can be broken by an adaptive adversary.
We take steps to simplify the design of defenses and argue that white-box defenses should eschew randomness when possible.
arXiv Detail & Related papers (2023-02-27T01:33:31Z)
- Attacking and Defending Deep Reinforcement Learning Policies [3.6985039575807246]
We study robustness of DRL policies to adversarial attacks from the perspective of robust optimization.
We propose a greedy attack algorithm, which tries to minimize the expected return of the policy without interacting with the environment, and a defense algorithm, which performs adversarial training in a max-min form.
arXiv Detail & Related papers (2022-05-16T12:47:54Z)
- Projective Ranking-based GNN Evasion Attacks [52.85890533994233]
Graph neural networks (GNNs) offer promising learning methods for graph-related tasks.
GNNs are at risk of adversarial attacks.
arXiv Detail & Related papers (2022-02-25T21:52:09Z)
- Adversarial Attack and Defense in Deep Ranking [100.17641539999055]
We propose two attacks against deep ranking systems that can raise or lower the rank of chosen candidates by adversarial perturbations.
Conversely, an anti-collapse triplet defense is proposed to improve the ranking model robustness against all proposed attacks.
Our adversarial ranking attacks and defenses are evaluated on MNIST, Fashion-MNIST, CUB200-2011, CARS196 and Stanford Online Products datasets.
arXiv Detail & Related papers (2021-06-07T13:41:45Z)
- Guided Adversarial Attack for Evaluating and Enhancing Adversarial Defenses [59.58128343334556]
We introduce a relaxation term to the standard loss, that finds more suitable gradient-directions, increases attack efficacy and leads to more efficient adversarial training.
We propose Guided Adversarial Margin Attack (GAMA), which utilizes function mapping of the clean image to guide the generation of adversaries.
We also propose Guided Adversarial Training (GAT), which achieves state-of-the-art performance amongst single-step defenses.
arXiv Detail & Related papers (2020-11-30T16:39:39Z)
- Are Adversarial Examples Created Equal? A Learnable Weighted Minimax Risk for Robustness under Non-uniform Attacks [70.11599738647963]
Adversarial Training is one of the few defenses that withstand strong attacks.
Traditional defense mechanisms assume a uniform attack over the examples according to the underlying data distribution.
We present a weighted minimax risk optimization that defends against non-uniform attacks.
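A weighted minimax risk of this kind is commonly approximated by alternating updates: the learner takes a gradient step on the weighted risk while an adversary reweights examples toward high loss. The sketch below is a generic min-max scheme on hypothetical data, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical data: labels mostly follow the first feature, with some noise
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))

def weighted_minimax_train(X, y, steps=200, lr=0.2, eta=0.05):
    """Alternate a learner's gradient step on the weighted logistic risk
    with an adversary's exponentiated-gradient update that upweights
    high-loss examples."""
    n, d = X.shape
    w = np.zeros(d)
    p = np.full(n, 1.0 / n)            # adversary's distribution over examples
    for _ in range(steps):
        z = y * (X @ w)
        losses = np.logaddexp(0.0, -z)  # numerically stable logistic loss
        grad = -(p * y / (1.0 + np.exp(z))) @ X
        w -= lr * grad                  # learner: minimize weighted risk
        p = p * np.exp(eta * losses)    # adversary: upweight hard examples
        p /= p.sum()
    return w, p

w, p = weighted_minimax_train(X, y)
```

After training, the adversary's weights `p` concentrate on the noisy (hard) examples, which is exactly the non-uniform attack the defense is meant to withstand.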
arXiv Detail & Related papers (2020-03-28T23:22:28Z)
- Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning [33.41280432984183]
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy.
As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings.
arXiv Detail & Related papers (2020-03-28T23:22:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.