Disturbing Reinforcement Learning Agents with Corrupted Rewards
- URL: http://arxiv.org/abs/2102.06587v1
- Date: Fri, 12 Feb 2021 15:53:48 GMT
- Title: Disturbing Reinforcement Learning Agents with Corrupted Rewards
- Authors: Rubén Majadas, Javier García and Fernando Fernández
- Abstract summary: We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards are able to mislead the learner, and that with low exploration probability values the learned policy is more robust to corrupted rewards.
- Score: 62.997667081978825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) algorithms have led to recent successes in
solving complex games, such as Atari or StarCraft, and to a huge impact in
real-world applications, such as cybersecurity or autonomous driving. On the
side of the drawbacks, recent works have shown how the performance of RL
algorithms degrades under the influence of soft changes in the reward
function. However, little work has been done on how sensitive the algorithms
are to these disturbances depending on the aggressiveness of the attack and
the learner's exploration strategy. In this paper, we propose to fill this gap
in the literature by analyzing the effects of different attack strategies
based on reward perturbations, and by studying the effect on the learner
depending on its exploration strategy. In order to explain all the behaviors,
we choose a sub-class of MDPs: episodic, stochastic, goal-only-rewards MDPs,
and in particular an intelligible grid domain as a benchmark. In this domain,
we demonstrate that smoothly crafted adversarial rewards are able to mislead
the learner, and that with low exploration probability values the learned
policy is more robust to corrupted rewards. Finally, in the proposed learning
scenario, a counterintuitive result arises: attacking at each learning episode
is the lowest-cost attack strategy.
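The abstract describes the experimental setup only at a high level; below is a minimal sketch of that kind of setup, assuming a goal-only-reward grid world, tabular epsilon-greedy Q-learning, and an attacker that flips the sign of the observed reward with some probability. The grid size, learning parameters, sign-flip corruption, and attack schedule are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch (not the authors' code): tabular epsilon-greedy Q-learning on a
# goal-only-reward grid world, with an attacker that corrupts the reward signal
# on selected episodes. All names and parameter values are illustrative.
import random

SIZE = 5                                        # grid is SIZE x SIZE, start at (0, 0)
GOAL = (SIZE - 1, SIZE - 1)                     # goal in the opposite corner
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right


def step(state, action):
    """Grid transition clamped to the board; only reaching the goal pays reward 1."""
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (r, c)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def corrupt(reward, flip_prob):
    """Reward-perturbation attack: with probability flip_prob, invert the sign of
    the observed reward (one simple way an attacker could mislead the learner)."""
    return -reward if random.random() < flip_prob else reward


def train(episodes=2000, epsilon=0.1, alpha=0.5, gamma=0.95,
          attack_every_episode=True, flip_prob=0.2):
    q = {}                                       # (state, action index) -> value
    for ep in range(episodes):
        state, done, steps = (0, 0), False, 0
        attacked = attack_every_episode or ep % 10 == 0
        while not done and steps < 100:
            # epsilon-greedy action selection; lower epsilon means less exploration
            if random.random() < epsilon:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
            nxt, reward, done = step(state, ACTIONS[a])
            if attacked:
                reward = corrupt(reward, flip_prob)   # learner only sees the corrupted reward
            bootstrap = 0.0 if done else gamma * max(q.get((nxt, i), 0.0)
                                                     for i in range(len(ACTIONS)))
            target = reward + bootstrap
            q[(state, a)] = q.get((state, a), 0.0) + alpha * (target - q.get((state, a), 0.0))
            state, steps = nxt, steps + 1
    return q


if __name__ == "__main__":
    q_clean = train(flip_prob=0.0)       # baseline, no attack
    q_attacked = train(flip_prob=0.2)    # rewards corrupted during learning
```

Comparing the greedy policies extracted from q_clean and q_attacked, and repeating the run with a lower epsilon, is one simple way to probe the two factors the abstract highlights: how aggressively rewards are corrupted and how much the learner explores.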
Related papers
- Loss of Plasticity in Continual Deep Reinforcement Learning [14.475963928766134]
We demonstrate that deep RL agents lose their ability to learn good policies when they cycle through a sequence of Atari 2600 games.
We investigate this phenomenon closely at scale and analyze how the weights, gradients, and activations change over time.
Our analysis shows that the activation footprint of the network becomes sparser, contributing to the diminishing gradients.
arXiv Detail & Related papers (2023-03-13T22:37:15Z) - Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning [6.414910263179327]
We study reward poisoning attacks on online deep reinforcement learning (DRL).
We demonstrate the intrinsic vulnerability of state-of-the-art DRL algorithms by designing a general, black-box reward poisoning framework called adversarial MDP attacks.
Our results show that our attacks efficiently poison agents learning in several popular classical control and MuJoCo environments.
arXiv Detail & Related papers (2022-05-30T04:07:19Z) - Projective Ranking-based GNN Evasion Attacks [52.85890533994233]
Graph neural networks (GNNs) offer promising learning methods for graph-related tasks.
GNNs are at risk of adversarial attacks.
arXiv Detail & Related papers (2022-02-25T21:52:09Z) - Execute Order 66: Targeted Data Poisoning for Reinforcement Learning [52.593097204559314]
We introduce an insidious poisoning attack for reinforcement learning which causes agent misbehavior only at specific target states.
We accomplish this by adapting a recent technique, gradient alignment, to reinforcement learning.
We test our method and demonstrate success in two Atari games of varying difficulty.
arXiv Detail & Related papers (2022-01-03T17:09:32Z) - Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z) - Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning [32.12283927682007]
Deep reinforcement learning models are vulnerable to adversarial attacks which can decrease the victim's total reward by manipulating the observations.
We reformulate the problem of adversarial attacks in function space and separate the previous gradient based attacks into several subspaces.
In the first stage, we train a deceptive policy by hacking the environment, and discover a set of trajectories leading to the lowest reward.
Our method provides a tighter theoretical upper bound for the attacked agent's performance than the existing approaches.
arXiv Detail & Related papers (2021-06-30T07:41:51Z) - Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z) - Self Punishment and Reward Backfill for Deep Q-Learning [6.572828651397661]
Reinforcement learning agents learn by encouraging behaviours which maximize their total reward, usually provided by the environment.
In many environments, the reward is provided after a series of actions rather than after each single action, leaving the agent uncertain about whether those actions were effective.
We propose two strategies inspired by behavioural psychology to enable the agent to intrinsically estimate more informative reward values for actions with no reward.
arXiv Detail & Related papers (2020-04-10T11:53:11Z) - Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)