Goal-Conditioned Reinforcement Learning in the Presence of an Adversary
- URL: http://arxiv.org/abs/2211.06929v1
- Date: Sun, 13 Nov 2022 15:40:01 GMT
- Title: Goal-Conditioned Reinforcement Learning in the Presence of an Adversary
- Authors: Carlos Purves, Pietro Liò and Cătălina Cangea
- Abstract summary: Reinforcement learning has seen increasing applications in real-world contexts over the past few years. However, policies that perform well in simulation often fail to match that performance in imperfect physical environments. A common approach to combat this is to train agents in the presence of an adversary. The adversary acts to destabilise the agent, which in turn learns a more robust policy and can better handle realistic conditions. We present DigitFlip and CLEVR-Play, two novel goal-conditioned environments that support acting against an adversary.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning has seen increasing applications in real-world
contexts over the past few years. However, physical environments are often
imperfect and policies that perform well in simulation might not achieve the
same performance when applied elsewhere. A common approach to combat this is to
train agents in the presence of an adversary. An adversary acts to destabilise
the agent, which learns a more robust policy and can better handle realistic
conditions. Many real-world applications of reinforcement learning also make
use of goal-conditioning: this is particularly useful in the context of
robotics, as it allows the agent to act differently, depending on which goal is
selected. Here, we focus on the problem of goal-conditioned learning in the
presence of an adversary. We first present DigitFlip and CLEVR-Play, two novel
goal-conditioned environments that support acting against an adversary. Next,
we propose EHER and CHER -- two HER-based algorithms for goal-conditioned
learning -- and evaluate their performance. Finally, we unify the two threads
and introduce IGOAL: a novel framework for goal-conditioned learning in the
presence of an adversary. Experimental results show that combining IGOAL with
EHER allows agents to significantly outperform existing approaches, when acting
against both random and competent adversaries.
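The proposed EHER and CHER build on hindsight experience replay (HER), which relabels past transitions with goals that were actually achieved, so that sparse-reward goal-conditioned learning sees far more successful examples. As a minimal sketch of that relabeling idea in an adversarial setting, the Python below rolls out an episode in which an adversary perturbs the agent's actions and then applies the standard "future" HER strategy; the environment/agent/adversary interfaces, the `achieved_goal` field, and the sparse reward are illustrative assumptions, not the paper's implementation of EHER, CHER or IGOAL.

```python
import random

def collect_episode(env, agent, adversary):
    """Roll out one episode; the adversary perturbs every action
    (all interfaces here are assumed for illustration)."""
    episode = []
    obs = env.reset()
    done = False
    while not done:
        action = agent.act(obs)
        action = adversary.perturb(obs, action)  # destabilising interference
        next_obs, reward, done, info = env.step(action)
        # info["achieved_goal"] is an assumed field naming the goal
        # actually reached in next_obs (cf. gym GoalEnv conventions).
        episode.append((obs, action, reward, next_obs, info["achieved_goal"]))
        obs = next_obs
    return episode

def her_relabel(episode, k=4):
    """Standard "future" HER strategy: for each transition, sample up to
    k goals achieved later in the same episode and recompute the sparse
    reward as if that goal had been the intended one all along."""
    relabeled = []
    for t, (obs, action, _, next_obs, achieved) in enumerate(episode):
        future_goals = [step[4] for step in episode[t:]]
        for goal in random.sample(future_goals, min(k, len(future_goals))):
            # Sparse reward: 0 on success, -1 otherwise (assumes
            # discrete/hashable goals; use a distance test for vectors).
            reward = 0.0 if achieved == goal else -1.0
            relabeled.append((obs, goal, action, reward, next_obs))
    return relabeled
```

In practice the relabeled transitions would be pushed into an off-policy replay buffer for the learner; how EHER and CHER depart from this baseline relabeling, and how IGOAL accounts for the adversary, is specific to the paper and not captured by this sketch.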
Related papers
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
- Safe adaptation in multiagent competition [48.02377041620857]
In multiagent competitive scenarios, ego-agents may have to adapt to new opponents with previously unseen behaviors.
As the ego-agent updates its behavior to exploit the opponent, that behavior can itself become more exploitable.
We develop a safe adaptation approach in which the ego-agent is trained against a regularized opponent model.
arXiv Detail & Related papers (2022-03-14T23:53:59Z)
- It Takes Four to Tango: Multiagent Selfplay for Automatic Curriculum Generation [107.10235120286352]
Training general-purpose reinforcement learning agents efficiently requires automatic generation of a goal curriculum.
We propose Curriculum Self Play (CuSP), an automated goal generation framework.
We demonstrate that our method succeeds at generating an effective curriculum of goals for a range of control tasks.
arXiv Detail & Related papers (2022-02-22T01:23:23Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Unsupervised Domain Adaptation with Dynamics-Aware Rewards in Reinforcement Learning [28.808933152885874]
Unconditioned reinforcement learning aims to acquire skills without prior goal representations.
The intuitive approach of training in another, interaction-rich environment disrupts the learned skills when they are transferred to the target environment.
We propose an unsupervised domain adaptation method to identify and acquire skills across dynamics.
arXiv Detail & Related papers (2021-10-25T14:40:48Z)
- Targeted Attack on Deep RL-based Autonomous Driving with Learned Visual Patterns [18.694795507945603]
Recent studies demonstrated the vulnerability of control policies learned through deep reinforcement learning against adversarial attacks.
This paper investigates the feasibility of targeted attacks through visually learned patterns placed on physical objects in the environment.
arXiv Detail & Related papers (2021-09-16T04:59:06Z)
- Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning [32.12283927682007]
Deep reinforcement learning models are vulnerable to adversarial attacks which can decrease the victim's total reward by manipulating the observations.
We reformulate the problem of adversarial attacks in function space and separate the previous gradient-based attacks into several subspaces.
In the first stage of our two-stage method, we train a deceptive policy by hacking the environment and discover a set of trajectories routing to the lowest reward.
Our method provides a tighter theoretical upper bound for the attacked agent's performance than the existing approaches.
arXiv Detail & Related papers (2021-06-30T07:41:51Z)
- Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning [66.9937776799536]
The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to the target location in unseen photo-realistic environments.
The challenges of VLN arise mainly from two aspects: first, the agent needs to attend to the meaningful paragraphs of the language instruction corresponding to the dynamically-varying visual environments.
We propose a cross-modal grounding module to equip the agent with a better ability to track the correspondence between the textual and visual modalities.
arXiv Detail & Related papers (2020-11-22T09:13:46Z)
- Intrinsic Motivation for Encouraging Synergistic Behavior [55.10275467562764]
We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks.
Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.
arXiv Detail & Related papers (2020-02-12T19:34:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.