Adversarially Guided Actor-Critic
- URL: http://arxiv.org/abs/2102.04376v1
- Date: Mon, 8 Feb 2021 17:31:13 GMT
- Title: Adversarially Guided Actor-Critic
- Authors: Yannis Flet-Berliac and Johan Ferret and Olivier Pietquin and Philippe
Preux and Matthieu Geist
- Abstract summary: This paper introduces a third protagonist: the adversary.
While the adversary mimics the actor by minimizing the KL-divergence between their respective action distributions, the actor, in addition to learning to solve the task, tries to differentiate itself from the adversary predictions.
Our experimental analysis shows that the resulting Adversarially Guided Actor-Critic (AGAC) algorithm leads to more exhaustive exploration.
- Score: 42.76141646708985
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite definite success in deep reinforcement learning problems,
actor-critic algorithms are still confronted with sample inefficiency in
complex environments, particularly in tasks where efficient exploration is a
bottleneck. These methods consider a policy (the actor) and a value function
(the critic) whose respective losses are built using different motivations and
approaches. This paper introduces a third protagonist: the adversary. While the
adversary mimics the actor by minimizing the KL-divergence between their
respective action distributions, the actor, in addition to learning to solve
the task, tries to differentiate itself from the adversary predictions. This
novel objective stimulates the actor to follow strategies that could not have
been correctly predicted from previous trajectories, making its behavior
innovative in tasks where the reward is extremely rare. Our experimental
analysis shows that the resulting Adversarially Guided Actor-Critic (AGAC)
algorithm leads to more exhaustive exploration. Notably, AGAC outperforms
current state-of-the-art methods on a set of various hard-exploration and
procedurally-generated tasks.
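To make the three-player setup concrete, the following is a minimal sketch of how the losses described above could be wired together, assuming a discrete-action advantage actor-critic agent. The network names, the bonus coefficient c, and the exact way the KL term enters the update are illustrative assumptions, not the authors' reference implementation.
```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical, kl_divergence

def agac_losses(actor_logits, adversary_logits, values, returns, actions, c=0.4):
    """One-batch losses for the actor, the critic, and the adversary (sketch)."""
    pi = Categorical(logits=actor_logits)
    pi_adv = Categorical(logits=adversary_logits)

    # Discrepancy between the actor and the adversary's prediction of it,
    # used as an exploration bonus for the actor.
    discrepancy = kl_divergence(pi, Categorical(logits=adversary_logits.detach()))

    # Actor: maximize return plus a bonus for being hard to predict.
    advantages = (returns - values).detach() + c * discrepancy.detach()
    actor_loss = -(pi.log_prob(actions) * advantages).mean()

    # Critic: regress state values onto observed returns.
    critic_loss = F.mse_loss(values, returns)

    # Adversary: mimic the actor by minimizing the KL to its action distribution.
    adversary_loss = kl_divergence(
        Categorical(logits=actor_logits.detach()), pi_adv
    ).mean()

    return actor_loss, critic_loss, adversary_loss
```
In this sketch the adversary simply minimizes the KL to the current actor on the same batch; in the paper's framing it mimics the actor from previous trajectories, which is what makes the bonus favor behaviors that could not have been correctly predicted from them.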
Related papers
- Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching [23.600285251963395]
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment.
Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models and a learner optimizes the reward through repeated RL procedures.
We propose a novel approach to IRL by direct policy optimization, exploiting a linear factorization of the return as the inner product of successor features and a reward vector.
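As a toy illustration of that factorization (not the paper's method), the snippet below assumes per-step features phi(s, a) and a reward vector w with r = phi·w, and checks that the discounted return equals the inner product of the discounted successor features with w; all names and data are illustrative.
```python
import numpy as np

def discounted_return(rewards, gamma=0.99):
    return sum(gamma**t * r for t, r in enumerate(rewards))

def successor_features(features, gamma=0.99):
    # psi(tau) = sum_t gamma^t * phi(s_t, a_t)
    return sum(gamma**t * phi for t, phi in enumerate(features))

rng = np.random.default_rng(0)
w = rng.normal(size=4)                               # reward vector
features = [rng.normal(size=4) for _ in range(10)]   # phi(s_t, a_t) per step
rewards = [float(phi @ w) for phi in features]       # linear reward model

# The return factorizes as <successor features, reward vector>.
assert np.isclose(discounted_return(rewards), successor_features(features) @ w)
```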
arXiv Detail & Related papers (2024-11-11T14:05:50Z)
- Deep Exploration with PAC-Bayes [12.622116321154113]
Reinforcement learning for continuous control under sparse rewards is an under-explored problem despite its significance in real life.
We address the deep exploration problem for the first time from a PAC-Bayesian perspective in the context of actor-critic learning.
Our proposed algorithm, named PAC-Bayesian Actor-Critic (PBAC), is the only algorithm to successfully discover sparse rewards on a diverse set of continuous control tasks.
arXiv Detail & Related papers (2024-02-05T14:42:45Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage, with one stage for person bounding-box generation and the other for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Imitation from Observation With Bootstrapped Contrastive Learning [12.048166025000976]
Imitation from observation (IfO) is a learning paradigm that consists of training autonomous agents in a Markov Decision Process by observing expert demonstrations, without access to the expert's actions.
We present BootIfOL, an IfO algorithm that aims to learn a reward function that takes an agent trajectory and compares it to an expert.
We evaluate our approach on a variety of control tasks, showing that we can train effective policies using a limited number of demonstration trajectories.
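As a rough illustration of the reward idea in the summary above (scoring an agent trajectory by comparing it to an expert's), a reward could be the negative distance between trajectory embeddings. The encoder below is a hypothetical stand-in for the learned, contrastively trained encoder; all names are illustrative.
```python
import numpy as np

def encode_trajectory(states: np.ndarray) -> np.ndarray:
    # Placeholder encoder: the mean state as a crude trajectory embedding.
    return states.mean(axis=0)

def imitation_reward(agent_states: np.ndarray, expert_states: np.ndarray) -> float:
    # Higher (less negative) when the agent's trajectory resembles the expert's.
    z_agent = encode_trajectory(agent_states)
    z_expert = encode_trajectory(expert_states)
    return -float(np.linalg.norm(z_agent - z_expert))
```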
arXiv Detail & Related papers (2023-02-13T17:32:17Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
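For intuition about the conditional NML distribution itself, the toy sketch below computes it exactly by refitting a classifier once per candidate label and normalizing the resulting likelihoods; the paper instead meta-learns an approximation, and the logistic-regression model and data here are purely illustrative.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cnml_distribution(X, y, x_query, labels=(0, 1)):
    """Exact (refit-per-label) conditional NML over candidate labels."""
    likelihoods = []
    for label in labels:
        # Augment the dataset with the query point assigned this label, refit,
        # and record the refit model's likelihood of that label.
        X_aug = np.vstack([X, x_query])
        y_aug = np.append(y, label)
        clf = LogisticRegression().fit(X_aug, y_aug)
        likelihoods.append(clf.predict_proba(x_query.reshape(1, -1))[0, label])
    likelihoods = np.asarray(likelihoods)
    return likelihoods / likelihoods.sum()  # normalize over candidate labels

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = (X[:, 0] > 0).astype(int)
print(cnml_distribution(X, y, np.array([2.0, 0.0])))
```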
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Behavior-Guided Actor-Critic: Improving Exploration via Learning Policy Behavior Representation for Deep Reinforcement Learning [0.0]
We propose Behavior-Guided Actor-Critic (BAC) as an off-policy actor-critic deep RL algorithm.
BAC mathematically formulates the behavior of the policy through autoencoders.
Results show considerably better performance of BAC compared to several cutting-edge learning algorithms.
arXiv Detail & Related papers (2021-04-09T15:22:35Z)
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that using low exploration probability values makes the learned policy more robust to corrupted rewards.
arXiv Detail & Related papers (2021-02-12T15:53:48Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
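One generic way to use per-action critic estimates for variance reduction is an "all-actions" estimator under a softmax policy, sketched below; this is an illustration of the general idea, not the paper's exact algorithm.
```python
import torch

def all_actions_policy_loss(logits, q_values):
    """logits: [B, A] policy logits; q_values: [B, A] critic estimates."""
    probs = torch.softmax(logits, dim=1)
    log_probs = torch.log_softmax(logits, dim=1)
    # Baseline: the critic's value of the state under the current policy.
    baseline = (probs * q_values).sum(dim=1, keepdim=True)
    advantages = q_values - baseline
    # Average the policy gradient over every discrete action instead of a
    # single sampled one; weights are detached so only log-probs carry gradient.
    weights = (probs * advantages).detach()
    return -(weights * log_probs).sum(dim=1).mean()
```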
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.