ARC -- Actor Residual Critic for Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2206.02095v1
- Date: Sun, 5 Jun 2022 04:49:58 GMT
- Title: ARC -- Actor Residual Critic for Adversarial Imitation Learning
- Authors: Ankur Deka, Changliu Liu, Katia Sycara
- Abstract summary: We show that ARC aided AIL outperforms standard AIL in simulated continuous-control and real robotic manipulation tasks.
ARC algorithms are simple to implement and can be incorporated into any existing AIL implementation with an AC algorithm.
- Score: 3.4806267677524896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial Imitation Learning (AIL) is a class of popular state-of-the-art
Imitation Learning algorithms where an artificial adversary's misclassification
is used as a reward signal and is optimized by any standard Reinforcement
Learning (RL) algorithm. Unlike most RL settings, the reward in AIL is
differentiable but model-free RL algorithms do not make use of this property to
train a policy. In contrast, we leverage the differentiability property of the
AIL reward function and formulate a class of Actor Residual Critic (ARC) RL
algorithms that draw a parallel to the standard Actor-Critic (AC) algorithms in
RL literature and use a residual critic, the C function (instead of the standard Q
function) to approximate only the discounted future return (excluding the
immediate reward). ARC algorithms have convergence properties similar to those of
standard AC algorithms, with the additional advantage that the gradient through
the immediate reward is exact. For the discrete (tabular) case with finite
states, actions, and known dynamics, we prove that policy iteration with $C$
function converges to an optimal policy. In the continuous case with function
approximation and unknown dynamics, we experimentally show that ARC aided AIL
outperforms standard AIL in simulated continuous-control and real robotic
manipulation tasks. ARC algorithms are simple to implement and can be
incorporated into any existing AIL implementation with an AC algorithm.
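The tabular result mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes only what the abstract states, namely that C holds the discounted future return excluding the immediate reward, so that Q(s, a) = r(s, a) + C(s, a). The function names (`evaluate_c`, `policy_iteration_c`), the deterministic policy, and the two-state toy MDP are hypothetical choices made for illustration.

```python
import numpy as np


def evaluate_c(P, r, policy, gamma, tol=1e-10, max_iters=10_000):
    """Evaluate the residual critic C for a fixed deterministic policy.

    Assumed recursion (derived from Q = r + C, not quoted from the paper):
        C(s, a) = gamma * sum_s' P[s, a, s'] * (r(s', pi(s')) + C(s', pi(s')))
    """
    n_states, n_actions = r.shape
    C = np.zeros((n_states, n_actions))
    idx = np.arange(n_states)
    for _ in range(max_iters):
        # Return obtained by following the policy from each state.
        v = r[idx, policy] + C[idx, policy]
        # P has shape (S, A, S), v has shape (S,), so the product has shape (S, A).
        C_new = gamma * (P @ v)
        if np.max(np.abs(C_new - C)) < tol:
            return C_new
        C = C_new
    return C


def policy_iteration_c(P, r, gamma=0.9, max_policy_iters=1_000):
    """Policy iteration in which the improvement step is greedy in r + C."""
    n_states, _ = r.shape
    policy = np.zeros(n_states, dtype=int)
    C = np.zeros_like(r, dtype=float)
    for _ in range(max_policy_iters):
        C = evaluate_c(P, r, policy, gamma)
        # The immediate reward is added back explicitly when acting greedily.
        new_policy = np.argmax(r + C, axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, C


# Hypothetical toy MDP with known, deterministic dynamics.
# Action 0 moves to (or stays in) state 0; action 1 moves to (or stays in) state 1.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 0] = 1.0
P[1, 1, 1] = 1.0
r = np.array([[0.0, 1.0],
              [0.0, 2.0]])

policy, C = policy_iteration_c(P, r, gamma=0.9)
print("greedy policy:", policy)
print("r + C:", r + C)
```

In this sketch the critic never has to represent the immediate reward. In the AIL setting described above, that term would instead come directly from the differentiable reward function, which is the property the abstract says ARC exploits.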
Related papers
- Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching [23.600285251963395]
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment.
Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimizes the reward through repeated RL procedures.
We propose a novel approach to IRL by direct policy optimization, exploiting a linear factorization of the return as the inner product of successor features and a reward vector.
arXiv Detail & Related papers (2024-11-11T14:05:50Z) - A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence [7.586600116278698]
Deep Actor-Critic algorithms combine the Actor-Critic framework with deep neural networks (DNNs).
arXiv Detail & Related papers (2023-06-10T10:04:54Z) - Iteratively Refined Behavior Regularization for Offline Reinforcement
Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z) - Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
arXiv Detail & Related papers (2022-06-15T14:34:15Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Recursive Least Squares Advantage Actor-Critic Algorithms [20.792917267835247]
We propose two novel RLS-based advantage actor critic (A2C) algorithms.
Both algorithms, RLSSA2C and RLSNA2C, use the RLS method to train the critic network and the hidden layers of the actor network.
The experimental results show that both of our algorithms have better sample efficiency than vanilla A2C on most games or tasks.
arXiv Detail & Related papers (2022-01-15T20:00:26Z) - Provable Benefits of Actor-Critic Methods for Offline Reinforcement
Learning [85.50033812217254]
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically.
We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle.
arXiv Detail & Related papers (2021-08-19T17:27:29Z) - Implicitly Regularized RL with Implicit Q-Values [42.87920755961722]
The $Q$-function is a central quantity in many Reinforcement Learning (RL) algorithms for which RL agents behave following a (soft)-greedy policy.
We propose to parametrize the $Q$-function implicitly, as the sum of a log-policy and of a value function.
We derive a practical off-policy deep RL algorithm, suitable for large action spaces, and that enforces the softmax relation between the policy and the $Q$-value.
arXiv Detail & Related papers (2021-08-16T12:20:47Z) - Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that our ZO-RL algorithm can effectively reduce the variance of the ZO gradient by learning a sampling policy, and can converge faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.