Adversarially Trained Actor Critic for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2202.02446v1
- Date: Sat, 5 Feb 2022 01:02:46 GMT
- Title: Adversarially Trained Actor Critic for Offline Reinforcement Learning
- Authors: Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal
- Abstract summary: ATAC is a new model-free algorithm for offline reinforcement learning under insufficient data coverage.
In the D4RL benchmark, ATAC consistently outperforms state-of-the-art offline RL algorithms on a range of continuous control tasks.
- Score: 42.42451519801851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Adversarially Trained Actor Critic (ATAC), a new model-free
algorithm for offline reinforcement learning under insufficient data coverage,
based on a two-player Stackelberg game framing of offline RL: A policy actor
competes against an adversarially trained value critic, who finds
data-consistent scenarios where the actor is inferior to the data-collection
behavior policy. We prove that, when the actor attains no regret in the
two-player game, running ATAC produces a policy that provably 1) outperforms
the behavior policy over a wide range of hyperparameters, and 2) competes with
the best policy covered by data with appropriately chosen hyperparameters.
Notably, compared with existing works, our framework offers both theoretical
guarantees for general function approximation and a deep RL implementation
scalable to complex environments and large datasets. In the D4RL benchmark,
ATAC consistently outperforms state-of-the-art offline RL algorithms on a range
of continuous control tasks.
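To make the game concrete, below is a minimal PyTorch sketch of the min-max structure the abstract describes: the critic is trained to stay data-consistent (small Bellman error) while exposing scenarios where the actor's actions look worse than the dataset actions, and the actor then maximizes this adversarially trained critic. All hyperparameters, network shapes, and the one-step TD surrogate for data consistency are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of ATAC's two-player game on a batch of offline
# transitions (s, a, r, s'). Network sizes, learning rates, and the
# single-sample TD surrogate for the Bellman term are assumptions.
import torch
import torch.nn as nn

obs_dim, act_dim, beta, gamma = 11, 3, 1.0, 0.99

critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(), nn.Linear(256, 1))
actor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim), nn.Tanh())

critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-5)

def q(s, a):
    return critic(torch.cat([s, a], dim=-1)).squeeze(-1)

def update(s, a, r, s_next):
    # Critic (the adversary): find a data-consistent critic under which the
    # actor's actions are inferior to the dataset (behavior) actions.
    pessimism = (q(s, actor(s).detach()) - q(s, a)).mean()
    with torch.no_grad():
        target = r + gamma * q(s_next, actor(s_next))
    bellman = ((q(s, a) - target) ** 2).mean()  # data-consistency term
    critic_opt.zero_grad()
    (pessimism + beta * bellman).backward()
    critic_opt.step()

    # Actor (the leader): maximize the adversarially trained critic.
    actor_opt.zero_grad()
    (-q(s, actor(s)).mean()).backward()
    actor_opt.step()

# Usage on a fake batch of offline transitions:
s, a = torch.randn(64, obs_dim), torch.randn(64, act_dim)
r, s_next = torch.randn(64), torch.randn(64, obs_dim)
update(s, a, r, s_next)
```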
Related papers
- Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL [42.57662196581823]
Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks.
Most existing off-policy RL algorithms fail to maximally exploit the information in the replay buffer.
We present Offline-Boosted Actor-Critic (OBAC), a model-free online RL framework that identifies when an offline-learned policy outperforms the current policy and adaptively blends it into training.
arXiv Detail & Related papers (2024-05-28T18:38:46Z)
- Offline Reinforcement Learning with Behavioral Supervisor Tuning [0.0]
We present TD3-BST, an algorithm that trains an uncertainty model and uses it to guide the policy to select actions within the dataset support.
TD3-BST can learn more effective policies from offline datasets compared to previous methods and achieves the best performance across challenging benchmarks without requiring per-dataset tuning.
arXiv Detail & Related papers (2024-04-25T08:22:47Z)
- Adversarially Trained Actor Critic for offline CMDPs [10.861449694255137]
We propose a Safe Adversarially Trained Actor Critic (SATAC) algorithm for offline reinforcement learning (RL) in constrained MDPs.
Our framework provides both theoretical guarantees and a robust deep-RL implementation.
We demonstrate that SATAC can produce a policy that outperforms the behavior policy while maintaining the same level of safety.
arXiv Detail & Related papers (2024-01-01T01:44:58Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks: a pre-existing guide policy and a learning policy (a minimal rollout sketch appears after this list).
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interaction with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which uses an experience-picking strategy to imitate adaptive neighboring policies with higher returns.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids learning merely mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy-constraint that reduces divergence from the behavior policy and a value-constraint that discourages overly optimistic value estimates (a schematic of both penalties appears after this list).
arXiv Detail & Related papers (2021-02-18T08:54:14Z)
- POPO: Pessimistic Offline Policy Optimization [6.122342691982727]
We study why off-policy RL methods fail to learn in the offline setting from the value function view.
We propose Pessimistic Offline Policy Optimization (POPO), which learns a pessimistic value function to get a strong policy.
We find that POPO performs surprisingly well and scales to tasks with high-dimensional state and action space.
arXiv Detail & Related papers (2020-12-26T06:24:34Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm that learns policies from data using a form of critic-regularized regression (CRR); a minimal sketch of this weighted-regression idea appears after the list.
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
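For JSRL, the "two policies" referenced above are a pre-existing guide policy and a learning policy: each episode starts under the guide for some number of steps h, then hands control to the learner, and the curriculum shrinks h as the learner improves. A minimal sketch of such a rollout, with a gym-style environment and both policies as hypothetical stand-ins:

```python
# Hypothetical sketch of a JSRL-style rollout (not the paper's code): the
# guide policy controls the first `h` steps of each episode, then the
# learning policy takes over. `env` follows the gym 5-tuple step API and
# `guide`/`learner` are stand-in callables mapping observation -> action.
def jsrl_rollout(env, guide, learner, h, max_steps=1000):
    obs, _ = env.reset()
    transitions = []
    for t in range(max_steps):
        action = guide(obs) if t < h else learner(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs))
        obs = next_obs
        if terminated or truncated:
            break
    return transitions

# Curriculum: hand over late at first, then earlier as the learner improves,
# e.g. for h in [900, 700, 500, 300, 100, 0]: train on jsrl_rollout(...).
```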
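The two penalties of Continuous Doubly Constrained batch RL can be pictured as extra loss terms alongside the usual TD loss: one keeps the learned policy near dataset actions, the other pushes down value estimates that exceed a conservative target. The squared-distance and one-sided forms below are illustrative assumptions, not the paper's exact losses:

```python
# Schematic of CDC's two penalties (the concrete forms are assumptions),
# written as extra terms next to a standard TD loss.
import torch

def doubly_constrained_losses(q_pred, q_target, pi_action, data_action,
                              lam_policy=1.0, lam_value=1.0):
    """q_pred:      Q(s, a) for the dataset action         [batch]
    q_target:    conservative TD target for Q(s, a)        [batch]
    pi_action:   action proposed by the learned policy     [batch, act_dim]
    data_action: action taken in the dataset               [batch, act_dim]
    """
    td_loss = ((q_pred - q_target) ** 2).mean()
    # Policy-constraint: keep the learned policy close to the dataset
    # actions, reducing divergence from the behavior policy.
    policy_penalty = ((pi_action - data_action) ** 2).sum(dim=-1).mean()
    # Value-constraint: one-sided penalty on estimates above the target,
    # discouraging overly optimistic values.
    value_penalty = torch.relu(q_pred - q_target).mean()
    critic_loss = td_loss + lam_value * value_penalty
    actor_penalty = lam_policy * policy_penalty
    return critic_loss, actor_penalty
```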
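Critic-regularized regression, finally, boils down to weighted behavior cloning: the policy is regressed onto dataset actions, with each sample weighted by a monotone transform of a critic-based advantage estimate. The clipped-exponential and binary-filter weights below are common choices for such transforms, shown here as a hedged sketch rather than the paper's exact recipe:

```python
# Sketch of a critic-regularized-regression policy update: weighted behavior
# cloning, with weights derived from a critic advantage estimate. The exact
# weight transform and advantage estimator are illustrative assumptions.
import torch

def crr_policy_loss(log_prob_data, q_data, q_pi_samples, beta=1.0, binary=False):
    """log_prob_data: log pi(a|s) for dataset actions          [batch]
    q_data:        Q(s, a) for dataset actions                 [batch]
    q_pi_samples:  Q(s, a') for actions a' sampled from pi     [batch, n]
    """
    # Advantage of the dataset action over the policy's own actions.
    advantage = q_data - q_pi_samples.mean(dim=-1)
    if binary:
        # Binary filter: imitate only actions the critic prefers.
        weight = (advantage > 0).float()
    else:
        # Exponentially weighted regression, clipped for stability.
        weight = torch.clamp(torch.exp(advantage / beta), max=20.0)
    # Weighted behavior cloning: regress the policy onto good dataset actions.
    return -(weight.detach() * log_prob_data).mean()
```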
This list is automatically generated from the titles and abstracts of the papers in this site.