Dual RL: Unification and New Methods for Reinforcement and Imitation
Learning
- URL: http://arxiv.org/abs/2302.08560v3
- Date: Fri, 26 Jan 2024 16:58:26 GMT
- Title: Dual RL: Unification and New Methods for Reinforcement and Imitation
Learning
- Authors: Harshit Sikchi, Qinqing Zheng, Amy Zhang, Scott Niekum
- Abstract summary: We first cast several state-of-the-art offline RL and offline imitation learning (IL) algorithms as instances of dual RL approaches with shared structures.
We propose a new discriminator-free method ReCOIL that learns to imitate from arbitrary off-policy data to obtain near-expert performance.
For offline RL, our analysis frames a recent offline RL method XQL in the dual framework, and we further propose a new method f-DVL that provides alternative choices to the Gumbel regression loss.
- Score: 26.59374102005998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of reinforcement learning (RL) is to find a policy that maximizes
the expected cumulative return. It has been shown that this objective can be
represented as an optimization problem of state-action visitation distribution
under linear constraints. The dual problem of this formulation, which we refer
to as dual RL, is unconstrained and easier to optimize. In this work, we first
cast several state-of-the-art offline RL and offline imitation learning (IL)
algorithms as instances of dual RL approaches with shared structures. Such
unification allows us to identify the root cause of the shortcomings of prior
methods. For offline IL, our analysis shows that prior methods are based on a
restrictive coverage assumption that greatly limits their performance in
practice. To fix this limitation, we propose a new discriminator-free method
ReCOIL that learns to imitate from arbitrary off-policy data to obtain
near-expert performance. For offline RL, our analysis frames a recent offline
RL method XQL in the dual framework, and we further propose a new method f-DVL
that provides alternative choices to the Gumbel regression loss, fixing the
known training instability issue of XQL. The performance improvements of both
of our proposed methods, ReCOIL and f-DVL, in IL and RL are validated on an
extensive suite of simulated robot locomotion and manipulation tasks. Project
code and details can be found at https://hari-sikchi.github.io/dual-rl.
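For intuition, here is a minimal sketch of the linear program the abstract alludes to and of its Lagrangian dual, written in DICE-style notation that is assumed here rather than taken from the paper (d is the discounted state-action visitation distribution, \rho_0 the initial state distribution, P the dynamics, d^O an off-policy data distribution, D_f an f-divergence, and V the multipliers of the flow constraints); the paper's exact objective may differ in its constants and regularizer:

    \max_{d \ge 0} \; \mathbb{E}_{(s,a)\sim d}[r(s,a)] - \alpha\, D_f(d \,\|\, d^O)
    \text{s.t.} \;\; \textstyle\sum_a d(s,a) = (1-\gamma)\rho_0(s) + \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a') \quad \forall s.

Introducing a multiplier V(s) for each flow constraint and using the convex conjugate f^* gives an unconstrained problem over V alone:

    \min_{V} \; (1-\gamma)\,\mathbb{E}_{s\sim\rho_0}[V(s)] + \alpha\,\mathbb{E}_{(s,a)\sim d^O}\big[ f^*\big( (r(s,a) + \gamma\,\mathbb{E}_{s'\sim P(\cdot\mid s,a)}[V(s')] - V(s)) / \alpha \big) \big].

Different choices of f yield different losses on the Bellman-residual term, which is roughly the sense in which f-DVL offers alternatives to a single Gumbel-style regression loss.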
Related papers
- More Benefits of Being Distributional: Second-Order Bounds for
Reinforcement Learning [58.626683114119906]
We show that Distributional Reinforcement Learning (DistRL) can obtain second-order bounds in both online and offline RL.
Our results are the first second-order bounds for low-rank MDPs and for offline RL.
arXiv Detail & Related papers (2024-02-11T13:25:53Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
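As a rough illustration of what action discretization can look like in this setting (the paper's adaptive, learned scheme is not described in this summary, so the uniform binning and the function names below are purely hypothetical):

    import numpy as np

    # Illustrative uniform action quantizer; the paper proposes an *adaptive*
    # scheme, whose details are not given in the summary above.
    def quantize(action, low, high, bins):
        """Map a continuous action in [low, high]^d to per-dimension bin indices."""
        action = np.clip(action, low, high)
        frac = (action - low) / (high - low)              # normalize to [0, 1]
        return np.minimum((frac * bins).astype(int), bins - 1)

    def dequantize(idx, low, high, bins):
        """Map bin indices back to the bin-center continuous actions."""
        return low + (idx + 0.5) * (high - low) / bins

    # Usage: an 11-bin-per-dimension discretization of a 2-D action in [-1, 1]^2,
    # on top of which a discrete-action offline RL method can then be run.
    low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
    idx = quantize(np.array([0.3, -0.7]), low, high, 11)
    recovered = dequantize(idx, low, high, 11)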
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Decoupled Prioritized Resampling for Offline RL [120.49021589395005]
We propose Offline Prioritized Experience Replay (OPER) for offline reinforcement learning.
OPER features a class of priority functions designed to prioritize highly-rewarding transitions, making them more frequently visited during training.
We show that this class of priority functions induces an improved behavior policy, and that when constrained to this improved policy, a policy-constrained offline RL algorithm is likely to yield a better solution.
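One way to read that claim (a hedged sketch; the specific priority functions OPER uses are not given in this summary): resampling transitions with priority weights w(s,a) \ge 0 effectively replaces the dataset distribution d^D with

    d^w(s,a) \;\propto\; w(s,a)\, d^D(s,a),

and if w places more mass on highly-rewarding transitions, d^w corresponds to a stronger behavior policy, so a policy-constrained algorithm that regularizes toward d^w rather than d^D is anchored to a better reference.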
arXiv Detail & Related papers (2023-06-08T17:56:46Z)
- Efficient Diffusion Policies for Offline Reinforcement Learning [85.73757789282212]
Diffusion-QL significantly boosts the performance of offline RL by representing a policy with a diffusion model, but at a substantial computational cost.
We propose the efficient diffusion policy (EDP) to address this.
EDP constructs actions from corrupted ones during training, avoiding the need to run the full sampling chain.
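One plausible reading of that construction, in standard DDPM notation (a sketch under assumed notation, not necessarily the paper's exact parameterization): if a clean action a_0 is corrupted to

    a_k = \sqrt{\bar\alpha_k}\, a_0 + \sqrt{1-\bar\alpha_k}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),

then a single pass of the noise predictor \epsilon_\theta(a_k, s, k) yields the one-step estimate

    \hat a_0 = \big( a_k - \sqrt{1-\bar\alpha_k}\,\epsilon_\theta(a_k, s, k) \big) / \sqrt{\bar\alpha_k},

which can stand in for actions produced by the full reverse sampling chain during policy training.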
arXiv Detail & Related papers (2023-05-31T17:55:21Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between the learned policy and the dataset it is trained on.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with fewer than 10 lines of code changed and adds negligible running time.
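A minimal sketch of return-based resampling, assuming episode returns are available up front; the exact weighting ReD uses is not spelled out in this summary, so the temperature-softmax weights and the function name below are illustrative only:

    import numpy as np

    # Sample episode indices with probability increasing in episode return.
    # Illustrative only; ReD's actual weighting may differ.
    def rebalance_indices(episode_returns, num_samples, temperature=1.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        r = np.asarray(episode_returns, dtype=np.float64)
        z = (r - r.min()) / (r.max() - r.min() + 1e-8)    # normalize returns to [0, 1]
        p = np.exp(z / temperature)
        p /= p.sum()
        return rng.choice(len(r), size=num_samples, replace=True, p=p)

    # Usage: build a rebalanced sampling schedule over four recorded episodes.
    idx = rebalance_indices([10.0, 250.0, 40.0, 300.0], num_samples=1000)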
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- Boosting Offline Reinforcement Learning with Residual Generative Modeling [27.50950972741753]
Offline reinforcement learning (RL) aims to learn a near-optimal policy from recorded offline experience, without online exploration.
We show that our method learns more accurate policy approximations on different benchmark datasets.
In addition, we show that the proposed offline RL method can learn more competitive AI agents for complex control tasks in the multiplayer online battle arena (MOBA) game Honor of Kings.
arXiv Detail & Related papers (2021-06-19T03:41:14Z)
- FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, and two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by running them on rewards artificially penalized by the uncertainty of the learned dynamics.
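Concretely, the penalized reward used for rollouts inside the learned model takes the form

    \tilde r(s,a) = \hat r(s,a) - \lambda\, u(s,a),

where u(s,a) estimates the dynamics model's error at (s,a) (in practice often derived from an ensemble of learned models; the summary above does not specify the estimator) and \lambda trades off return against model uncertainty, discouraging the policy from exploiting regions where the model is likely wrong.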
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.