Model-based trajectory stitching for improved behavioural cloning and
its applications
- URL: http://arxiv.org/abs/2212.04280v1
- Date: Thu, 8 Dec 2022 14:18:04 GMT
- Title: Model-based trajectory stitching for improved behavioural cloning and
its applications
- Authors: Charles A. Hepburn and Giovanni Montana
- Abstract summary: Trajectory Stitching (TS) generates new trajectories by `stitching' pairs of states that were disconnected in the original data.
We demonstrate that the iterative process of replacing old trajectories with new ones incrementally improves the underlying behavioural policy.
- Score: 7.462336024223669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Behavioural cloning (BC) is a commonly used imitation learning method to
infer a sequential decision-making policy from expert demonstrations. However,
when the quality of the data is not optimal, the resulting behavioural policy
also performs sub-optimally once deployed. Recently, there has been a surge in
offline reinforcement learning methods that hold the promise to extract
high-quality policies from sub-optimal historical data. A common approach is to
perform regularisation during training, encouraging updates during policy
evaluation and/or policy improvement to stay close to the underlying data. In
this work, we investigate whether an offline approach to improving the quality
of the existing data can lead to improved behavioural policies without any
changes in the BC algorithm. The proposed data improvement approach -
Trajectory Stitching (TS) - generates new trajectories (sequences of states and
actions) by `stitching' pairs of states that were disconnected in the original
data and generating their connecting new action. By construction, these new
transitions are guaranteed to be highly plausible according to probabilistic
models of the environment, and to improve a state-value function. We
demonstrate that the iterative process of replacing old trajectories with new
ones incrementally improves the underlying behavioural policy. Extensive
experimental results show that significant performance gains can be achieved
using TS over BC policies extracted from the original data. Furthermore, using
the D4RL benchmarking suite, we demonstrate that state-of-the-art results are
obtained by combining TS with two existing offline learning methodologies
reliant on BC, model-based offline planning (MBOP) and policy constraint
(TD3+BC).
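The stitching procedure described in the abstract lends itself to a compact illustration. The following is a minimal sketch only, not the authors' implementation: `dynamics_model.likelihood`, `value_fn` and `inverse_model` are hypothetical stand-ins for the paper's learned probabilistic forward model, state-value function and action-generation model, and the plausibility threshold is an assumption.

```python
# Minimal sketch of one Trajectory Stitching (TS) pass. The three helpers are
# assumed interfaces (not the authors' API): a probabilistic forward model
# p(s'|s), a state-value function V(s), and an inverse-dynamics model that
# proposes a connecting action for a state pair (s, s').

def stitch_once(trajectories, dynamics_model, value_fn, inverse_model,
                plausibility_threshold=0.1):
    """Return a new dataset in which each trajectory may be re-routed through
    a higher-value state of another trajectory, joined by a new action."""
    # index every state in the dataset: (trajectory id, position, state)
    candidates = [(ti, k, tr[k][0])
                  for ti, tr in enumerate(trajectories)
                  for k in range(len(tr))]

    new_trajectories = []
    for ti, traj in enumerate(trajectories):
        stitched, k = [], 0
        while k < len(traj):
            state, action, next_state = traj[k]
            best = None
            for tj, m, cand in candidates:
                if tj == ti:
                    continue  # only stitch across different trajectories
                # (i) the jump state -> cand must be plausible under the model
                if dynamics_model.likelihood(state, cand) < plausibility_threshold:
                    continue
                # (ii) it must strictly improve on the original successor's value
                if value_fn(cand) <= value_fn(next_state):
                    continue
                if best is None or value_fn(cand) > value_fn(best[2]):
                    best = (tj, m, cand)
            if best is None:
                stitched.append((state, action, next_state))
                k += 1
            else:
                tj, m, cand = best
                new_action = inverse_model(state, cand)  # generated connecting action
                stitched.append((state, new_action, cand))
                stitched.extend(trajectories[tj][m:])  # follow the better trajectory
                break
        new_trajectories.append(stitched)
    return new_trajectories
```

Per the abstract, this replacement of old trajectories with new ones is applied iteratively before behavioural cloning is run on the improved dataset; how the models are refit between rounds is detailed in the paper itself.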
Related papers
- Statistically Efficient Variance Reduction with Double Policy Estimation
for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
- Offline Reinforcement Learning with Closed-Form Policy Improvement Operators [88.54210578912554]
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z)
- Model-based Trajectory Stitching for Improved Offline Reinforcement Learning [7.462336024223669]
We propose a model-based data augmentation strategy, Trajectory Stitching (TS), to improve the quality of sub-optimal historical trajectories.
TS introduces unseen actions joining previously disconnected states.
We show that using this data augmentation strategy jointly with behavioural cloning (BC) leads to improvements over the behaviour-cloned policy.
arXiv Detail & Related papers (2022-11-21T16:00:39Z)
- Offline Reinforcement Learning with Adaptive Behavior Regularization [1.491109220586182]
Offline reinforcement learning (RL) defines a sample-efficient learning paradigm, where a policy is learned from static, previously collected datasets.
We propose a novel approach, which we refer to as adaptive behavior regularization (ABR).
ABR enables the policy to adaptively adjust its optimization objective between cloning and improving over the policy used to generate the dataset.
arXiv Detail & Related papers (2022-11-15T15:59:11Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time; a rough sketch of the rebalancing idea appears after this list.
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
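The data-rebalancing entry above ("Boosting Offline Reinforcement Learning via Data Rebalancing") is summarised only at a high level. The following is a rough, hypothetical sketch of return-weighted episode resampling in that spirit, not the authors' ReD implementation; the softmax weighting and the `temperature` parameter are assumptions made for illustration.

```python
import numpy as np

def return_weighted_resample(episodes, num_samples, temperature=1.0, seed=0):
    """Resample whole episodes with probability increasing in episodic return.

    `episodes` is a list of (transitions, episodic_return) pairs. Sampling with
    replacement keeps the support of the original dataset unchanged while
    shifting probability mass towards higher-return behaviour.
    """
    rng = np.random.default_rng(seed)
    returns = np.array([ret for _, ret in episodes], dtype=np.float64)
    logits = (returns - returns.max()) / temperature  # numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    idx = rng.choice(len(episodes), size=num_samples, replace=True, p=probs)
    return [episodes[i][0] for i in idx]
```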
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.