Uncertainty-driven Trajectory Truncation for Data Augmentation in
Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2304.04660v2
- Date: Wed, 26 Jul 2023 10:06:06 GMT
- Title: Uncertainty-driven Trajectory Truncation for Data Augmentation in
Offline Reinforcement Learning
- Authors: Junjie Zhang, Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Jun Yang, Le
Wan, Xiu Li
- Abstract summary: We propose Trajectory Truncation with Uncertainty (TATU), which adaptively truncates a synthetic trajectory when the accumulated uncertainty along it grows too large.
Experimental results on the D4RL benchmark show that TATU significantly improves the performance of the base offline RL algorithms it is combined with, often by a large margin.
- Score: 15.697626468632784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Equipped with a trained model of the environment dynamics, model-based offline
reinforcement learning (RL) algorithms can often learn good policies from
fixed-size datasets, even datasets of poor quality. However, it cannot be
guaranteed that the samples generated by the trained dynamics model are
reliable (e.g., some synthetic samples may lie outside the support region of
the static dataset). To address this
issue, we propose Trajectory Truncation with Uncertainty (TATU), which
adaptively truncates the synthetic trajectory if the accumulated uncertainty
along the trajectory is too large. We theoretically establish a performance bound
for TATU to justify its benefits. To empirically demonstrate the advantages of TATU, we
first combine it with two classical model-based offline RL algorithms, MOPO and
COMBO. Furthermore, we integrate TATU with several off-the-shelf model-free
offline RL algorithms, e.g., BCQ. Experimental results on the D4RL benchmark
show that TATU significantly improves their performance, often by a large
margin. Code is available here.
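To make the truncation rule concrete, below is a minimal Python sketch of the idea: roll a policy forward through a learned dynamics model, accumulate a per-step uncertainty signal, and stop generating synthetic transitions once the running total exceeds a budget. The use of ensemble disagreement as the uncertainty measure and all names here (rollout_with_truncation, uncertainty_budget, the toy ensemble) are illustrative assumptions, not the paper's released implementation.

```python
import numpy as np

def rollout_with_truncation(
    start_state,             # initial state sampled from the offline dataset
    policy,                  # callable: state -> action
    ensemble,                # list of callables: (state, action) -> predicted next state
    reward_fn,               # callable: (state, action, next_state) -> reward
    horizon=5,               # maximum synthetic rollout length
    uncertainty_budget=1.0,  # truncate once accumulated uncertainty exceeds this
):
    """Generate one synthetic trajectory, truncating it when the accumulated
    ensemble disagreement (a stand-in uncertainty measure) grows too large."""
    state = np.asarray(start_state, dtype=np.float64)
    transitions, accumulated = [], 0.0
    for _ in range(horizon):
        action = policy(state)
        # Query every ensemble member and measure their disagreement.
        predictions = np.stack([m(state, action) for m in ensemble])
        next_state = predictions.mean(axis=0)
        step_uncertainty = predictions.std(axis=0).max()  # one scalar per step
        accumulated += step_uncertainty
        if accumulated > uncertainty_budget:
            break  # stop trusting the model beyond this point
        transitions.append((state, action, reward_fn(state, action, next_state), next_state))
        state = next_state
    return transitions

# Toy usage with stand-in components (purely illustrative).
rng = np.random.default_rng(0)
ensemble = [
    (lambda s, a, w=rng.normal(scale=0.1, size=(3, 3)): s + a + s @ w)
    for _ in range(5)
]
policy = lambda s: -0.1 * s
reward_fn = lambda s, a, ns: -float(np.linalg.norm(ns))
traj = rollout_with_truncation(rng.normal(size=3), policy, ensemble, reward_fn)
print(f"kept {len(traj)} synthetic steps")
```

The retained transitions would then be added to the buffer used by the base offline RL algorithm (e.g., MOPO, COMBO, or BCQ, as in the paper).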
Related papers
- Ultra-Resolution Adaptation with Ease [62.56434979517156]
We propose a set of key guidelines for ultra-resolution adaptation termed URAE.
We show that tuning minor components of the weight matrices outperforms widely-used low-rank adapters when synthetic data are unavailable.
Experiments validate that URAE achieves comparable 2K-generation performance to state-of-the-art closed-source models like FLUX1.1 [Pro] Ultra with only 3K samples and 2K iterations.
arXiv Detail & Related papers (2025-03-20T16:44:43Z)
- Towards Widening The Distillation Bottleneck for Reasoning Models [39.22557129190619]
Distillation, i.e., post-training on data generated by large reasoning models (LRMs), is a straightforward yet effective method to enhance the reasoning abilities of smaller models.
We find that distilled long CoT data poses learning difficulties for small models and leads to the inheritance of biases.
We propose constructing tree-based CoT data from scratch via Monte Carlo Tree Search.
arXiv Detail & Related papers (2025-03-03T12:17:36Z) - SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets [32.496818080222646]
We propose a new approach to model-based offline reinforcement learning.
We provide theoretical guarantees on the model uncertainty and the performance bound of SeMOPO.
Experimental results show that our method substantially outperforms all baseline methods.
arXiv Detail & Related papers (2024-06-13T15:16:38Z) - Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Look Beneath the Surface: Exploiting Fundamental Symmetry for
Sample-Efficient Offline RL [29.885978495034703]
Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets.
However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets.
We provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets.
arXiv Detail & Related papers (2023-06-07T07:51:05Z) - Offline Q-Learning on Diverse Multi-Task Data Both Scales And
Generalizes [100.69714600180895]
Offline Q-learning algorithms exhibit strong performance that scales with model capacity.
We train a single policy on 40 games to near-human performance using networks with up to 80 million parameters.
Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal.
arXiv Detail & Related papers (2022-11-28T08:56:42Z) - Double Check Your State Before Trusting It: Confidence-Aware
Bidirectional Offline Model-Based Imagination [31.805991958408438]
We propose to augment the offline dataset using trained bidirectional dynamics models and rollout policies with a double-check mechanism.
Our method, confidence-aware bidirectional offline model-based imagination, generates reliable samples and can be combined with any model-free offline RL method.
arXiv Detail & Related papers (2022-06-16T08:00:44Z)
- PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided
Exploration [15.173628100049129]
This work studies a model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs).
For both models, our algorithm provides sample complexity guarantees and only requires access to a planning oracle.
Our method can also perform reward-free exploration efficiently.
arXiv Detail & Related papers (2021-07-15T15:49:30Z)
- Online and Offline Reinforcement Learning by Planning with a Learned
Model [15.8026041700727]
We describe the Reanalyse algorithm which uses model-based policy and value improvement operators to compute new improved training targets on existing data points.
We show that Reanalyse can also be used to learn entirely from demonstrations without any environment interactions.
We introduce MuZero Unplugged, a single unified algorithm for any data budget, including offline RL.
arXiv Detail & Related papers (2021-04-13T15:36:06Z)
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement
Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
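For contrast with TATU's truncation-based handling of unreliable synthetic data, here is a rough NumPy sketch of the kind of conservative critic objective COMBO describes, which penalizes Q-values on model-generated (out-of-support) state-actions relative to dataset state-actions. The single beta weight and the sampling scheme are simplifying assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def conservative_critic_loss(
    q_dataset,        # Q-values on (s, a) pairs drawn from the offline dataset
    q_model_rollout,  # Q-values on (s, a) pairs generated by model rollouts
    bellman_error,    # squared TD errors on the dataset transitions
    beta=1.0,         # strength of the conservative regularizer
):
    """Sketch of a COMBO-style objective: push Q down on model-generated
    (out-of-support) state-actions, push it up on dataset state-actions,
    and fit the Bellman backup as usual."""
    conservative_term = np.mean(q_model_rollout) - np.mean(q_dataset)
    return beta * conservative_term + np.mean(bellman_error)

# Illustrative numbers only: higher Q on out-of-support samples raises the loss.
rng = np.random.default_rng(1)
loss = conservative_critic_loss(
    q_dataset=rng.normal(5.0, 1.0, size=256),
    q_model_rollout=rng.normal(7.0, 1.0, size=256),
    bellman_error=rng.uniform(0.0, 0.5, size=256),
)
print(f"critic loss: {loss:.3f}")
```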