Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
- URL: http://arxiv.org/abs/2310.18247v3
- Date: Thu, 8 Aug 2024 12:15:18 GMT
- Title: Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
- Authors: Nicholas E. Corrado, Yuxiao Qu, John U. Balis, Adam Labiosa, Josiah P. Hanna
- Abstract summary: In offline reinforcement learning (RL), an RL agent learns to solve a task using only a fixed dataset of previously collected data.
We propose Guided Data Augmentation (GuDA), a human-guided DA framework that generates expert-quality augmented data.
GuDA enables learning given a small initial dataset of potentially suboptimal experience.
- Score: 3.586527534935176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   In offline reinforcement learning (RL), an RL agent learns to solve a task using only a fixed dataset of previously collected data. While offline RL has been successful in learning real-world robot control policies, it typically requires large amounts of expert-quality data to learn effective policies that generalize to out-of-distribution states. Unfortunately, such data is often difficult and expensive to acquire in real-world tasks. Several recent works have leveraged data augmentation (DA) to inexpensively generate additional data, but most DA works apply augmentations in a random fashion and ultimately produce highly suboptimal augmented experience. In this work, we propose Guided Data Augmentation (GuDA), a human-guided DA framework that generates expert-quality augmented data. The key insight behind GuDA is that while it may be difficult to demonstrate the sequence of actions required to produce expert data, a user can often easily characterize when an augmented trajectory segment represents progress toward task completion. Thus, a user can restrict the space of possible augmentations to automatically reject suboptimal augmented data. To extract a policy from GuDA, we use off-the-shelf offline reinforcement learning and behavior cloning algorithms. We evaluate GuDA on a physical robot soccer task as well as simulated D4RL navigation tasks, a simulated autonomous driving task, and a simulated soccer task. Empirically, GuDA enables learning given a small initial dataset of potentially suboptimal experience and outperforms a random DA strategy as well as a model-based DA strategy. 
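The guided-rejection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`transform_segment`, `makes_progress`, `guided_augment`), the 2D navigation setting, and the "ends closer to the goal" rule are all assumptions chosen for illustration. A real augmentation would also have to transform actions and rewards consistently with the states, which is omitted here.

```python
import math
import random


def transform_segment(segment, angle, shift):
    """Rotate a segment of (x, y) points about its first point, then translate it."""
    x0, y0 = segment[0]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for x, y in segment:
        dx, dy = x - x0, y - y0
        out.append((x0 + cos_a * dx - sin_a * dy + shift[0],
                    y0 + sin_a * dx + cos_a * dy + shift[1]))
    return out


def makes_progress(segment, goal):
    """Hypothetical user-supplied rule: the segment ends closer to the goal than it starts."""
    return math.dist(segment[-1], goal) < math.dist(segment[0], goal)


def guided_augment(segment, goal, n_samples, rng, max_shift=1.0):
    """Sample random transforms and keep only the candidates the rule accepts."""
    accepted = []
    for _ in range(n_samples):
        angle = rng.uniform(-math.pi, math.pi)
        shift = (rng.uniform(-max_shift, max_shift),
                 rng.uniform(-max_shift, max_shift))
        candidate = transform_segment(segment, angle, shift)
        if makes_progress(candidate, goal):
            accepted.append(candidate)  # rejected candidates are simply dropped
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    original = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
    augmented = guided_augment(original, goal=(5.0, 0.0), n_samples=100, rng=rng)
    print(f"kept {len(augmented)} of 100 random augmentations")
```

A purely random DA strategy corresponds to skipping the `makes_progress` check and keeping every candidate, including segments that move away from the goal.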

Related papers
        - Goal-Conditioned Data Augmentation for Offline Reinforcement Learning [3.5775697416994485]
 We introduce Goal-cOnditioned Data Augmentation (GODA), a goal-conditioned diffusion-based method for augmenting samples with higher quality.
GODA learns a comprehensive distribution representation of the original offline datasets while generating new data with selectively higher-return goals.
We conduct experiments on the D4RL benchmark and real-world challenges, specifically traffic signal control (TSC) tasks, to demonstrate GODA's effectiveness.
 arXiv  Detail & Related papers  (2024-12-29T16:42:30Z)
- RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning [53.8293458872774]
 We propose Reinforcement Learning Distilled Generalists (RLDG) to generate high-quality training data for finetuning generalist policies.
We demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations.
Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems.
 arXiv  Detail & Related papers  (2024-12-13T04:57:55Z)
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
 We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
 arXiv  Detail & Related papers  (2024-10-23T17:58:45Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
 We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
 arXiv  Detail & Related papers  (2024-09-12T11:50:06Z)
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
 We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
 arXiv  Detail & Related papers  (2024-08-15T22:27:00Z)
- Learning from Imperfect Demonstrations with Self-Supervision for Robotic Manipulation [30.791222277450053]
 Current imitation learning (IL) typically discards imperfect data, focusing solely on successful expert data.
We introduce a Self-Supervised Data Filtering framework (SSDF) that combines expert and imperfect data to compute quality scores for failed trajectory segments.
SSDF can accurately expand the training dataset with high-quality imperfect data and improve the success rates for all robotic manipulation tasks.
 arXiv  Detail & Related papers  (2024-01-17T04:15:56Z)
- Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling [11.751910133386254]
 Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample efficient when compared to a naive way of combining expert data with data collected from a sub-optimal agent.
 arXiv  Detail & Related papers  (2022-12-16T01:41:59Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
 We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
 arXiv  Detail & Related papers  (2022-02-17T02:44:05Z)
- Generalization in Reinforcement Learning by Soft Data Augmentation [11.752595047069505]
 SOft Data Augmentation (SODA) is a method that decouples augmentation from policy learning.
We find SODA to significantly advance sample efficiency, generalization, and stability in training over state-of-the-art vision-based RL methods.
 arXiv  Detail & Related papers  (2020-11-26T17:00:34Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
 Relative Entropy Q-Learning (REQ) is a simple policy algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
 arXiv  Detail & Related papers  (2020-10-16T18:48:49Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
 We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
 arXiv  Detail & Related papers  (2020-06-16T17:54:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.