An Efficient Generalizable Framework for Visuomotor Policies via
Control-aware Augmentation and Privilege-guided Distillation
- URL: http://arxiv.org/abs/2401.09258v1
- Date: Wed, 17 Jan 2024 15:05:00 GMT
- Title: An Efficient Generalizable Framework for Visuomotor Policies via
Control-aware Augmentation and Privilege-guided Distillation
- Authors: Yinuo Zhao, Kun Wu, Tianjiao Yi, Zhiyuan Xu, Xiaozhu Ju, Zhengping
Che, Qinru Qiu, Chi Harold Liu, Jian Tang
- Abstract summary: Visuomotor policies learn control mechanisms directly from high-dimensional visual observations.
Data augmentation emerges as a promising method for bridging generalization gaps by enriching data variety.
We propose to improve the generalization ability of visuomotor policies while preserving training stability, from two aspects.
- Score: 47.61391583947082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visuomotor policies, which learn control mechanisms directly from
high-dimensional visual observations, confront challenges in adapting to new
environments with intricate visual variations. Data augmentation emerges as a
promising method for bridging these generalization gaps by enriching data
variety. However, naively augmenting the entire observation can impose an
excessive burden on policy learning and may even degrade performance. In this
paper, we propose to improve the generalization ability of visuomotor policies
while preserving training stability, from two aspects: 1) We learn a
control-aware mask through a self-supervised reconstruction task with three
auxiliary losses, and then apply strong augmentation only to the
control-irrelevant regions indicated by the mask, reducing the generalization
gap.
2) To address training instability issues prevalent in visual reinforcement
learning (RL), we distill the knowledge from a pretrained RL expert processing
low-level environment states, to the student visuomotor policy. The policy is
subsequently deployed to unseen environments without any further finetuning. We
conducted comparison and ablation studies across various benchmarks: the
DMControl Generalization Benchmark (DMC-GB), the enhanced Robot Manipulation
Distraction Benchmark (RMDB), and a specialized long-horizon drawer-opening
robotic task. Extensive experimental results demonstrate the effectiveness of
our method, e.g., a 17% improvement over previous methods in the video-hard
setting of DMC-GB.
Related papers
- Salience-Invariant Consistent Policy Learning for Generalization in Visual Reinforcement Learning [0.0]
Generalizing policies to unseen scenarios remains a critical challenge in visual reinforcement learning.
In unseen environments, distracting pixels may lead agents to extract representations containing task-irrelevant information.
We propose the Salience-Invariant Consistent Policy Learning algorithm, an efficient framework for zero-shot generalization.
arXiv Detail & Related papers (2025-02-12T12:00:16Z) - DEAR: Disentangled Environment and Agent Representations for Reinforcement Learning without Reconstruction [4.813546138483559]
Reinforcement Learning (RL) algorithms can learn robotic control tasks from visual observations, but they often require a large amount of data.
In this paper, we explore how the agent's knowledge of its shape can improve the sample efficiency of visual RL methods.
We propose a novel method, Disentangled Environment and Agent Representations, that uses the segmentation mask of the agent as supervision.
arXiv Detail & Related papers (2024-06-30T09:15:21Z) - Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for
Visual Reinforcement Learning [27.205521177841568]
We propose Task-aware Lipschitz Data Augmentation (TLDA) for visual Reinforcement Learning (RL).
TLDA explicitly identifies the task-correlated pixels with large Lipschitz constants, and only augments the task-irrelevant pixels.
It outperforms previous state-of-the-art methods across the 3 different visual control benchmarks.
arXiv Detail & Related papers (2022-02-21T04:22:07Z) - Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under
Data Augmentation [25.493902939111265]
We investigate causes of instability when using data augmentation in off-policy Reinforcement Learning algorithms.
We propose a simple yet effective technique for stabilizing this class of algorithms under augmentation.
Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL.
arXiv Detail & Related papers (2021-07-01T17:58:05Z) - SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual
Policies [87.78260740602674]
Generalization has been a long-standing challenge for reinforcement learning (RL).
In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift.
We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization.
arXiv Detail & Related papers (2021-06-17T17:28:18Z) - Residual Reinforcement Learning from Demonstrations [51.56457466788513]
Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal.
We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations.
Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning.
arXiv Detail & Related papers (2021-06-15T11:16:49Z) - Robust Deep Reinforcement Learning via Multi-View Information Bottleneck [7.188571996124112]
We introduce an auxiliary objective based on the multi-view information bottleneck (MIB) principle.
This encourages learning representations that are both predictive of the future and less sensitive to task-irrelevant distractions.
We demonstrate that our approach can achieve SOTA performance on challenging visual control tasks, even when the background is replaced with natural videos.
arXiv Detail & Related papers (2021-02-26T02:24:36Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.