An Efficient Generalizable Framework for Visuomotor Policies via
Control-aware Augmentation and Privilege-guided Distillation
- URL: http://arxiv.org/abs/2401.09258v1
- Date: Wed, 17 Jan 2024 15:05:00 GMT
- Title: An Efficient Generalizable Framework for Visuomotor Policies via
Control-aware Augmentation and Privilege-guided Distillation
- Authors: Yinuo Zhao, Kun Wu, Tianjiao Yi, Zhiyuan Xu, Xiaozhu Ju, Zhengping
Che, Qinru Qiu, Chi Harold Liu, Jian Tang
- Abstract summary: Visuomotor policies learn control mechanisms directly from high-dimensional visual observations.
Data augmentation emerges as a promising method for bridging generalization gaps by enriching data variety.
We propose to improve the generalization ability of visuomotor policies while preserving training stability, from two aspects.
- Score: 47.61391583947082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visuomotor policies, which learn control mechanisms directly from
high-dimensional visual observations, confront challenges in adapting to new
environments with intricate visual variations. Data augmentation emerges as a
promising method for bridging these generalization gaps by enriching data
variety. However, straightforwardly augmenting the entire observation shall
impose excessive burdens on policy learning and may even result in performance
degradation. In this paper, we propose to improve the generalization ability of
visuomotor policies as well as preserve training stability from two aspects: 1)
We learn a control-aware mask through a self-supervised reconstruction task
with three auxiliary losses and then apply strong augmentation only to those
control-irrelevant regions based on the mask to reduce the generalization gaps.
2) To address training instability issues prevalent in visual reinforcement
learning (RL), we distill knowledge from a pretrained RL expert that processes
low-level environment states into the student visuomotor policy. The policy is
subsequently deployed to unseen environments without any further finetuning. We
conducted comparison and ablation studies across various benchmarks: the
DMControl Generalization Benchmark (DMC-GB), the enhanced Robot Manipulation
Distraction Benchmark (RMDB), and a specialized long-horizon drawer-opening
robotic task. The extensive experimental results demonstrate the
effectiveness of our method, e.g., a 17% improvement over previous
methods in the video-hard setting of DMC-GB.
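A minimal PyTorch-style sketch of the two components follows. It assumes a learned mask network, a frozen state-based expert, and a visuomotor student; all names here are hypothetical illustrations, and the self-supervised reconstruction task with three auxiliary losses that actually produces the mask is omitted.

    import torch
    import torch.nn.functional as F

    def control_aware_augment(obs, mask, strong_aug):
        # obs:  (B, C, H, W) image batch; mask: (B, 1, H, W) in [0, 1],
        # where 1 marks a control-relevant pixel.  Strong augmentation
        # touches only the control-irrelevant regions.
        return mask * obs + (1.0 - mask) * strong_aug(obs)

    def distillation_step(student, expert, mask_net, obs, state, strong_aug):
        with torch.no_grad():
            target_action = expert(state)   # privileged expert on low-level state
            mask = mask_net(obs)            # learned control-aware mask
        aug_obs = control_aware_augment(obs, mask, strong_aug)
        pred_action = student(aug_obs)      # student sees augmented pixels only
        # Supervise the student with the expert's actions.
        return F.mse_loss(pred_action, target_action)

Here strong_aug could be, for instance, a random-overlay or random-convolution transform; because an imitation objective replaces bootstrapped RL on pixels, the usual training instabilities of visual RL are sidestepped.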
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- RoboKoop: Efficient Control Conditioned Representations from Visual Input in Robotics using Koopman Operator [14.77553682217217]
We introduce a Contrastive Spectral Koopman Embedding network that allows us to learn efficient linearized visual representations from the agent's visual data in a high-dimensional latent space.
Our method enhances stability and control in gradient dynamics over time, significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-09-04T22:14:59Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the zero-shot generalization stability of VLMs; the overall method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- DEAR: Disentangled Environment and Agent Representations for Reinforcement Learning without Reconstruction [4.813546138483559]
Reinforcement Learning (RL) algorithms can learn robotic control tasks from visual observations, but they often require a large amount of data.
In this paper, we explore how the agent's knowledge of its shape can improve the sample efficiency of visual RL methods.
We propose a novel method, Disentangled Environment and Agent Representations, that uses the segmentation mask of the agent as supervision.
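The summary does not detail the architecture; one plausible, purely illustrative way to use the agent's segmentation mask as supervision without reconstruction is an auxiliary mask-prediction head on the latent:

    import torch.nn as nn
    import torch.nn.functional as F

    class MaskSupervisedEncoder(nn.Module):
        # Hypothetical sketch: a small head predicts the agent's
        # segmentation mask from the latent, pushing the representation
        # to separate agent pixels from environment pixels without
        # a pixel-level decoder.
        def __init__(self, encoder, latent_dim, img_hw=(84, 84)):
            super().__init__()
            self.encoder = encoder
            self.mask_head = nn.Linear(latent_dim, img_hw[0] * img_hw[1])
            self.img_hw = img_hw

        def aux_loss(self, obs, agent_mask):
            z = self.encoder(obs)                             # (B, latent_dim)
            logits = self.mask_head(z).view(-1, *self.img_hw)
            return F.binary_cross_entropy_with_logits(logits, agent_mask)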
arXiv Detail & Related papers (2024-06-30T09:15:21Z)
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- Hierarchical Transformers are Efficient Meta-Reinforcement Learners [19.79721574250755]
We introduce Hierarchical Transformers for Meta-Reinforcement Learning (HTrMRL), a powerful online meta-reinforcement learning approach.
We demonstrate how past episodes serve as a rich source of information, which our model effectively distills and applies to new contexts.
arXiv Detail & Related papers (2024-02-09T13:40:11Z)
- Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting data from unseen but identically distributed test sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
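The summary does not specify the regularizer's exact form; a common instantiation of consistency regularization, sketched below as an assumption, penalizes disagreement between predictions on weakly and strongly augmented views of the same unlabeled target image:

    import torch
    import torch.nn.functional as F

    def consistency_loss(model, x_weak, x_strong):
        # Soft predictions on the weak view serve as pseudo-targets for
        # the strong view; agreement is measured with a KL divergence.
        with torch.no_grad():
            p_weak = F.softmax(model(x_weak), dim=-1)
        log_p_strong = F.log_softmax(model(x_strong), dim=-1)
        return F.kl_div(log_p_strong, p_weak, reduction="batchmean")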
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
- Normalization Enhances Generalization in Visual Reinforcement Learning [20.04754884180226]
Normalization techniques have demonstrated huge success in supervised and unsupervised learning.
We find that incorporating suitable normalization techniques is sufficient to enhance generalization capabilities in visual reinforcement learning.
Our method significantly improves generalization capability while only marginally affecting sample efficiency.
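Since the finding is that normalization alone helps, a sketch of the change is simply a standard visual-RL conv encoder with normalization inserted after each convolution; GroupNorm is an assumption here, as the summary does not name the variant:

    import torch.nn as nn

    def make_encoder(in_channels=9, hidden=32):
        # Identical to a vanilla visual-RL encoder except for the
        # normalization layers after each convolution.
        return nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, stride=2),
            nn.GroupNorm(4, hidden), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, stride=2),
            nn.GroupNorm(4, hidden), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, stride=2),
            nn.GroupNorm(4, hidden), nn.ReLU(),
            nn.Flatten(),
        )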
arXiv Detail & Related papers (2023-06-01T13:24:56Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can learn control from pixels but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resilience to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning [27.205521177841568]
We propose Task-aware Lipschitz Data Augmentation (TLDA) for visual Reinforcement Learning (RL).
TLDA explicitly identifies the task-correlated pixels with large Lipschitz constants, and only augments the task-irrelevant pixels.
It outperforms previous state-of-the-art methods across three different visual control benchmarks.
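A rough sketch of the idea follows; the gradient-magnitude surrogate for per-pixel Lipschitz constants and the quantile threshold are illustrative assumptions, not necessarily the estimator used in the paper:

    import torch

    def pixel_sensitivity(policy, obs):
        # Gradient magnitude as a cheap surrogate for the per-pixel
        # Lipschitz constant of the policy output w.r.t. that pixel.
        obs = obs.clone().requires_grad_(True)
        policy(obs).norm().backward()
        return obs.grad.abs().sum(dim=1, keepdim=True)  # (B, 1, H, W)

    def tlda_augment(policy, obs, strong_aug, quantile=0.9):
        k = pixel_sensitivity(policy, obs)
        thresh = torch.quantile(k.flatten(1), quantile, dim=1).view(-1, 1, 1, 1)
        task_mask = (k >= thresh).float()  # large-K pixels are task-correlated
        # Keep task-correlated pixels untouched; augment only the rest.
        return task_mask * obs + (1.0 - task_mask) * strong_aug(obs)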
arXiv Detail & Related papers (2022-02-21T04:22:07Z)
- Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation [25.493902939111265]
We investigate causes of instability when using data augmentation in off-policy Reinforcement Learning algorithms.
We propose a simple yet effective technique for stabilizing this class of algorithms under augmentation.
Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL.
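The stabilizing recipe (SVEA, in this line of work) keeps the bootstrap target free of augmentation: Q-targets are computed only from unaugmented observations, while the Q-prediction is regressed to that target from both the clean and the augmented view. A simplified sketch, with target-action selection and the actor update elided:

    import torch
    import torch.nn.functional as F

    def svea_critic_loss(q, target_value, obs, action, reward, next_obs,
                         aug, gamma=0.99, alpha=0.5, beta=0.5):
        with torch.no_grad():
            # Key point: the bootstrap target never sees augmentation.
            target = reward + gamma * target_value(next_obs)
        # Both views regress to the same clean target.
        loss_clean = F.mse_loss(q(obs, action), target)
        loss_aug = F.mse_loss(q(aug(obs), action), target)
        return alpha * loss_clean + beta * loss_aug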
arXiv Detail & Related papers (2021-07-01T17:58:05Z)
- Generalization of Reinforcement Learning with Policy-Aware Adversarial Data Augmentation [32.70482982044965]
We propose a novel policy-aware adversarial data augmentation method to augment the standard policy learning method with automatically generated trajectory data.
We conduct experiments on a number of RL tasks to investigate the generalization performance of the proposed method.
The results show our method can generalize well with limited training diversity, and achieves state-of-the-art generalization test performance.
arXiv Detail & Related papers (2021-06-29T17:21:59Z)
- SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies [87.78260740602674]
Generalization has been a long-standing challenge for reinforcement learning (RL).
In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift.
We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization.
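The two stages fit in a short sketch; the function names and the choice of an MSE imitation loss are illustrative:

    import torch
    import torch.nn.functional as F

    # Stage 1: train an expert with RL under weak augmentation only, so
    # policy optimization stays stable.  Stage 2 (below): freeze the
    # expert and clone it into a student that always sees strongly
    # augmented observations, decoupling robustness from RL.
    def secant_student_loss(student, expert, obs, weak_aug, strong_aug):
        with torch.no_grad():
            target_action = expert(weak_aug(obs))  # frozen stage-1 expert
        pred_action = student(strong_aug(obs))
        return F.mse_loss(pred_action, target_action)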
arXiv Detail & Related papers (2021-06-17T17:28:18Z)
- Residual Reinforcement Learning from Demonstrations [51.56457466788513]
Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal.
We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations.
Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning.
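The residual formulation itself takes only a few lines; the clipping range and names below are assumptions:

    import numpy as np

    def residual_action(controller, residual_policy, state, obs):
        # The executed action is the conventional feedback controller's
        # command plus a learned correction; the policy only has to
        # learn the residual, not full control from scratch.
        a_base = controller(state)       # e.g. a hand-tuned feedback law
        a_res = residual_policy(obs)     # learned from vision, sparse reward
        return np.clip(a_base + a_res, -1.0, 1.0)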
arXiv Detail & Related papers (2021-06-15T11:16:49Z)
- Robust Deep Reinforcement Learning via Multi-View Information Bottleneck [7.188571996124112]
We introduce an auxiliary objective based on the multi-view information bottleneck (MIB) principle.
This encourages learning representations that are both predictive of the future and less sensitive to task-irrelevant distractions.
We demonstrate that our approach can achieve SOTA performance on challenging visual control tasks, even when the background is replaced with natural videos.
arXiv Detail & Related papers (2021-02-26T02:24:36Z)
- Generalization in Reinforcement Learning by Soft Data Augmentation [11.752595047069505]
SOft Data Augmentation (SODA) is a method that decouples augmentation from policy learning.
We find SODA to significantly advance sample efficiency, generalization, and stability in training over state-of-the-art vision-based RL methods.
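The decoupling works by keeping all RL updates on unaugmented data and letting augmentation enter only through an auxiliary self-supervised consistency loss; a simplified sketch with an EMA target encoder (projection-head details abbreviated):

    import torch
    import torch.nn.functional as F

    def soda_aux_loss(encoder, encoder_ema, projector, obs, strong_aug):
        # The ONLY place strong augmentation appears; the policy and
        # value losses elsewhere use unaugmented observations.
        z_aug = projector(encoder(strong_aug(obs)))
        with torch.no_grad():
            z_clean = encoder_ema(obs)          # EMA target, clean view
        return F.mse_loss(F.normalize(z_aug, dim=-1),
                          F.normalize(z_clean, dim=-1))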
arXiv Detail & Related papers (2020-11-26T17:00:34Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
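Under the usual Gaussian-latent assumption, the regularizer is a KL term whose coefficient is annealed; the linear schedule below is an illustrative choice, not necessarily the paper's:

    import torch

    def ib_regularizer(mu, log_var, step, anneal_steps=100_000, beta_max=1e-3):
        # KL( q(z|s) || N(0, I) ) discards latent information that is
        # redundant for solving the task; annealing beta upward lets the
        # agent learn the task first and compress afterwards.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0).sum(-1).mean()
        beta = beta_max * min(1.0, step / anneal_steps)
        return beta * kl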
arXiv Detail & Related papers (2020-08-03T02:24:20Z)