For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal
- URL: http://arxiv.org/abs/2304.04591v2
- Date: Tue, 20 Jun 2023 08:23:01 GMT
- Title: For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal
- Authors: Yingdong Hu, Renhao Wang, Li Erran Li, Yang Gao
- Abstract summary: It remains unclear if pre-trained vision models are consistent in their effectiveness under different control policies.
Our study yields a series of intriguing results, including the discovery that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, increasing attention has been directed to leveraging
pre-trained vision models for motor control. While existing works mainly
emphasize the importance of this pre-training phase, the arguably equally
important role played by downstream policy learning during control-specific
fine-tuning is often neglected. It thus remains unclear if pre-trained vision
models are consistent in their effectiveness under different control policies.
To bridge this gap in understanding, we conduct a comprehensive study on 14
pre-trained vision models using 3 distinct classes of policy learning methods,
including reinforcement learning (RL), imitation learning through behavior
cloning (BC), and imitation learning with a visual reward function (VRF). Our
study yields a series of intriguing results, including the discovery that the
effectiveness of pre-training is highly dependent on the choice of the
downstream policy learning algorithm. We show that conventionally accepted
evaluation based on RL methods is highly variable and therefore unreliable, and
further advocate for using more robust methods like VRF and BC. To facilitate
more universal evaluations of pre-trained models and their policy learning
methods in the future, we also release a benchmark of 21 tasks across 3
different environments alongside our work.
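To make the comparison concrete, the following is a minimal, illustrative sketch (not the authors' released benchmark code) of the behavior cloning (BC) setting described above: a frozen pre-trained vision model supplies features, and only a small control head is trained on expert demonstrations. Names such as encoder and demo_loader are placeholders, and the architecture and hyperparameters are assumptions for illustration only.

import torch
import torch.nn as nn

class BCPolicy(nn.Module):
    """Behavior cloning head on top of a frozen pre-trained visual encoder (illustrative sketch)."""
    def __init__(self, encoder: nn.Module, feat_dim: int, action_dim: int):
        super().__init__()
        self.encoder = encoder.eval()          # pre-trained vision model, kept frozen
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.head = nn.Sequential(             # small control-specific MLP head
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, obs):                    # obs: image batch of shape (B, C, H, W)
        with torch.no_grad():
            feats = self.encoder(obs)          # frozen visual features
        return self.head(feats)                # predicted continuous action

def train_bc(policy, demo_loader, epochs=10, lr=1e-3):
    """Supervised regression of expert actions from visual observations."""
    opt = torch.optim.Adam(policy.head.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, expert_action in demo_loader:
            loss = loss_fn(policy(obs), expert_action)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy

In the VRF setting, the same frozen features would instead define a reward, for example a similarity score between the embeddings of the current observation and a goal image, which a standard RL algorithm then optimizes.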
Related papers
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning
Recent research has begun to approach large language models (LLMs) unlearning via gradient ascent (GA)
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmark.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z)
- Reinforcement Learning with Action-Free Pre-Training from Videos
We introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos.
Our framework significantly improves both final performances and sample-efficiency of vision-based reinforcement learning.
arXiv Detail & Related papers (2022-03-25T19:44:09Z)
- Data-efficient visuomotor policy training using reinforcement learning and generative models
We present a data-efficient framework for solving visuomotor sequential decision-making problems.
We exploit the combination of reinforcement learning and latent variable generative models.
arXiv Detail & Related papers (2020-07-26T14:19:00Z)
- Ready Policy One: World Building Through Active Learning
We introduce Ready Policy One (RP1), a framework that views Model-Based Reinforcement Learning as an active learning problem.
RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization.
We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
arXiv Detail & Related papers (2020-02-07T09:57:53Z)