Continuous Transition: Improving Sample Efficiency for Continuous
Control Problems via MixUp
- URL: http://arxiv.org/abs/2011.14487v2
- Date: Sun, 7 Mar 2021 04:59:11 GMT
- Title: Continuous Transition: Improving Sample Efficiency for Continuous
Control Problems via MixUp
- Authors: Junfan Lin, Zhongzhan Huang, Keze Wang, Xiaodan Liang, Weiwei Chen,
and Liang Lin
- Abstract summary: This paper introduces a concise yet powerful method to construct Continuous Transition.
Specifically, we propose to synthesize new transitions for training by linearly interpolating the consecutive transitions.
To keep the constructed transitions authentic, we also develop a discriminator to guide the construction process automatically.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep reinforcement learning (RL) has been successfully applied to a
variety of robotic control tasks, it remains challenging to apply to
real-world tasks due to poor sample efficiency. Attempting to overcome
this shortcoming, several works focus on reusing the collected trajectory data
during training by decomposing it into a set of policy-irrelevant
discrete transitions. However, their improvements are somewhat marginal since
i) the number of such transitions is usually small, and ii) the value assignment
only happens at the joint states. To address these issues, this paper
introduces a concise yet powerful method to construct Continuous Transition,
which makes better use of trajectory information by exploiting the potential
transitions along the trajectory. Specifically, we propose to synthesize new
transitions for training by linearly interpolating consecutive transitions.
To keep the constructed transitions authentic, we also develop a discriminator
that guides the construction process automatically. Extensive experiments
demonstrate that our proposed method achieves a significant improvement in
sample efficiency on various complex continuous robotic control problems in
MuJoCo and outperforms advanced model-based and model-free RL methods. The
source code is available.
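The interpolation at the heart of the method can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the toy transitions, the helper names, and the Beta-distributed mixing weight (a convention borrowed from standard MixUp) are all assumptions, and the discriminator that keeps mixed transitions authentic is omitted.

```python
import random

def mix(x, y, lam):
    """Convex combination of two equal-length vectors."""
    return [lam * xi + (1.0 - lam) * yi for xi, yi in zip(x, y)]

def continuous_transition(t0, t1, lam):
    """Synthesize a new transition by linearly interpolating two
    consecutive transitions, each of the form (state, action, reward, next_state)."""
    s0, a0, r0, ns0 = t0
    s1, a1, r1, ns1 = t1
    return (mix(s0, s1, lam),
            mix(a0, a1, lam),
            lam * r0 + (1.0 - lam) * r1,
            mix(ns0, ns1, lam))

# Two consecutive transitions from one trajectory (toy 2-D state, 1-D action).
t0 = ([0.0, 0.0], [1.0], 1.0, [1.0, 0.0])
t1 = ([1.0, 0.0], [0.5], 2.0, [2.0, 0.0])

# Sample a mixing weight and build a synthetic transition for the replay buffer.
lam = random.betavariate(0.5, 0.5)
synthetic = continuous_transition(t0, t1, lam)
```

Because the two inputs are consecutive, every convex combination lies "between" them on the trajectory, which is what lets the synthetic sample densify the replay buffer without leaving the neighborhood of observed behavior.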
Related papers
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- Beyond Anti-Forgetting: Multimodal Continual Instruction Tuning with Positive Forward Transfer [21.57847333976567]
Multimodal Continual Instruction Tuning (MCIT) enables Multimodal Large Language Models (MLLMs) to meet continuously emerging requirements without expensive retraining.
MCIT faces two major obstacles: catastrophic forgetting, where previously learned knowledge is lost, and negative forward transfer.
We propose Prompt Tuning with Positive Forward Transfer (Fwd-Prompt) to address these issues.
arXiv Detail & Related papers (2024-01-17T12:44:17Z)
- Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continual offline reinforcement learning (CORL) combines continual learning with offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
arXiv Detail & Related papers (2024-01-16T16:28:32Z)
- Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments.
Our method takes inspiration from the theory developed for generative flow networks (GFlowNets).
arXiv Detail & Related papers (2023-10-04T09:39:05Z)
- Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
arXiv Detail & Related papers (2023-07-24T19:43:22Z)
- Real-time Controllable Motion Transition for Characters [14.88407656218885]
Real-time in-between motion generation is universally required in games and highly desirable in existing animation pipelines.
Our approach consists of two key components: motion manifold and conditional transitioning.
We show that our method is able to generate high-quality motions measured under multiple metrics.
arXiv Detail & Related papers (2022-05-05T10:02:54Z)
- Transition Motion Tensor: A Data-Driven Approach for Versatile and Controllable Agents in Physically Simulated Environments [6.8438089867929905]
This paper proposes a data-driven framework that creates novel and physically accurate transitions outside of the motion dataset.
It enables simulated characters to adopt new motion skills efficiently and robustly without modifying existing ones.
arXiv Detail & Related papers (2021-11-30T02:17:25Z)
- Adversarial Imitation Learning with Trajectorial Augmentation and Correction [61.924411952657756]
We introduce a novel augmentation method which preserves the success of the augmented trajectories.
We develop an adversarial data augmented imitation architecture to train an imitation agent using synthetic experts.
Experiments show that our data augmentation strategy can improve accuracy and convergence time of adversarial imitation.
arXiv Detail & Related papers (2021-03-25T14:49:32Z)
- Data-efficient Weakly-supervised Learning for On-line Object Detection under Domain Shift in Robotics [24.878465999976594]
Several object detection methods have been proposed in the literature, the vast majority based on Deep Convolutional Neural Networks (DCNNs).
These methods have important limitations for robotics: learning solely from off-line data may introduce biases and prevent adaptation to novel tasks.
In this work, we investigate how weakly-supervised learning can cope with these problems.
arXiv Detail & Related papers (2020-12-28T16:36:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences.