TAMPC: A Controller for Escaping Traps in Novel Environments
- URL: http://arxiv.org/abs/2010.12516v3
- Date: Wed, 3 Feb 2021 18:22:05 GMT
- Title: TAMPC: A Controller for Escaping Traps in Novel Environments
- Authors: Sheng Zhong, Zhenyuan Zhang, Nima Fazeli, Dmitry Berenson (Robotics Institute, University of Michigan)
- Abstract summary: We learn dynamics for a system without traps from a randomly collected training set.
When unexpected traps arise in execution, we must find a way to adapt our dynamics and control strategy.
Our approach, Trap-Aware Model Predictive Control (TAMPC), is a two-level hierarchical control algorithm.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an approach to online model adaptation and control in the
challenging case of hybrid and discontinuous dynamics where actions may lead to
difficult-to-escape "trap" states, under a given controller. We first learn
dynamics for a system without traps from a randomly collected training set
(since we do not know what traps will be encountered online). These "nominal"
dynamics allow us to perform tasks in scenarios where the dynamics matches the
training data, but when unexpected traps arise in execution, we must find a way
to adapt our dynamics and control strategy and continue attempting the task.
Our approach, Trap-Aware Model Predictive Control (TAMPC), is a two-level
hierarchical control algorithm that reasons about traps and non-nominal
dynamics to decide between goal-seeking and recovery policies. An important
requirement of our method is the ability to recognize nominal dynamics even
when we encounter data that is out-of-distribution w.r.t. the training data. We
achieve this by learning a representation for dynamics that exploits invariance
in the nominal environment, thus allowing better generalization. We evaluate
our method on simulated planar pushing and peg-in-hole as well as real robot
peg-in-hole problems against adaptive control, reinforcement learning, and
trap-handling baselines, where traps arise due to unexpected obstacles that we
only observe through contact. Our results show that our method outperforms the
baselines on difficult tasks, and is comparable to prior trap-handling methods
on easier tasks.
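The two-level structure described in the abstract can be pictured as a thin wrapper around an ordinary MPC loop: a high-level monitor decides whether the controller is in nominal dynamics, non-nominal dynamics, or a trap, and switches the cost the low-level MPC optimizes. The sketch below is illustrative only: `nominal_model` (the trap-free dynamics learned offline) and `mpc` (any sampling-based MPC routine) are hypothetical interfaces, and the prediction-error threshold and no-progress counter stand in for the paper's actual trap-detection, online dynamics-adaptation, and invariant-representation machinery, which this sketch omits.

```python
import numpy as np

class TAMPCSketch:
    """Minimal two-level controller sketch inspired by the TAMPC abstract.

    Assumptions (not from the paper): nominal_model(x, u) -> next state,
    mpc(x, model, cost) -> action; thresholds are illustrative placeholders.
    """

    def __init__(self, nominal_model, mpc, error_threshold=0.5, trap_patience=10):
        self.nominal_model = nominal_model
        self.mpc = mpc
        self.error_threshold = error_threshold   # detector for non-nominal dynamics
        self.trap_patience = trap_patience       # steps without progress => trap
        self.trap_states = []                    # remembered trap states to avoid
        self.steps_without_progress = 0
        self.best_goal_dist = np.inf

    def act(self, x, x_prev, u_prev, goal):
        # 1) Detect non-nominal dynamics: compare the observed transition
        #    against the nominal model's prediction.
        pred = self.nominal_model(x_prev, u_prev)
        non_nominal = np.linalg.norm(x - pred) > self.error_threshold

        # 2) Detect traps: persistent lack of progress toward the goal
        #    while dynamics are non-nominal.
        goal_dist = np.linalg.norm(x - goal)
        if goal_dist < self.best_goal_dist - 1e-3:
            self.best_goal_dist = goal_dist
            self.steps_without_progress = 0
        else:
            self.steps_without_progress += 1
        in_trap = non_nominal and self.steps_without_progress >= self.trap_patience

        # 3) High level: choose between recovery and goal-seeking costs.
        if in_trap:
            self.trap_states.append(np.array(x, copy=True))
            self.steps_without_progress = 0
            # Recovery: move away from remembered trap states (a crude proxy
            # for returning to regions where nominal dynamics held).
            cost = lambda traj: -min(
                np.linalg.norm(traj[-1] - t) for t in self.trap_states
            )
        else:
            # Goal-seeking: ordinary goal cost plus a penalty near known traps.
            cost = lambda traj: np.linalg.norm(traj[-1] - goal) + sum(
                1.0 / (1e-3 + np.linalg.norm(traj[-1] - t)) for t in self.trap_states
            )

        # 4) Low level: MPC plans over the learned dynamics with the chosen cost.
        return self.mpc(x, self.nominal_model, cost)
```

With a sampling-based optimizer such as MPPI plugged in for `mpc`, the high-level switch only changes the cost the low-level planner sees; the caller supplies the previous state and action so the monitor can score the last transition. This captures the gist of the goal-seeking vs. recovery decision, not the paper's full method.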
Related papers
- Guided Reinforcement Learning for Robust Multi-Contact Loco-Manipulation [12.377289165111028] (arXiv, 2024-10-17)
Reinforcement learning (RL) often necessitates a meticulous Markov Decision Process (MDP) design tailored to each task.
This work proposes a systematic approach to behavior synthesis and control for multi-contact loco-manipulation tasks.
We define a task-independent MDP to train RL policies using only a single demonstration per task, generated from a model-based trajectory.
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094] (arXiv, 2023-06-06)
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with state-of-the-art methods in Meta-World and the DeepMind Control Suite.
- Chain-of-Thought Predictive Control [32.30974063877643] (arXiv, 2023-04-03)
We study generalizable policy learning from demonstrations for complex low-level control.
We propose a novel hierarchical imitation learning method that utilizes sub-optimal demonstrations.
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871] (arXiv, 2023-03-12)
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose a mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
- Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows [58.762959061522736] (arXiv, 2022-11-20)
Offline reinforcement learning aims to train a policy on a pre-recorded, fixed dataset without any additional environment interactions.
We build upon recent work on learning policies in latent action spaces and use a special form of Normalizing Flows to construct a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578] (arXiv, 2022-10-06)
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines.
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202] (arXiv, 2021-12-20)
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against disturbances in the transition dynamics.
- IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245] (arXiv, 2021-06-23)
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is a simple method that is widely used due to its ease of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
- Multi-Task Reinforcement Learning based Mobile Manipulation Control for Dynamic Object Tracking and Grasping [17.2022039806473] (arXiv, 2020-06-07)
A multi-task reinforcement learning-based mobile manipulation control framework is proposed to achieve general dynamic object tracking and grasping.
Experiments show that the trained policy can adapt to unseen random dynamic trajectories with about 0.1 m tracking error and a 75% grasping success rate.
- Online Constrained Model-based Reinforcement Learning [13.362455603441552] (arXiv, 2020-04-07)
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart-pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.