Chain-of-Thought Predictive Control
- URL: http://arxiv.org/abs/2304.00776v2
- Date: Sun, 7 Jul 2024 22:06:34 GMT
- Title: Chain-of-Thought Predictive Control
- Authors: Zhiwei Jia, Vineet Thumuluri, Fangchen Liu, Linghao Chen, Zhiao Huang, Hao Su,
- Abstract summary: We study generalizable policy learning from demonstrations for complex low-level control.
We propose a novel hierarchical imitation learning method that utilizes sub-optimal demos.
- Score: 32.30974063877643
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study generalizable policy learning from demonstrations for complex low-level control (e.g., contact-rich object manipulations). We propose a novel hierarchical imitation learning method that utilizes sub-optimal demos. Firstly, we propose an observation space-agnostic approach that efficiently discovers the multi-step subskill decomposition of the demos in an unsupervised manner. By grouping temporarily close and functionally similar actions into subskill-level demo segments, the observations at the segment boundaries constitute a chain of planning steps for the task, which we refer to as the chain-of-thought (CoT). Next, we propose a Transformer-based design that effectively learns to predict the CoT as the subskill-level guidance. We couple action and subskill predictions via learnable prompt tokens and a hybrid masking strategy, which enable dynamically updated guidance at test time and improve feature representation of the trajectory for generalizable policy learning. Our method, Chain-of-Thought Predictive Control (CoTPC), consistently surpasses existing strong baselines on challenging manipulation tasks with sub-optimal demos.
Related papers
- Guided Reinforcement Learning for Robust Multi-Contact Loco-Manipulation [12.377289165111028]
Reinforcement learning (RL) often necessitates a meticulous Markov Decision Process (MDP) design tailored to each task.
This work proposes a systematic approach to behavior synthesis and control for multi-contact loco-manipulation tasks.
We define a task-independent MDP to train RL policies using only a single demonstration per task generated from a model-based trajectory.
arXiv Detail & Related papers (2024-10-17T17:46:27Z) - Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z) - KOPPA: Improving Prompt-based Continual Learning with Key-Query
Orthogonal Projection and Prototype-based One-Versus-All [26.506535205897443]
We introduce a novel key-query learning strategy to enhance prompt matching efficiency and address the challenge of shifting features.
Our method empowers the model to achieve results surpassing those of current state-of-the-art approaches by a large margin of up to 20%.
arXiv Detail & Related papers (2023-11-26T20:35:19Z) - Unsupervised Meta-Learning via Few-shot Pseudo-supervised Contrastive
Learning [72.3506897990639]
We propose a simple yet effective unsupervised meta-learning framework, coined Pseudo-supervised Contrast (PsCo) for few-shot classification.
PsCo outperforms existing unsupervised meta-learning methods under various in-domain and cross-domain few-shot classification benchmarks.
arXiv Detail & Related papers (2023-03-02T06:10:13Z) - Option-Aware Adversarial Inverse Reinforcement Learning for Robotic
Control [44.77500987121531]
Hierarchical Imitation Learning (HIL) has been proposed to recover highly-complex behaviors in long-horizon tasks from expert demonstrations.
We develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning.
We also propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion.
arXiv Detail & Related papers (2022-10-05T00:28:26Z) - Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in
Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z) - Instance-Aware Predictive Navigation in Multi-Agent Environments [93.15055834395304]
We propose an Instance-Aware Predictive Control (IPC) approach, which forecasts interactions between agents as well as future scene structures.
We adopt a novel multi-instance event prediction module to estimate the possible interaction among agents in the ego-centric view.
We design a sequential action sampling strategy to better leverage predicted states on both scene-level and instance-level.
arXiv Detail & Related papers (2021-01-14T22:21:25Z) - TAMPC: A Controller for Escaping Traps in Novel Environments [0.0]
We learn dynamics for a system without traps from a randomly collected training set.
When unexpected traps arise in execution, we must find a way to adapt our dynamics and control strategy.
Our approach, Trap-Aware Model Predictive Control (TAMPC), is a two-level hierarchical control algorithm.
arXiv Detail & Related papers (2020-10-23T16:33:15Z) - Effective Unsupervised Domain Adaptation with Adversarially Trained
Language Models [54.569004548170824]
We show that careful masking strategies can bridge the knowledge gap of masked language models.
We propose an effective training strategy by adversarially masking out those tokens which are harder to adversarial by the underlying.
arXiv Detail & Related papers (2020-10-05T01:49:47Z) - Hierarchical Variational Imitation Learning of Control Programs [131.7671843857375]
We propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP)
Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations.
We demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods.
arXiv Detail & Related papers (2019-12-29T08:57:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.