Chain of Thought Imitation with Procedure Cloning
- URL: http://arxiv.org/abs/2205.10816v1
- Date: Sun, 22 May 2022 13:14:09 GMT
- Title: Chain of Thought Imitation with Procedure Cloning
- Authors: Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
- Abstract summary: We propose procedure cloning, which applies supervised sequence prediction to imitate the series of expert computations.
We show that imitating the intermediate computations of an expert's behavior enables procedure cloning to learn policies exhibiting significant generalization to unseen environment configurations.
- Score: 129.62135987416164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning aims to extract high-performance policies from logged
demonstrations of expert behavior. It is common to frame imitation learning as
a supervised learning problem in which one fits a function approximator to the
input-output mapping exhibited by the logged demonstrations (input observations
to output actions). While the framing of imitation learning as a supervised
input-output learning problem allows for applicability in a wide variety of
settings, it is also an overly simplistic view of the problem in situations
where the expert demonstrations provide much richer insight into expert
behavior. For example, applications such as path navigation, robot
manipulation, and strategy games acquire expert demonstrations via planning,
search, or some other multi-step algorithm, revealing not just the output
action to be imitated but also the procedure for how to determine this action.
While these intermediate computations may use tools not available to the agent
during inference (e.g., environment simulators), they are nevertheless
informative as a way to explain an expert's mapping of state to actions. To
properly leverage expert procedure information without relying on the
privileged tools the expert may have used to perform the procedure, we propose
procedure cloning, which applies supervised sequence prediction to imitate the
series of expert computations. This way, procedure cloning learns not only what
to do (i.e., the output action), but how and why to do it (i.e., the
procedure). Through empirical analysis on navigation, simulated robotic
manipulation, and game-playing environments, we show that imitating the
intermediate computations of an expert's behavior enables procedure cloning to
learn policies exhibiting significant generalization to unseen environment
configurations, including those configurations for which running the expert's
procedure directly is infeasible.
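The core reframing in the abstract — fit the whole expert computation trace, not just the final action — can be sketched in a few lines. Everything below is illustrative (the toy demos, the "trace" tokens, and the function names are assumptions, not the paper's code): behavioral cloning (BC) supervises only (observation → action), while procedure cloning (PC) supervises (observation → step_1, …, step_T, action) as one target sequence.

```python
# Sketch of how procedure cloning (PC) changes the supervised targets
# relative to behavioral cloning (BC). The demonstrations and trace
# tokens below are made up for illustration.

def bc_examples(demos):
    """Behavioral cloning: map each observation to the final action only."""
    return [(d["obs"], [d["action"]]) for d in demos]

def pc_examples(demos):
    """Procedure cloning: predict the expert's intermediate computation
    steps (e.g., a logged search/planning trace) followed by the final
    action, as a single target sequence for a sequence model."""
    return [(d["obs"], d["trace"] + [d["action"]]) for d in demos]

# A toy demonstration: the expert ran a search whose expansions we logged.
demos = [
    {"obs": "maze_0", "trace": ["expand_A", "expand_B"], "action": "go_left"},
]

print(bc_examples(demos))  # observation -> action only
print(pc_examples(demos))  # observation -> trace steps, then action
```

At inference time the sequence model generates the trace and the action autoregressively from the observation alone, so the privileged tools the expert used (e.g., an environment simulator) are no longer needed.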
Related papers
- Unveiling the Decision-Making Process in Reinforcement Learning with Genetic Programming [4.249842620609683]
For (deep) reinforcement learning to be used in the real world, incomprehensible decision-making is not an option.
We propose a genetic programming framework to generate explanations for the decision-making process of already trained agents.
We show that our approach is comparable in performance while requiring far fewer hardware resources and much less computation time.
arXiv Detail & Related papers (2024-07-20T00:45:03Z) - A Dual Approach to Imitation Learning from Observations with Offline Datasets [19.856363985916644]
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitating policies without requiring expert actions.
arXiv Detail & Related papers (2024-06-13T04:39:42Z) - NaturalVLM: Leveraging Fine-grained Natural Language for Affordance-Guided Visual Manipulation [21.02437461550044]
Many real-world tasks demand intricate multi-step reasoning.
We introduce a benchmark, NrVLM, comprising 15 distinct manipulation tasks.
We propose a novel learning framework that completes the manipulation task step-by-step according to the fine-grained instructions.
arXiv Detail & Related papers (2024-03-13T09:12:16Z) - Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos [16.333295670635557]
We explore the capability of an agent to construct a logical sequence of action steps, thereby assembling a strategic procedural plan.
This plan is crucial for navigating from an initial visual observation to a target visual outcome, as depicted in real-life instructional videos.
We coin our approach KEPP, a novel Knowledge-Enhanced Procedure Planning system, which harnesses a probabilistic procedural knowledge graph extracted from training data.
arXiv Detail & Related papers (2024-03-05T08:55:51Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
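The idea of using intervention signals themselves as the reward can be sketched as a simple relabeling pass over logged trajectories. This is a hedged illustration of the concept, not the authors' exact reward scheme; the field names and reward values are assumptions:

```python
# Illustrative reward relabeling from intervention signals: a step at
# which the human expert intervened is treated as a negative event,
# and all other steps are neutral. No hand-designed task reward is used.

def relabel_with_interventions(trajectory):
    """Return a copy of the trajectory with rewards derived purely
    from whether the human intervened at each step."""
    return [
        {**step, "reward": -1.0 if step["intervened"] else 0.0}
        for step in trajectory
    ]

traj = [
    {"obs": 0, "action": "a", "intervened": False},
    {"obs": 1, "action": "b", "intervened": True},
]
print(relabel_with_interventions(traj))
```

Off-policy reinforcement learning can then be run on the relabeled data, learning to avoid the states that triggered interventions rather than assuming the intervening expert's actions were optimal.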
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Procedure-Aware Pretraining for Instructional Video Understanding [58.214549181779006]
A key challenge in procedure understanding is extracting procedural knowledge from unlabeled videos.
Our main insight is that instructional videos depict sequences of steps that repeat between instances of the same or different tasks.
These repeated steps can be organized into a graph, which can then be used to generate pseudo labels to train a video representation that encodes the procedural knowledge in a more accessible form.
arXiv Detail & Related papers (2023-03-31T17:41:31Z) - Self-supervised Transformer for Deepfake Detection [112.81127845409002]
Deepfake techniques encountered in real-world scenarios demand stronger generalization abilities from face forgery detectors.
Inspired by transfer learning, neural networks pre-trained on other large-scale face-related tasks may provide useful features for deepfake detection.
In this paper, we propose a self-supervised transformer based audio-visual contrastive learning method.
arXiv Detail & Related papers (2022-03-02T17:44:40Z) - Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges remain unsolved: the difficulty of commonsense reasoning and data insufficiency.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z) - Hierarchical Variational Imitation Learning of Control Programs [131.7671843857375]
We propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP).
Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations.
We demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods.
arXiv Detail & Related papers (2019-12-29T08:57:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.