Chain of Thought Imitation with Procedure Cloning
- URL: http://arxiv.org/abs/2205.10816v1
- Date: Sun, 22 May 2022 13:14:09 GMT
- Title: Chain of Thought Imitation with Procedure Cloning
- Authors: Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
- Abstract summary: We propose procedure cloning, which applies supervised sequence prediction to imitate the series of expert computations.
We show that imitating the intermediate computations of an expert's behavior enables procedure cloning to learn policies exhibiting significant generalization to unseen environment configurations.
- Score: 129.62135987416164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning aims to extract high-performance policies from logged
demonstrations of expert behavior. It is common to frame imitation learning as
a supervised learning problem in which one fits a function approximator to the
input-output mapping exhibited by the logged demonstrations (input observations
to output actions). While the framing of imitation learning as a supervised
input-output learning problem allows for applicability in a wide variety of
settings, it is also an overly simplistic view of the problem in situations
where the expert demonstrations provide much richer insight into expert
behavior. For example, applications such as path navigation, robot
manipulation, and strategy games acquire expert demonstrations via planning,
search, or some other multi-step algorithm, revealing not just the output
action to be imitated but also the procedure for how to determine this action.
While these intermediate computations may use tools not available to the agent
during inference (e.g., environment simulators), they are nevertheless
informative as a way to explain an expert's mapping of state to actions. To
properly leverage expert procedure information without relying on the
privileged tools the expert may have used to perform the procedure, we propose
procedure cloning, which applies supervised sequence prediction to imitate the
series of expert computations. This way, procedure cloning learns not only what
to do (i.e., the output action), but how and why to do it (i.e., the
procedure). Through empirical analysis on navigation, simulated robotic
manipulation, and game-playing environments, we show that imitating the
intermediate computations of an expert's behavior enables procedure cloning to
learn policies exhibiting significant generalization to unseen environment
configurations, including those configurations for which running the expert's
procedure directly is infeasible.
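The core reframing in the abstract — fit the whole expert computation trace, not just the final action — can be sketched in a few lines. Everything below is illustrative (the toy demos, the "trace" tokens, and the function names are assumptions, not the paper's code): behavioral cloning (BC) supervises only (observation → action), while procedure cloning (PC) supervises (observation → step_1, …, step_T, action) as one target sequence.

```python
# Sketch of how procedure cloning (PC) changes the supervised targets
# relative to behavioral cloning (BC). The demonstrations and trace
# tokens below are made up for illustration.

def bc_examples(demos):
    """Behavioral cloning: map each observation to the final action only."""
    return [(d["obs"], [d["action"]]) for d in demos]

def pc_examples(demos):
    """Procedure cloning: predict the expert's intermediate computation
    steps (e.g., a logged search/planning trace) followed by the final
    action, as a single target sequence for a sequence model."""
    return [(d["obs"], d["trace"] + [d["action"]]) for d in demos]

# A toy demonstration: the expert ran a search whose expansions we logged.
demos = [
    {"obs": "maze_0", "trace": ["expand_A", "expand_B"], "action": "go_left"},
]

print(bc_examples(demos))  # observation -> action only
print(pc_examples(demos))  # observation -> trace steps, then action
```

At inference time the sequence model generates the trace and the action autoregressively from the observation alone, so the privileged tools the expert used (e.g., an environment simulator) are no longer needed.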
Related papers
- Unveiling the Decision-Making Process in Reinforcement Learning with Genetic Programming [4.249842620609683]
For (deep) reinforcement learning to be used in the real world, incomprehensible decision-making is not an option.
We propose a genetic programming framework to generate explanations for the decision-making process of already trained agents.
We show that our approach is comparable in performance while requiring far fewer hardware resources and much less computation time.
arXiv Detail & Related papers (2024-07-20T00:45:03Z) - A Dual Approach to Imitation Learning from Observations with Offline Datasets [19.856363985916644]
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitating policies without requiring expert actions.
arXiv Detail & Related papers (2024-06-13T04:39:42Z) - NaturalVLM: Leveraging Fine-grained Natural Language for Affordance-Guided Visual Manipulation [21.02437461550044]
Many real-world tasks demand intricate multi-step reasoning.
We introduce a benchmark, NrVLM, comprising 15 distinct manipulation tasks.
We propose a novel learning framework that completes the manipulation task step-by-step according to the fine-grained instructions.
arXiv Detail & Related papers (2024-03-13T09:12:16Z) - Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos [16.333295670635557]
We explore the capability of an agent to construct a logical sequence of action steps, thereby assembling a strategic procedural plan.
This plan is crucial for navigating from an initial visual observation to a target visual outcome, as depicted in real-life instructional videos.
We coin our approach KEPP, a novel Knowledge-Enhanced Procedure Planning system, which harnesses a probabilistic procedural knowledge graph extracted from training data.
arXiv Detail & Related papers (2024-03-05T08:55:51Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
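The idea of using intervention signals themselves as the reward can be sketched as a simple relabeling pass over logged trajectories. This is a hedged illustration of the concept, not the authors' exact reward scheme; the field names and reward values are assumptions:

```python
# Illustrative reward relabeling from intervention signals: a step at
# which the human expert intervened is treated as a negative event,
# and all other steps are neutral. No hand-designed task reward is used.

def relabel_with_interventions(trajectory):
    """Return a copy of the trajectory with rewards derived purely
    from whether the human intervened at each step."""
    return [
        {**step, "reward": -1.0 if step["intervened"] else 0.0}
        for step in trajectory
    ]

traj = [
    {"obs": 0, "action": "a", "intervened": False},
    {"obs": 1, "action": "b", "intervened": True},
]
print(relabel_with_interventions(traj))
```

Off-policy reinforcement learning can then be run on the relabeled data, learning to avoid the states that triggered interventions rather than assuming the intervening expert's actions were optimal.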
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Procedure-Aware Pretraining for Instructional Video Understanding [58.214549181779006]
A key challenge in procedure understanding is extracting procedural knowledge from unlabeled videos.
Our main insight is that instructional videos depict sequences of steps that repeat between instances of the same or different tasks.
These repeated steps can be organized into a graph, which can then be used to generate pseudo labels to train a video representation that encodes the procedural knowledge in a more accessible form.
arXiv Detail & Related papers (2023-03-31T17:41:31Z) - Self-supervised Transformer for Deepfake Detection [112.81127845409002]
Deepfake techniques encountered in real-world scenarios demand stronger generalization abilities from face forgery detectors.
Inspired by transfer learning, neural networks pre-trained on other large-scale face-related tasks may provide useful features for deepfake detection.
In this paper, we propose a self-supervised transformer based audio-visual contrastive learning method.
arXiv Detail & Related papers (2022-03-02T17:44:40Z) - Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges remain unsolved: the difficulty of commonsense reasoning and data insufficiency.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z) - Hierarchical Variational Imitation Learning of Control Programs [131.7671843857375]
We propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP).
Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations.
We demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods.
arXiv Detail & Related papers (2019-12-29T08:57:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.