Few-Shot In-Context Imitation Learning via Implicit Graph Alignment
- URL: http://arxiv.org/abs/2310.12238v1
- Date: Wed, 18 Oct 2023 18:26:01 GMT
- Title: Few-Shot In-Context Imitation Learning via Implicit Graph Alignment
- Authors: Vitalis Vosylius and Edward Johns
- Abstract summary: We formulate imitation learning as a conditional alignment problem between graph representations of objects.
We show that this conditioning allows for in-context learning, where a robot can perform a task on a set of new objects immediately after the demonstrations.
- Score: 15.215659641228655
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Consider the following problem: given a few demonstrations of a task across a
few different objects, how can a robot learn to perform that same task on new,
previously unseen objects? This is challenging because the large variety of
objects within a class makes it difficult to infer the task-relevant
relationship between the new objects and the objects in the demonstrations. We
address this by formulating imitation learning as a conditional alignment
problem between graph representations of objects. Consequently, we show that
this conditioning allows for in-context learning, where a robot can perform a
task on a set of new objects immediately after the demonstrations, without any
prior knowledge about the object class or any further training. In our
experiments, we explore and validate our design choices, and we show that our
method is highly effective for few-shot learning of several real-world,
everyday tasks, whilst outperforming baselines. Videos are available on our
project webpage at https://www.robot-learning.uk/implicit-graph-alignment.
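To make the abstract's core idea concrete, the following is a minimal, self-contained sketch of what conditional alignment between object graphs could look like. It is not the authors' architecture: the keypoint-based graph construction, the mean-pooled edge encoder standing in for a graph network, and all names (build_object_graph, AlignmentNet) are illustrative assumptions; the point is simply that a predicted relative pose between two new objects is conditioned on embeddings of the demonstration graph pairs, which is what enables in-context behaviour.
```python
# Minimal sketch, NOT the authors' implementation: objects are represented as
# fully connected graphs over 3D keypoints, and a network predicts how to align
# a live object with a target object, conditioned on demonstration graph pairs.
# All names and architectural choices here are illustrative assumptions.

import torch
import torch.nn as nn


def build_object_graph(keypoints: torch.Tensor) -> torch.Tensor:
    """Fully connected graph over K keypoints: for each ordered node pair (i, j),
    concatenate keypoint i with the offset to keypoint j -> shape (K, K, 6)."""
    k = keypoints.shape[0]
    src = keypoints.unsqueeze(1).expand(k, k, 3)
    offsets = keypoints.unsqueeze(0) - keypoints.unsqueeze(1)
    return torch.cat([src, offsets], dim=-1)


class AlignmentNet(nn.Module):
    """Predicts a 9-D alignment (3-D translation + 6-D rotation representation)
    between two object graphs, conditioned on demonstration graph pairs.
    A mean-pooled edge MLP stands in for a proper graph neural network."""

    def __init__(self, hidden: int = 128):
        super().__init__()
        self.edge_enc = nn.Sequential(nn.Linear(12, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 9))

    def encode_pair(self, graph_a: torch.Tensor, graph_b: torch.Tensor) -> torch.Tensor:
        # Pair every edge feature of object A with every edge feature of object B,
        # encode, and mean-pool into one permutation-invariant embedding.
        ka, kb = graph_a.shape[0], graph_b.shape[0]
        a = graph_a.reshape(ka * ka, 1, 6).expand(-1, kb * kb, -1)
        b = graph_b.reshape(1, kb * kb, 6).expand(ka * ka, -1, -1)
        return self.edge_enc(torch.cat([a, b], dim=-1)).mean(dim=(0, 1))

    def forward(self, demo_pairs, live_graph, target_graph):
        # In-context conditioning: average the embeddings of the demonstrated
        # object-pair configurations, then predict the alignment for the new pair.
        context = torch.stack([self.encode_pair(a, b) for a, b in demo_pairs]).mean(0)
        query = self.encode_pair(live_graph, target_graph)
        return self.head(torch.cat([context, query], dim=-1))


if __name__ == "__main__":
    torch.manual_seed(0)
    demos = [(build_object_graph(torch.randn(8, 3)),
              build_object_graph(torch.randn(8, 3))) for _ in range(2)]
    live = build_object_graph(torch.randn(8, 3))
    target = build_object_graph(torch.randn(8, 3))
    print(AlignmentNet()(demos, live, target).shape)  # torch.Size([9])
```
In this toy setup, swapping in a different set of demonstration pairs changes the predicted alignment without any weight updates, which is the in-context property the abstract highlights.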
Related papers
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z) - Generalization to New Sequential Decision Making Tasks with In-Context
Learning [23.36106067650874]
Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning.
In this paper, we show that naively applying transformers to sequential decision making problems does not enable in-context learning of new tasks.
We investigate different design choices and find that larger model and dataset sizes, as well as more task diversity, environment stochasticity, and trajectory burstiness, all result in better in-context learning of new out-of-distribution tasks.
arXiv Detail & Related papers (2023-12-06T15:19:28Z) - RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation [36.43143326197769]
Track-Any-Point (TAP) models isolate the relevant motion in a demonstration, and parameterize a low-level controller to reproduce this motion across changes in the scene configuration.
We show this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching, stacking, and even full path-following tasks such as applying glue and sticking objects together.
arXiv Detail & Related papers (2023-08-30T11:57:04Z) - Learning Sensorimotor Primitives of Sequential Manipulation Tasks from
Visual Demonstrations [13.864448233719598]
This paper describes a new neural network-based framework for simultaneously learning low-level and high-level policies.
A key feature of the proposed approach is that the policies are learned directly from raw videos of task demonstrations.
Empirical results on object manipulation tasks with a robotic arm show that the proposed network can efficiently learn from real visual demonstrations to perform the tasks.
arXiv Detail & Related papers (2022-03-08T01:36:48Z) - A Differentiable Recipe for Learning Visual Non-Prehensile Planar
Manipulation [63.1610540170754]
We focus on the problem of visual non-prehensile planar manipulation.
We propose a novel architecture that combines video decoding neural models with priors from contact mechanics.
We find that our modular and fully differentiable architecture performs better than learning-only methods on unseen objects and motions.
arXiv Detail & Related papers (2021-11-09T18:39:45Z) - Simultaneous Multi-View Object Recognition and Grasping in Open-Ended
Domains [0.0]
We propose a deep learning architecture with augmented memory capacities to handle open-ended object recognition and grasping simultaneously.
We demonstrate the ability of our approach to grasp never-seen-before objects and to rapidly learn new object categories using very few examples on-site in both simulation and real-world settings.
arXiv Detail & Related papers (2021-06-03T14:12:11Z) - Actionable Models: Unsupervised Offline Reinforcement Learning of
Robotic Skills [93.12417203541948]
We propose the objective of learning a functional understanding of the environment by learning to reach any goal state in a given dataset.
We find that our method can operate on high-dimensional camera images and learn a variety of skills on real robots that generalize to previously unseen scenes and objects.
arXiv Detail & Related papers (2021-04-15T20:10:11Z) - COG: Connecting New Skills to Past Experience with Offline Reinforcement
Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z) - Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z) - Modeling Long-horizon Tasks as Sequential Interaction Landscapes [75.5824586200507]
We present a deep learning network that learns dependencies and transitions across subtasks solely from a set of demonstration videos.
We show that these symbols can be learned and predicted directly from image observations.
We evaluate our framework on two long-horizon tasks: (1) block stacking of puzzle pieces being executed by humans, and (2) a robot manipulation task involving pick and place of objects and sliding a cabinet door with a 7-DoF robot arm.
arXiv Detail & Related papers (2020-06-08T18:07:18Z)
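The RoboTAP entry above describes isolating tracked points in a demonstration and using them to parameterize a low-level controller. As a purely illustrative aside (not RoboTAP's algorithm), the toy sketch below shows the general idea of point-based visual servoing: given corresponding tracked points in the live image and a demonstration frame, a proportional control law turns the residual point error into a motion command. The point tracker itself is assumed to exist elsewhere, and all names here are hypothetical.
```python
# Toy sketch only, not RoboTAP's implementation: proportional visual servoing
# on tracked 2D points. live_pts come from any point tracker (assumed given);
# goal_pts are the same points' positions in a demonstration frame.

import numpy as np


def servo_step(live_pts: np.ndarray, goal_pts: np.ndarray, gain: float = 0.5):
    """Return an image-space velocity command (2,) and the mean pixel error."""
    error = goal_pts - live_pts                       # (N, 2) per-point residuals
    command = gain * error.mean(axis=0)               # proportional control on the mean
    return command, float(np.linalg.norm(error, axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    goal = rng.uniform(0, 224, size=(16, 2))          # demo-frame point positions
    live = goal + rng.normal(0.0, 5.0, size=(16, 2))  # perturbed live observations
    command, err = servo_step(live, goal)
    print(command, err)                               # command drives the error toward zero
```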
This list is automatically generated from the titles and abstracts of the papers on this site.