RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory
Sketches
- URL: http://arxiv.org/abs/2311.01977v2
- Date: Mon, 6 Nov 2023 05:53:08 GMT
- Title: RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory
Sketches
- Authors: Jiayuan Gu, Sean Kirmani, Paul Wohlhart, Yao Lu, Montserrat Gonzalez
Arenas, Kanishka Rao, Wenhao Yu, Chuyuan Fu, Keerthana Gopalakrishnan, Zhuo
Xu, Priya Sundaresan, Peng Xu, Hao Su, Karol Hausman, Chelsea Finn, Quan
Vuong, Ted Xiao
- Abstract summary: Generalization remains one of the most important desiderata for robust robot learning systems.
We propose a policy conditioning method using rough trajectory sketches.
We show that RT-Trajectory is able to perform a wider range of tasks compared to language-conditioned and goal-conditioned policies.
- Score: 74.300116260004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization remains one of the most important desiderata for robust robot
learning systems. While recently proposed approaches show promise in
generalization to novel objects, semantic concepts, or visual distribution
shifts, generalization to new tasks remains challenging. For example, a
language-conditioned policy trained on pick-and-place tasks will not be able to
generalize to a folding task, even if the arm trajectory of folding is similar
to pick-and-place. Our key insight is that this kind of generalization becomes
feasible if we represent the task through rough trajectory sketches. We propose
a policy conditioning method using such rough trajectory sketches, which we
call RT-Trajectory, that is practical, easy to specify, and allows the policy
to effectively perform new tasks that would otherwise be challenging to
perform. We find that trajectory sketches strike a balance between being
detailed enough to express low-level motion-centric guidance and coarse enough
to allow the learned policy to interpret the trajectory sketch in the context
of situational visual observations. In addition, we show how trajectory
sketches can provide a useful interface to communicate with robotic policies:
they can be specified through simple human inputs like drawings or videos, or
through automated methods such as modern image-generating or
waypoint-generating methods. We evaluate RT-Trajectory at scale on a variety of
real-world robotic tasks, and find that RT-Trajectory is able to perform a
wider range of tasks compared to language-conditioned and goal-conditioned
policies, when provided the same training data.
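To make the conditioning idea above concrete, here is a minimal, hypothetical sketch (not the authors' RT-Trajectory implementation): a rough 2D end-effector path is rasterized into an extra image channel, concatenated with the camera observation, and fed to a small convolutional policy. The rasterization scheme, network shapes, 128x128 resolution, and 7-DoF action head are all illustrative assumptions.
```python
# Minimal sketch of trajectory-sketch conditioning (illustrative only; not the
# authors' RT-Trajectory architecture). A coarse 2D path is drawn into a single
# image channel and stacked with the RGB observation as policy input.
import numpy as np
import torch
import torch.nn as nn

def rasterize_sketch(waypoints_px, height=128, width=128):
    """Draw a coarse polyline through image-plane waypoints into one channel."""
    canvas = np.zeros((height, width), dtype=np.float32)
    pts = np.asarray(waypoints_px, dtype=np.float32)
    for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
        n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        xs = np.linspace(x0, x1, n).round().astype(int).clip(0, width - 1)
        ys = np.linspace(y0, y1, n).round().astype(int).clip(0, height - 1)
        canvas[ys, xs] = 1.0
    return canvas

class SketchConditionedPolicy(nn.Module):
    """Tiny conv policy over RGB + sketch channels; the 7-DoF action head is a
    placeholder for an arm command and is an assumption, not from the paper."""
    def __init__(self, action_dim=7):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, action_dim)

    def forward(self, rgb, sketch):
        # rgb: (B, 3, H, W), sketch: (B, 1, H, W)
        x = torch.cat([rgb, sketch], dim=1)
        return self.head(self.encoder(x))

if __name__ == "__main__":
    sketch = rasterize_sketch([(20, 100), (60, 60), (110, 30)])
    rgb = torch.rand(1, 3, 128, 128)
    sketch_t = torch.from_numpy(sketch)[None, None]  # (1, 1, 128, 128)
    action = SketchConditionedPolicy()(rgb, sketch_t)
    print(action.shape)  # torch.Size([1, 7])
```
The design choice this illustrates is the one the abstract emphasizes: the sketch lives in the same image space as the observation, so the policy can relate the coarse path to what it currently sees rather than following it blindly.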
Related papers
- RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation [52.14638923430338]
We propose conditioning policies on affordances, which capture the pose of the robot at key stages of the task.
Our method, RT-Affordance, is a hierarchical model that first proposes an affordance plan given the task language.
We show on a diverse set of novel tasks how RT-Affordance exceeds the performance of existing methods by over 50%.
arXiv Detail & Related papers (2024-11-05T01:02:51Z) - Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z) - Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation [49.43094200366251]
We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition.
Our method, Policy Adaptation via Language Optimization (PALO), combines a handful of demonstrations of a task with proposed language decompositions.
We find that PALO is able to consistently complete long-horizon, multi-tier tasks in the real world, outperforming state-of-the-art pre-trained generalist policies.
arXiv Detail & Related papers (2024-08-29T03:03:35Z) - RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation [36.43143326197769]
Track-Any-Point (TAP) models isolate the relevant motion in a demonstration, and parameterize a low-level controller to reproduce this motion across changes in the scene configuration.
We show this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching, stacking, and even full path-following tasks such as applying glue and sticking objects together.
arXiv Detail & Related papers (2023-08-30T11:57:04Z) - Planning Immediate Landmarks of Targets for Model-Free Skill Transfer
across Agents [34.56191646231944]
We propose PILoT, i.e., Planning Immediate Landmarks of Targets.
PILoT learns a goal-conditioned state planner and distills a goal-planner to plan immediate landmarks in a model-free style.
We show the power of PILoT on various transferring challenges, including few-shot transferring across action spaces and dynamics.
arXiv Detail & Related papers (2022-12-18T08:03:21Z) - Abstract-to-Executable Trajectory Translation for One-Shot Task
Generalization [21.709054087028946]
We propose to achieve one-shot task generalization by decoupling plan generation and plan execution.
Our method solves complex long-horizon tasks in three steps: build a paired abstract environment, generate abstract trajectories, and solve the original task by an abstract-to-executable trajectory translator.
arXiv Detail & Related papers (2022-10-14T09:17:34Z) - Generalization with Lossy Affordances: Leveraging Broad Offline Data for
Learning Visuomotor Tasks [65.23947618404046]
We introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data.
When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems.
We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering.
arXiv Detail & Related papers (2022-10-12T21:46:38Z) - Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z) - Transferable Task Execution from Pixels through Deep Planning Domain
Learning [46.88867228115775]
We propose Deep Planning Domain Learning (DPDL) to learn a hierarchical model.
DPDL learns a high-level model which predicts values for a set of logical predicates that together describe the current symbolic world state.
This allows us to perform complex, multi-step tasks even when the robot has not been explicitly trained on them.
arXiv Detail & Related papers (2020-03-08T05:51:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.