RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation
- URL: http://arxiv.org/abs/2411.02704v1
- Date: Tue, 05 Nov 2024 01:02:51 GMT
- Title: RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation
- Authors: Soroush Nasiriany, Sean Kirmani, Tianli Ding, Laura Smith, Yuke Zhu, Danny Driess, Dorsa Sadigh, Ted Xiao
- Abstract summary: We propose conditioning policies on affordances, which capture the pose of the robot at key stages of the task.
Our method, RT-Affordance, is a hierarchical model that first proposes an affordance plan given the task language.
We show on a diverse set of novel tasks how RT-Affordance exceeds the performance of existing methods by over 50%.
- Score: 52.14638923430338
- Abstract: We explore how intermediate policy representations can facilitate generalization by providing guidance on how to perform manipulation tasks. Existing representations such as language, goal images, and trajectory sketches have been shown to be helpful, but these representations either do not provide enough context or provide over-specified context that yields less robust policies. We propose conditioning policies on affordances, which capture the pose of the robot at key stages of the task. Affordances offer expressive yet lightweight abstractions, are easy for users to specify, and facilitate efficient learning by transferring knowledge from large internet datasets. Our method, RT-Affordance, is a hierarchical model that first proposes an affordance plan given the task language, and then conditions the policy on this affordance plan to perform manipulation. Our model can flexibly bridge heterogeneous sources of supervision including large web datasets and robot trajectories. We additionally train our model on cheap-to-collect in-domain affordance images, allowing us to learn new tasks without collecting any additional costly robot trajectories. We show on a diverse set of novel tasks how RT-Affordance exceeds the performance of existing methods by over 50%, and we empirically demonstrate that affordances are robust to novel settings. Videos available at https://snasiriany.me/rt-affordance
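The abstract describes a two-stage, hierarchical pipeline: a high-level model proposes an affordance plan (key end-effector poses) from the task language, and a low-level policy is conditioned on that plan during execution. The paper does not expose an API, so the sketch below is only a minimal illustration of that control flow; every class, function, and field name here is a hypothetical placeholder, not the released RT-Affordance interface.

```python
# Hypothetical sketch of the hierarchical inference loop described in the abstract:
# task language -> affordance plan -> affordance-conditioned low-level policy.
# All names and shapes are illustrative assumptions, not RT-Affordance's actual API.
from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class Affordance:
    """Robot end-effector pose at a key stage of the task (illustrative)."""
    position: np.ndarray   # (3,) xyz in the robot base frame
    rotation: np.ndarray   # (4,) quaternion
    stage: str             # e.g. "grasp", "place"


class AffordancePlanner:
    """High-level model: task language + image -> sequence of affordances."""

    def propose_plan(self, instruction: str, image: np.ndarray) -> List[Affordance]:
        raise NotImplementedError  # would wrap a model trained on web + robot data


class AffordanceConditionedPolicy:
    """Low-level policy conditioned on the current observation and the plan."""

    def act(self, image: np.ndarray, plan: List[Affordance]) -> np.ndarray:
        raise NotImplementedError  # returns a low-level robot action


def run_episode(planner, policy, env, instruction: str, max_steps: int = 200):
    obs = env.reset()
    plan = planner.propose_plan(instruction, obs["image"])  # stage 1: plan once from language
    for _ in range(max_steps):                              # stage 2: closed-loop control
        action = policy.act(obs["image"], plan)
        obs, done = env.step(action)
        if done:
            break
```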
Related papers
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- Affordance-Centric Policy Learning: Sample Efficient and Generalisable Robot Policy Learning using Affordance-Centric Task Frames [15.800100875117312]
Affordances are central to robotic manipulation, where most tasks can be simplified to interactions with task-specific regions on objects.
We propose an affordance-centric policy-learning approach that centres and appropriately orients a task frame on these affordance regions.
We demonstrate that our approach can learn manipulation tasks using behaviour cloning from as few as 10 demonstrations, with generalisation equivalent to an image-based policy trained on 305 demonstrations.
arXiv Detail & Related papers (2024-10-15T23:57:35Z)
- Affordance-based Robot Manipulation with Flow Matching [6.863932324631107]
Our framework seamlessly unifies affordance model learning and trajectory generation with flow matching for robot manipulation.
Our evaluation highlights that the proposed prompt-tuning method for learning manipulation affordances with a language prompter achieves competitive performance; an illustrative flow-matching sketch follows this list.
arXiv Detail & Related papers (2024-09-02T09:11:28Z)
- RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches [74.300116260004]
Generalization remains one of the most important desiderata for robust robot learning systems.
We propose a policy conditioning method using rough trajectory sketches.
We show that RT-Trajectory is able to perform a wider range of tasks compared to language-conditioned and goal-conditioned policies.
arXiv Detail & Related papers (2023-11-03T15:31:51Z)
- RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking [54.776890150458385]
We develop an efficient system for training universal agents capable of multi-task manipulation skills.
We are able to train a single agent capable of 12 unique skills, and demonstrate its generalization over 38 tasks.
On average, RoboAgent outperforms prior methods by over 40% in unseen situations.
arXiv Detail & Related papers (2023-09-05T03:14:39Z)
- Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks [65.23947618404046]
We introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data.
When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems.
We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering.
arXiv Detail & Related papers (2022-10-12T21:46:38Z)
- Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations [25.33452947179541]
We show the effectiveness of object-aware representation learning techniques for robotic tasks.
Our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object-agnostic techniques.
arXiv Detail & Related papers (2022-05-12T19:48:11Z)
- Learning Sensorimotor Primitives of Sequential Manipulation Tasks from Visual Demonstrations [13.864448233719598]
This paper describes a new neural network-based framework for simultaneously learning low-level and high-level policies.
A key feature of the proposed approach is that the policies are learned directly from raw videos of task demonstrations.
Empirical results on object manipulation tasks with a robotic arm show that the proposed network can efficiently learn from real visual demonstrations to perform the tasks.
arXiv Detail & Related papers (2022-03-08T01:36:48Z)
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z)
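The "Affordance-based Robot Manipulation with Flow Matching" entry above pairs an affordance model with flow matching for trajectory generation. As a rough illustration only, the sketch below shows the standard flow-matching generation step (Euler integration of a learned velocity field from noise to an action trajectory) conditioned on an affordance feature; the `velocity_net` interface and all shapes are assumptions, not that paper's released code.

```python
# Minimal, hypothetical sketch of flow-matching trajectory generation
# conditioned on an affordance feature. The network interface is assumed.
import numpy as np


def generate_trajectory(velocity_net, affordance_feat, horizon=16, action_dim=7, steps=20):
    """Integrate a learned velocity field from Gaussian noise to a trajectory.

    velocity_net(x, t, cond) -> dx/dt, with x of shape (horizon, action_dim);
    this is plain Euler integration of the flow-matching ODE from t=0 to t=1.
    """
    x = np.random.randn(horizon, action_dim)     # x_0 ~ N(0, I)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_net(x, t, affordance_feat)  # predicted velocity toward the data distribution
        x = x + dt * v                           # Euler step along the flow
    return x                                     # x_1: generated action trajectory
```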