Learning from 10 Demos: Generalisable and Sample-Efficient Policy Learning with Oriented Affordance Frames
- URL: http://arxiv.org/abs/2410.12124v2
- Date: Sun, 31 Aug 2025 00:47:29 GMT
- Title: Learning from 10 Demos: Generalisable and Sample-Efficient Policy Learning with Oriented Affordance Frames
- Authors: Krishan Rana, Jad Abou-Chakra, Sourav Garg, Robert Lee, Ian Reid, Niko Suenderhauf
- Abstract summary: Existing methods require a substantial number of demonstrations to cover possible task variations. We introduce oriented affordance frames, a structured representation for state and action spaces. We show how this abstraction allows for compositional generalisation of independently trained sub-policies. We validate our method across three real-world tasks, each requiring multi-step, multi-object interactions.
- Score: 10.738838923944876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning has unlocked the potential for robots to exhibit highly dexterous behaviours. However, it still struggles with long-horizon, multi-object tasks due to poor sample efficiency and limited generalisation. Existing methods require a substantial number of demonstrations to cover possible task variations, making them costly and often impractical for real-world deployment. We address this challenge by introducing oriented affordance frames, a structured representation for state and action spaces that improves spatial and intra-category generalisation and enables policies to be learned efficiently from only 10 demonstrations. More importantly, we show how this abstraction allows for compositional generalisation of independently trained sub-policies to solve long-horizon, multi-object tasks. To seamlessly transition between sub-policies, we introduce the notion of self-progress prediction, which we directly derive from the duration of the training demonstrations. We validate our method across three real-world tasks, each requiring multi-step, multi-object interactions. Despite the small dataset, our policies generalise robustly to unseen object appearances, geometries, and spatial arrangements, achieving high success rates without reliance on exhaustive training data. A video demonstration can be found on our project page: https://affordance-policy.github.io/.
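To make the abstract's two core mechanisms concrete, here is a minimal sketch: expressing end-effector state and actions in an object-centric oriented affordance frame, and deriving a self-progress signal from demonstration duration. All names and conventions below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (illustrative only; not the authors' code).
import numpy as np

def to_affordance_frame(T_world_ee: np.ndarray, T_world_aff: np.ndarray) -> np.ndarray:
    """Express the end-effector pose relative to an object's oriented
    affordance frame (4x4 homogeneous matrices), so the policy sees
    object-centric state and actions that transfer across placements."""
    return np.linalg.inv(T_world_aff) @ T_world_ee

def self_progress(t: int, demo_length: int) -> float:
    """Self-progress label derived from demonstration duration: a scalar
    in [0, 1] the policy can regress to decide when one sub-policy is
    finished and the next should take over."""
    return min(t / demo_length, 1.0)
```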
Related papers
- Tether: Autonomous Functional Play with Correspondence-Driven Trajectory Warping [44.348686148716894]
We introduce Tether, a method for autonomous functional play involving structured, task-directed interactions. First, we design a novel open-loop policy that warps actions from a small set of source demonstrations. Second, we deploy this policy for autonomous functional play in the real world via a continuous cycle of task selection, execution, evaluation, and improvement.
arXiv Detail & Related papers (2026-03-03T18:59:07Z)
- Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter [26.44450403993957]
We study the task of language-conditioned pick and place in clutter, where a robot should grasp a target object in open clutter and move it to a specified place.
Some approaches learn end-to-end policies with features from vision foundation models, requiring large datasets.
We propose an action prior alignment method that aligns unconditioned action priors with 3D vision-language priors by learning one attention layer.
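A minimal sketch of what a single alignment attention layer could look like, assuming action candidates come from an unconditioned prior and are scored against 3D vision-language features (hypothetical names and shapes; not the paper's code):

```python
import torch
import torch.nn as nn

class ActionPriorAlignment(nn.Module):
    def __init__(self, action_dim: int = 7, feat_dim: int = 256):
        super().__init__()
        self.embed = nn.Linear(action_dim, feat_dim)   # candidates -> queries
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.score = nn.Linear(feat_dim, 1)            # per-candidate score

    def forward(self, candidates: torch.Tensor, vl_feats: torch.Tensor) -> torch.Tensor:
        # candidates: (B, N, action_dim) unconditioned grasp/place proposals
        # vl_feats:   (B, M, feat_dim) 3D vision-language tokens
        q = self.embed(candidates)
        attended, _ = self.attn(q, vl_feats, vl_feats)
        return self.score(attended).squeeze(-1)        # (B, N) language-aligned scores
```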
arXiv Detail & Related papers (2025-03-12T14:20:33Z)
- Learning Generalizable 3D Manipulation With 10 Demonstrations [16.502781729164973]
We present a novel framework that learns manipulation skills from as few as 10 demonstrations.
We validate our framework through extensive experiments on both simulation benchmarks and real-world robotic systems.
This work shows significant potential for advancing efficient, generalizable manipulation skill learning in real-world applications.
arXiv Detail & Related papers (2024-11-15T14:01:02Z)
- RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation [52.14638923430338]
We propose conditioning policies on affordances, which capture the pose of the robot at key stages of the task.
Our method, RT-Affordance, is a hierarchical model that first proposes an affordance plan given the task language.
We show, on a diverse set of novel tasks, that RT-Affordance exceeds the performance of existing methods by over 50%.
arXiv Detail & Related papers (2024-11-05T01:02:51Z)
- Keypoint Abstraction using Large Models for Object-Relative Imitation Learning [78.92043196054071]
Generalization to novel object configurations and instances across diverse tasks and environments is a critical challenge in robotics.
Keypoint-based representations have proven effective as a succinct way of capturing essential object features.
We propose KALM, a framework that leverages large pre-trained vision-language models to automatically generate task-relevant and cross-instance consistent keypoints.
arXiv Detail & Related papers (2024-10-30T17:37:31Z)
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
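A sketch of the frozen patch-wise feature extractor pattern: only a small head is trained on top of pre-trained VLM patch features (backbone choice and head architecture are assumptions, not the Flex implementation):

```python
import torch
import torch.nn as nn

class FrozenVLMPolicy(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, action_dim: int = 4):
        super().__init__()
        self.backbone = backbone.eval()            # pre-trained VLM, kept frozen
        for p in self.backbone.parameters():
            p.requires_grad_(False)
        self.head = nn.Sequential(                 # only this small head is trained
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, action_dim))

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            patches = self.backbone(images)        # assumed (B, num_patches, feat_dim)
        return self.head(patches.mean(dim=1))      # pool patches, predict action
```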
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies [25.760946763103483]
We propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick and place tasks.
Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states which are then translated to actions using rigid action estimation.
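The rigid action estimation step can be read as a point-cloud registration problem; a standard Kabsch/Procrustes solution is sketched below (a common method, not necessarily the paper's exact solver):

```python
import numpy as np

def rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares R, t with dst ~= (R @ src.T).T + t, where src is the
    observed object point cloud and dst the imagined (goal) point cloud."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t                                    # the rigid "action" to execute
```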
arXiv Detail & Related papers (2024-06-17T17:00:41Z)
- Mapping High-level Semantic Regions in Indoor Environments without Object Recognition [50.624970503498226]
The present work proposes a method for semantic region mapping via embodied navigation in indoor environments.
To enable region identification, the method uses a vision-to-language model to provide scene information for mapping.
By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location.
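One way to realise "a distribution over possible region labels at each location" is a grid of per-cell label log-probabilities fused Bayesian-style; the sketch below is an assumption about the data structure, not the paper's implementation:

```python
import numpy as np

LABELS = ["kitchen", "bedroom", "bathroom", "living room"]  # example label set

class SemanticRegionMap:
    def __init__(self, h: int, w: int):
        # per-cell log-probabilities over labels, initialised uniform
        self.logits = np.zeros((h, w, len(LABELS)))

    def update(self, cell: tuple[int, int], label_probs: np.ndarray) -> None:
        """Fuse an egocentric prediction, already projected into the global
        frame, with the running per-cell distribution (Bayesian product)."""
        self.logits[cell] += np.log(label_probs + 1e-9)

    def label_map(self) -> np.ndarray:
        return self.logits.argmax(axis=-1)         # most likely region per cell
```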
arXiv Detail & Related papers (2024-03-11T18:09:50Z)
- Make a Donut: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation with Tools [14.069149456110676]
We introduce a demonstration-free hierarchical planning approach capable of tackling intricate long-horizon tasks. We employ large language models (LLMs) to articulate a high-level, stage-by-stage plan corresponding to a specified task. We further substantiate our approach with experimental trials on real-world robotic platforms.
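The EMD-space objective can be approximated with an optimal point assignment between the current and subgoal clouds; a simple version using SciPy is sketched below (an exact-assignment approximation, not necessarily the paper's solver):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_cost(current: np.ndarray, goal: np.ndarray) -> float:
    """current, goal: (N, 3) point clouds of the deformable object; the
    planner picks actions that drive this cost toward zero."""
    dists = np.linalg.norm(current[:, None] - goal[None, :], axis=-1)  # (N, N)
    rows, cols = linear_sum_assignment(dists)      # optimal point matching
    return float(dists[rows, cols].mean())
```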
arXiv Detail & Related papers (2023-11-05T22:43:29Z)
- RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches [74.300116260004]
Generalization remains one of the most important desiderata for robust robot learning systems.
We propose a policy conditioning method using rough trajectory sketches.
We show that RT-Trajectory is able to perform a wider range of tasks compared to language-conditioned and goal-conditioned policies.
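A rough trajectory sketch can be conditioned on by rasterising it into an extra image channel stacked onto the camera observation; the sketch below illustrates this idea and is not RT-Trajectory's actual input format:

```python
import numpy as np

def rasterise_sketch(points: np.ndarray, h: int, w: int) -> np.ndarray:
    """points: (T, 2) pixel coordinates of the rough 2D trajectory."""
    channel = np.zeros((h, w), dtype=np.float32)
    for x, y in points.astype(int):
        if 0 <= y < h and 0 <= x < w:
            channel[y, x] = 1.0
    return channel

def conditioned_observation(rgb: np.ndarray, sketch_channel: np.ndarray) -> np.ndarray:
    """Stack the sketch as a fourth channel: (H, W, 3) + (H, W) -> (H, W, 4)."""
    return np.concatenate([rgb, sketch_channel[..., None]], axis=-1)
```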
arXiv Detail & Related papers (2023-11-03T15:31:51Z)
- Learning Foresightful Dense Visual Affordance for Deformable Object Manipulation [5.181990978232578]
We study deformable object manipulation using dense visual affordance, with generalization towards diverse states.
We propose a framework for learning this representation, with novel designs such as multi-stage stable learning and efficient self-supervised data collection without experts.
arXiv Detail & Related papers (2023-03-20T12:14:13Z)
- Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z)
- Self-Supervised Learning of Multi-Object Keypoints for Robotic Manipulation [8.939008609565368]
In this paper, we demonstrate the efficacy of learning image keypoints via the Dense Correspondence pretext task for downstream policy learning.
We evaluate our approach on diverse robot manipulation tasks, compare it to other visual representation learning approaches, and demonstrate its flexibility and effectiveness for sample-efficient policy learning.
arXiv Detail & Related papers (2022-05-17T13:15:07Z)
- Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations [25.33452947179541]
We show the effectiveness of object-aware representation learning techniques for robotic tasks.
Our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object-agnostic techniques.
arXiv Detail & Related papers (2022-05-12T19:48:11Z)
- Learning Sensorimotor Primitives of Sequential Manipulation Tasks from Visual Demonstrations [13.864448233719598]
This paper describes a new neural network-based framework for simultaneously learning low-level and high-level policies.
A key feature of the proposed approach is that the policies are learned directly from raw videos of task demonstrations.
Empirical results on object manipulation tasks with a robotic arm show that the proposed network can efficiently learn from real visual demonstrations to perform the tasks.
arXiv Detail & Related papers (2022-03-08T01:36:48Z)
- Generalization in Dexterous Manipulation via Geometry-Aware Multi-Task Learning [108.08083976908195]
We show that policies learned by existing reinforcement learning algorithms can in fact be generalist.
We show that a single generalist policy can perform in-hand manipulation of over 100 geometrically-diverse real-world objects.
Interestingly, we find that multi-task learning with object point cloud representations not only generalizes better but even outperforms single-object specialist policies.
arXiv Detail & Related papers (2021-11-04T17:59:56Z)
- Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks.
A single shared representation can be inadequate when tasks are heterogeneous; in this work we overcome this issue by inferring a conditioning function that maps each task's side information into a representation tailored to the task at hand.
We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z)
- Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action value function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
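For reference, the standard normalized advantage function (NAF) decomposition referred to above, in which the quadratic advantage makes the greedy action available in closed form (shown for illustration):

```python
import numpy as np

def naf_q_value(a: np.ndarray, mu: np.ndarray, P: np.ndarray, v: float) -> float:
    """Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)); with P positive
    definite, argmax_a Q(s, a) = mu(s), so the greedy action is simply mu."""
    d = a - mu
    return v - 0.5 * float(d @ P @ d)
```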
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
- Learning Long-term Visual Dynamics with Region Proposal Interaction Networks [75.06423516419862]
We build object representations that can capture inter-object and object-environment interactions over a long range.
Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin.
arXiv Detail & Related papers (2020-08-05T17:48:00Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
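The mechanism is straightforward to sketch: an inverse dynamics model f(s_t, s_{t+1}) -> a_t is trained on the agent's own transitions, then used to label the state-only demonstrations with pseudo-actions (illustrative architecture, not the paper's exact model):

```python
import torch
import torch.nn as nn

class InverseDynamics(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim))

    def forward(self, s_t: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        # Predict the action that moved the system from s_t to s_next.
        return self.net(torch.cat([s_t, s_next], dim=-1))

# Labelling a state-only demo: predicted actions stand in for the missing
# expert actions, e.g. actions = model(demo_states[:-1], demo_states[1:]).
```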
arXiv Detail & Related papers (2020-04-07T17:57:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.