S$^2$-Diffusion: Generalizing from Instance-level to Category-level Skills in Robot Manipulation
- URL: http://arxiv.org/abs/2502.09389v2
- Date: Mon, 17 Feb 2025 08:38:28 GMT
- Title: S$^2$-Diffusion: Generalizing from Instance-level to Category-level Skills in Robot Manipulation
- Authors: Quantao Yang, Michael C. Welle, Danica Kragic, Olov Andersson
- Abstract summary: We present an open-vocabulary Spatial-Semantic Diffusion policy (S$^2$-Diffusion) which enables generalization from instance-level training data to category-level.
We show that functional aspects of skills can be captured via a promptable semantic module combined with a spatial representation.
Our results show that S$^2$-Diffusion is invariant to changes in category-irrelevant factors and achieves satisfactory performance on other instances within the same category, even when it was not trained on that specific instance.
- Score: 14.36036106689291
- License:
- Abstract: Recent advances in skill learning have propelled robot manipulation to new heights by enabling robots to learn complex manipulation tasks from a practical number of demonstrations. However, these skills are often limited to the particular action, object, and environment \textit{instances} that are shown in the training data, and have trouble transferring to other instances of the same category. In this work, we present an open-vocabulary Spatial-Semantic Diffusion policy (S$^2$-Diffusion) which enables generalization from instance-level training data to category-level, enabling skills to be transferable between instances of the same category. We show that functional aspects of skills can be captured via a promptable semantic module combined with a spatial representation. We further propose leveraging depth estimation networks to allow the use of only a single RGB camera. Our approach is evaluated and compared on a diverse set of robot manipulation tasks, both in simulation and in the real world. Our results show that S$^2$-Diffusion is invariant to changes in category-irrelevant factors and achieves satisfactory performance on other instances within the same category, even when it was not trained on that specific instance. Full videos of all real-world experiments are available in the supplementary material.
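The pipeline sketched in the abstract (a promptable semantic module fused with a spatial representation from estimated depth, conditioning a diffusion policy) can be illustrated with a minimal sketch. Everything below is a hypothetical illustration, not the authors' implementation: the function names, shapes, the toy noise schedule, and the stub segmentation/depth models are all assumptions.

```python
# Hypothetical sketch of a spatial-semantic observation feeding a diffusion policy.
# None of these stubs correspond to the paper's actual networks.
import numpy as np

def promptable_semantic_module(rgb: np.ndarray, prompt: str) -> np.ndarray:
    """Stand-in for an open-vocabulary segmentation model: returns a per-pixel
    relevance map (H, W) for the text prompt. Here it is a dummy uniform map."""
    h, w, _ = rgb.shape
    return np.full((h, w), 0.5, dtype=np.float32)

def monocular_depth(rgb: np.ndarray) -> np.ndarray:
    """Stand-in for a depth estimation network predicting per-pixel depth (H, W)
    from a single RGB frame."""
    h, w, _ = rgb.shape
    return np.ones((h, w), dtype=np.float32)

def spatial_semantic_observation(rgb: np.ndarray, prompt: str) -> np.ndarray:
    """Fuse the promptable semantic map with estimated depth into one (H, W, 2)
    observation that carries task-relevant geometry rather than instance texture."""
    semantic = promptable_semantic_module(rgb, prompt)
    depth = monocular_depth(rgb)
    return np.stack([semantic, depth], axis=-1)

def diffusion_policy_step(obs, noisy_actions, t, denoiser):
    """One toy reverse-diffusion step: the denoiser predicts a noise residual
    conditioned on the observation, and part of it is removed from the actions."""
    eps_hat = denoiser(obs, noisy_actions, t)
    alpha = 1.0 - 0.02 * t  # toy noise schedule, purely illustrative
    return (noisy_actions - (1.0 - alpha) * eps_hat) / np.sqrt(alpha)

if __name__ == "__main__":
    rgb = np.zeros((96, 96, 3), dtype=np.float32)      # dummy single-camera frame
    obs = spatial_semantic_observation(rgb, "wipe the whiteboard")
    actions = np.random.randn(16, 7)                    # 16-step, 7-DoF action chunk
    dummy_denoiser = lambda o, a, t: np.zeros_like(a)   # placeholder network
    for t in reversed(range(10)):
        actions = diffusion_policy_step(obs, actions, t, dummy_denoiser)
    print(obs.shape, actions.shape)                     # (96, 96, 2) (16, 7)
```

In a real system the stubs would be replaced by an open-vocabulary segmentation model, a monocular depth network, and a trained denoising network; the sketch only shows how the semantic and spatial channels could be fused into one observation that conditions the action-denoising loop.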
Related papers
- A Pattern Language for Machine Learning Tasks [0.0]
We view objective functions as constraints on the behaviour of learners.
We develop a formal graphical language that allows us to separate the core tasks of a behaviour from its implementation details.
As a proof of concept, we design a novel task that enables converting classifiers into generative models that we call "manipulators".
arXiv Detail & Related papers (2024-07-02T16:50:27Z)
- Learning Generalizable Feature Fields for Mobile Manipulation [25.155275186849558]
We present GeFF, a scene-level generalizable neural feature field that acts as a unified representation for both navigation and manipulation that performs in real-time.
We quantitatively evaluate GeFF's ability for open-vocabulary object-/part-level manipulation and show that GeFF outperforms point-based baselines in runtime and storage-accuracy trade-offs.
arXiv Detail & Related papers (2024-03-12T11:51:55Z)
- Learning Reusable Manipulation Strategies [86.07442931141634]
Humans demonstrate an impressive ability to acquire and generalize manipulation "tricks".
We present a framework that enables machines to acquire such manipulation skills through a single demonstration and self-play.
These learned mechanisms and samplers can be seamlessly integrated into standard task and motion planners.
arXiv Detail & Related papers (2023-11-06T17:35:42Z)
- Where2Explore: Few-shot Affordance Learning for Unseen Novel Categories of Articulated Objects [15.989258402792755]
'Where2Explore' is a framework that effectively explores novel categories with minimal interactions on a limited number of instances.
Our framework explicitly estimates the geometric similarity across different categories, identifying local areas that differ from shapes in the training categories for efficient exploration.
arXiv Detail & Related papers (2023-09-14T07:11:58Z)
- Universal Instance Perception as Object Discovery and Retrieval [90.96031157557806]
UNI reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm.
It can flexibly perceive different types of objects by simply changing the input prompts.
UNI shows superior performance on 20 challenging benchmarks from 10 instance-level tasks.
arXiv Detail & Related papers (2023-03-12T14:28:24Z)
- Visual Transformer for Task-aware Active Learning [49.903358393660724]
We present a novel pipeline for pool-based Active Learning.
Our method exploits accessible unlabelled examples during training to estimate their correlation with the labelled examples.
Visual Transformer models non-local visual concept dependency between labelled and unlabelled examples.
arXiv Detail & Related papers (2021-06-07T17:13:59Z)
- Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks, which can be limiting when the underlying task distribution is heterogeneous.
In this work we overcome this issue by inferring a conditioning function, mapping the tasks' side information into a representation tailored to the task at hand.
We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z)
- Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning-from-demonstration techniques is that human demonstrations follow a distribution with multiple modes for a single task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)
- Transformers for One-Shot Visual Imitation [28.69615089950047]
Humans are able to seamlessly visually imitate others, by inferring their intentions and using past experience to achieve the same end goal.
Prior research in robot imitation learning has created agents which can acquire diverse skills from expert human operators.
However, mismatches between the demonstrator and robot domains, such as differences in embodiment or environment, make such imitation difficult.
This paper investigates techniques which allow robots to partially bridge these domain gaps, using their past experience.
arXiv Detail & Related papers (2020-11-11T18:41:07Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate our method on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.