FOCUS: Object-Centric World Models for Robotics Manipulation
- URL: http://arxiv.org/abs/2307.02427v2
- Date: Fri, 7 Jul 2023 13:36:35 GMT
- Title: FOCUS: Object-Centric World Models for Robotics Manipulation
- Authors: Stefano Ferraro, Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt
- Abstract summary: FOCUS is a model-based agent that learns an object-centric world model.
We show that object-centric world models allow the agent to solve tasks more efficiently.
We also showcase how FOCUS could be adopted in real-world settings.
- Score: 4.6956495676681484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the world in terms of objects and the possible interplays with
them is an important cognition ability, especially in robotics manipulation,
where many tasks require robot-object interactions. However, learning such a
structured world model, which specifically captures entities and relationships,
remains a challenging and underexplored problem. To address this, we propose
FOCUS, a model-based agent that learns an object-centric world model. Thanks to
a novel exploration bonus that stems from the object-centric representation,
FOCUS can be deployed on robotics manipulation tasks to explore object
interactions more easily. Evaluating our approach on manipulation tasks across
different settings, we show that object-centric world models allow the agent to
solve tasks more efficiently and enable consistent exploration of robot-object
interactions. Using a Franka Emika robot arm, we also showcase how FOCUS could
be adopted in real-world settings.
Related papers
- Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models [53.22792173053473]
We introduce an interactive robotic manipulation framework called Polaris.
Polaris integrates perception and interaction by utilizing GPT-4 alongside grounded vision models.
We propose a novel Synthetic-to-Real (Syn2Real) pose estimation pipeline.
arXiv Detail & Related papers (2024-08-15T06:40:38Z) - ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and Robots [24.035706461949715]
There is a pressing need to develop a model that enables general-purpose robots to undertake a broad spectrum of manipulation tasks.
Our work introduces a comprehensive framework to develop a foundation model for general robotic manipulation.
Our model achieves average success rates of around 90%.
arXiv Detail & Related papers (2024-05-11T09:18:37Z) - Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality
in Human-Robot Interaction [3.1473798197405953]
This dissertation aims to teach a robot unknown objects in the context of Human-Robot Interaction (HRI)
The combination of eye tracking and Augmented Reality created a powerful synergy that empowered the human teacher to communicate with the robot.
The robot's object detection capabilities exhibited comparable performance to state-of-the-art object detectors trained on extensive datasets.
arXiv Detail & Related papers (2023-12-12T11:34:43Z) - Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z) - RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z) - DexTransfer: Real World Multi-fingered Dexterous Grasping with Minimal
Human Demonstrations [51.87067543670535]
We propose a robot-learning system that can take a small number of human demonstrations and learn to grasp unseen object poses.
We train a dexterous grasping policy that takes the point clouds of the object as input and predicts continuous actions to grasp objects from different initial robot states.
The policy learned from our dataset can generalize well on unseen object poses in both simulation and the real world.
arXiv Detail & Related papers (2022-09-28T17:51:49Z) - Curious Exploration via Structured World Models Yields Zero-Shot Object
Manipulation [19.840186443344]
We propose to use structured world models to incorporate inductive biases in the control loop to achieve sample-efficient exploration.
Our method generates free-play behavior that starts to interact with objects early on and develops more complex behavior over time.
arXiv Detail & Related papers (2022-06-22T22:08:50Z) - V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated
Objects [51.79035249464852]
We present a framework for learning multi-arm manipulation of articulated objects.
Our framework includes a variational generative model that learns contact point distribution over object rigid parts for each robot arm.
arXiv Detail & Related papers (2021-11-07T02:31:09Z) - Maintaining a Reliable World Model using Action-aware Perceptual
Anchoring [4.971403153199917]
There is a need for robots to maintain a model of its surroundings even when objects go out of view and are no longer visible.
This requires anchoring perceptual information onto symbols that represent the objects in the environment.
We present a model for action-aware perceptual anchoring that enables robots to track objects in a persistent manner.
arXiv Detail & Related papers (2021-07-07T06:35:14Z) - Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human
Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
arXiv Detail & Related papers (2021-03-31T05:25:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.