RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for
Robotic Manipulation
- URL: http://arxiv.org/abs/2402.15487v1
- Date: Fri, 23 Feb 2024 18:27:17 GMT
- Title: RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for
Robotic Manipulation
- Authors: Hanxiao Jiang, Binghao Huang, Ruihai Wu, Zhuoran Li, Shubham Garg,
Hooshang Nayyeri, Shenlong Wang, Yunzhu Li
- Abstract summary: We introduce the novel task of interactive scene exploration, wherein robots autonomously explore environments and produce an action-conditioned scene graph (ACSG).
The ACSG accounts for both low-level information, such as geometry and semantics, and high-level information, such as the action-conditioned relationships between different entities in the scene.
We apply our system across various real-world settings in a zero-shot manner, demonstrating its effectiveness in exploring and modeling environments it has never seen before.
- Score: 22.30830950219317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots need to explore their surroundings to adapt to and tackle tasks in
unknown environments. Prior work has proposed building scene graphs of the
environment but typically assumes that the environment is static, omitting
regions that require active interactions. This severely limits their ability to
handle more complex tasks in household and office environments: before setting
up a table, robots must explore drawers and cabinets to locate all utensils and
condiments. In this work, we introduce the novel task of interactive scene
exploration, wherein robots autonomously explore environments and produce an
action-conditioned scene graph (ACSG) that captures the structure of the
underlying environment. The ACSG accounts for both low-level information, such
as geometry and semantics, and high-level information, such as the
action-conditioned relationships between different entities in the scene. To
this end, we present the Robotic Exploration (RoboEXP) system, which
incorporates the Large Multimodal Model (LMM) and an explicit memory design to
enhance our system's capabilities. The robot reasons about what and how to
explore an object, accumulating new information through the interaction process
and incrementally constructing the ACSG. We apply our system across various
real-world settings in a zero-shot manner, demonstrating its effectiveness in
exploring and modeling environments it has never seen before. Leveraging the
constructed ACSG, we illustrate the effectiveness and efficiency of our RoboEXP
system in facilitating a wide range of real-world manipulation tasks involving
rigid, articulated objects, nested objects like Matryoshka dolls, and
deformable objects like cloth.
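The ACSG described above pairs low-level node attributes (geometry, semantics) with high-level action-conditioned edges, built incrementally during exploration. As a rough illustration only, here is a minimal Python sketch of such a structure; all class names, fields, and the actions_to_reveal helper are our own assumptions, not taken from the paper or its code.

```python
from dataclasses import dataclass

@dataclass
class SceneNode:
    """An entity in the scene with low-level attributes (hypothetical schema)."""
    name: str
    semantics: str   # e.g. an object category predicted by a vision model
    geometry: tuple  # placeholder for a bounding box / point cloud / mesh

@dataclass
class ActionEdge:
    """A high-level, action-conditioned relation: `child` becomes
    observable or reachable only after `action` is applied to `parent`."""
    parent: str
    child: str
    action: str      # e.g. "open", "lift", "unfold"

class ActionConditionedSceneGraph:
    """Hypothetical container, grown incrementally as the robot explores."""
    def __init__(self):
        self.nodes: dict[str, SceneNode] = {}
        self.edges: list[ActionEdge] = []

    def add_node(self, node: SceneNode) -> None:
        self.nodes[node.name] = node

    def add_action_edge(self, parent: str, child: str, action: str) -> None:
        self.edges.append(ActionEdge(parent, child, action))

    def actions_to_reveal(self, target: str) -> list[str]:
        """Walk parent links back to the root, returning the action
        sequence that exposes `target` (root-first order)."""
        chain, current = [], target
        while True:
            edge = next((e for e in self.edges if e.child == current), None)
            if edge is None:
                break
            chain.append(f"{edge.action} {edge.parent}")
            current = edge.parent
        return list(reversed(chain))

# Toy usage: a cabinet that must be opened to reveal a mug inside.
if __name__ == "__main__":
    g = ActionConditionedSceneGraph()
    g.add_node(SceneNode("cabinet", "furniture", ((0, 0, 0), (1, 1, 1))))
    g.add_node(SceneNode("mug", "container", ((0.2, 0.2, 0.2), (0.3, 0.3, 0.3))))
    g.add_action_edge("cabinet", "mug", "open")
    print(g.actions_to_reveal("mug"))  # ['open cabinet']
```

In this toy example, the edge (cabinet, mug, open) mirrors the table-setting scenario in the abstract: the mug only becomes observable after the cabinet is opened, so a downstream planner could query the graph for the action sequence that exposes a target object.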
Related papers
- GRUtopia: Dream General Robots in a City at Scale [65.08318324604116]
This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots.
GRScenes includes 100k interactive, finely annotated scenes, which can be freely combined into city-scale environments.
GRResidents is a Large Language Model (LLM) driven Non-Player Character (NPC) system that is responsible for social interaction.
arXiv Detail & Related papers (2024-07-15T17:40:46Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Enhancing Graph Representation of the Environment through Local and Cloud Computation [2.9465623430708905]
We propose a graph-based representation that provides a semantic representation of robot environments from multiple sources.
To acquire information from the environment, the framework combines classical computer vision tools with modern computer vision cloud services.
The proposed approach also allows us to handle small objects and integrate them into the semantic representation of the environment.
arXiv Detail & Related papers (2023-09-22T08:05:32Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with existing visual grounding and robotic grasping systems.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation [10.21450780640562]
We introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects.
These new challenges require combining manipulation and navigation skills in unexplored environments.
We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills.
arXiv Detail & Related papers (2023-07-12T12:25:33Z)
- FOCUS: Object-Centric World Models for Robotics Manipulation [4.6956495676681484]
FOCUS is a model-based agent that learns an object-centric world model.
We show that object-centric world models allow the agent to solve tasks more efficiently.
We also showcase how FOCUS could be adopted in real-world settings.
arXiv Detail & Related papers (2023-07-05T16:49:06Z)
- ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK).
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z)
- Robot Active Neural Sensing and Planning in Unknown Cluttered Environments [0.0]
Active sensing and planning in unknown, cluttered environments is an open challenge for robots intending to provide home service, search and rescue, narrow-passage inspection, and medical assistance.
We present an active neural sensing approach that generates kinematically feasible viewpoint sequences for a robot manipulator with an in-hand camera, gathering the minimum number of observations needed to reconstruct the underlying environment.
Our framework actively collects visual RGBD observations, aggregates them into a scene representation, and performs object shape inference to avoid unnecessary robot interactions with the environment.
arXiv Detail & Related papers (2022-08-23T16:56:54Z)
- SAPIEN: A SimulAted Part-based Interactive ENvironment [77.4739790629284]
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects.
We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.