Kinematic-aware Prompting for Generalizable Articulated Object
Manipulation with LLMs
- URL: http://arxiv.org/abs/2311.02847v4
- Date: Wed, 21 Feb 2024 02:27:57 GMT
- Title: Kinematic-aware Prompting for Generalizable Articulated Object
Manipulation with LLMs
- Authors: Wenke Xia, Dong Wang, Xincheng Pang, Zhigang Wang, Bin Zhao, Di Hu,
Xuelong Li
- Abstract summary: Generalizable articulated object manipulation is essential for home-assistant robots.
We propose a kinematic-aware prompting framework that prompts Large Language Models with kinematic knowledge of objects to generate low-level motion waypoints.
Our framework outperforms traditional methods on 8 seen categories and shows a powerful zero-shot capability on 8 unseen articulated object categories.
- Score: 53.66070434419739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalizable articulated object manipulation is essential for home-assistant
robots. Recent efforts focus on imitation learning from demonstrations or
reinforcement learning in simulation; however, due to the prohibitive costs of
real-world data collection and precise object simulation, it remains
challenging for these works to achieve broad adaptability across diverse
articulated objects. Recently, many works have tried to utilize the strong
in-context learning ability of Large Language Models (LLMs) to achieve
generalizable robotic manipulation, but most of this research focuses on
high-level task planning, sidelining low-level robotic control. In this work,
building on the idea that the kinematic structure of the object determines how
we can manipulate it, we propose a kinematic-aware prompting framework that
prompts LLMs with kinematic knowledge of objects to generate low-level motion
trajectory waypoints, supporting manipulation of various objects. To effectively
prompt LLMs with the kinematic structure of different objects, we design a
unified kinematic knowledge parser, which represents various articulated
objects as a unified textual description containing kinematic joints and
contact location. Building upon this unified description, a kinematic-aware
planner model is proposed to generate precise 3D manipulation waypoints via a
designed kinematic-aware chain-of-thought prompting method. Our evaluation
spanned 48 instances across 16 distinct categories, revealing that our
framework not only outperforms traditional methods on 8 seen categories but
also shows a powerful zero-shot capability for 8 unseen articulated object
categories. Moreover, the real-world experiments on 7 different object
categories prove our framework's adaptability in practical scenarios. Code is
released at
https://github.com/GeWu-Lab/LLM_articulated_object_manipulation/tree/main.
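To make the pipeline described in the abstract concrete, below is a minimal, hypothetical Python sketch of the two pieces it names: a unified textual description of an object's kinematic joints and contact location, and a chain-of-thought style prompt asking an LLM for low-level 3D waypoints. All class names, fields, and prompt wording here are illustrative assumptions, not the authors' implementation; the released code at the GitHub link above is the authoritative reference.

```python
# Hypothetical sketch (not the authors' code): serialize an articulated object's
# kinematic structure into a textual description and build a chain-of-thought
# prompt asking an LLM for 3D manipulation waypoints.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KinematicJoint:
    name: str                        # e.g. "door_hinge"
    joint_type: str                  # "revolute" or "prismatic"
    axis: Tuple[float, float, float]
    origin: Tuple[float, float, float]


@dataclass
class ObjectDescription:
    category: str
    joints: List[KinematicJoint]
    contact_point: Tuple[float, float, float]  # where the gripper touches the part


def to_textual_description(obj: ObjectDescription) -> str:
    """Unified textual description of kinematic joints and contact location."""
    lines = [f"Object category: {obj.category}"]
    for j in obj.joints:
        lines.append(
            f"Joint '{j.name}': type={j.joint_type}, axis={j.axis}, origin={j.origin}"
        )
    lines.append(f"Contact location: {obj.contact_point}")
    return "\n".join(lines)


def build_prompt(obj: ObjectDescription, task: str) -> str:
    """Chain-of-thought style prompt requesting low-level 3D waypoints."""
    return (
        "You are a robot motion planner.\n"
        f"{to_textual_description(obj)}\n"
        f"Task: {task}\n"
        "First reason step by step about how the kinematic joint constrains the "
        "motion of the contact point, then output a list of 3D waypoints "
        "(x, y, z) for the gripper to follow."
    )


if __name__ == "__main__":
    cabinet = ObjectDescription(
        category="cabinet",
        joints=[KinematicJoint("door_hinge", "revolute", (0.0, 0.0, 1.0), (0.4, 0.2, 0.0))],
        contact_point=(0.6, 0.2, 0.3),
    )
    print(build_prompt(cabinet, "open the cabinet door by 60 degrees"))
    # The prompt would then be sent to an LLM; parsing the returned waypoints
    # and executing them on a robot is outside the scope of this sketch.
```

In this sketch the textual description plays the role of the paper's unified kinematic knowledge parser output, so that the same prompt template can be reused across object categories.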
Related papers
- RPMArt: Towards Robust Perception and Manipulation for Articulated Objects [56.73978941406907]
We propose a framework towards Robust Perception and Manipulation for Articulated Objects (RPMArt).
RPMArt learns to estimate the articulation parameters and manipulate the articulated part from noisy point clouds.
We introduce an articulation-aware classification scheme to enhance its ability for sim-to-real transfer.
arXiv Detail & Related papers (2024-03-24T05:55:39Z)
- GAMMA: Generalizable Articulation Modeling and Manipulation for Articulated Objects [53.965581080954905]
We propose a novel framework of Generalizable Articulation Modeling and Manipulating for Articulated Objects (GAMMA)
GAMMA learns both articulation modeling and grasp pose affordance from diverse articulated objects with different categories.
Results show that GAMMA significantly outperforms SOTA articulation modeling and manipulation algorithms in unseen and cross-category articulated objects.
arXiv Detail & Related papers (2023-09-28T08:57:14Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects [14.034256001448574]
We propose a vision-based system that learns to predict the potential motions of the parts of a variety of articulated objects.
We deploy an analytical motion planner based on this vector field to achieve a policy that yields maximum articulation.
Results show that our system achieves state-of-the-art performance in both simulated and real-world experiments.
arXiv Detail & Related papers (2022-05-09T15:35:33Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- You Only Demonstrate Once: Category-Level Manipulation from Single Visual Demonstration [9.245605426105922]
This work proposes a novel, category-level manipulation framework.
It uses an object-centric, category-level representation and model-free 6 DoF motion tracking.
Experiments demonstrate its efficacy in a range of challenging industrial tasks in high-precision assembly.
arXiv Detail & Related papers (2022-01-30T03:59:14Z)
- V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects [51.79035249464852]
We present a framework for learning multi-arm manipulation of articulated objects.
Our framework includes a variational generative model that learns contact point distribution over object rigid parts for each robot arm.
arXiv Detail & Related papers (2021-11-07T02:31:09Z)
- VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects [19.296344218177534]
The space of 3D articulated objects is exceptionally rich in their myriad semantic categories, diverse shape geometry, and complicated part functionality.
Previous works mostly abstract kinematic structure with estimated joint parameters and part poses as the visual representations for manipulating 3D articulated objects.
We propose object-centric actionable visual priors as a novel perception-interaction handshaking point, in which the perception system outputs more actionable guidance than kinematic structure estimation.
arXiv Detail & Related papers (2021-06-28T07:47:31Z)
- Multi-Modal Learning of Keypoint Predictive Models for Visual Object Manipulation [6.853826783413853]
Humans have impressive generalization capabilities when it comes to manipulating objects in novel environments.
How to learn such body schemas for robots remains an open problem.
We develop a self-supervised approach that can extend a robot's kinematic model from visual latent representations when grasping an object.
arXiv Detail & Related papers (2020-11-08T01:04:59Z)
- "What's This?" -- Learning to Segment Unknown Objects from Manipulation Sequences [27.915309216800125]
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator.
We propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge.
Our method neither depends on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data.
arXiv Detail & Related papers (2020-11-06T10:55:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.