Kinematic-aware Prompting for Generalizable Articulated Object
Manipulation with LLMs
- URL: http://arxiv.org/abs/2311.02847v4
- Date: Wed, 21 Feb 2024 02:27:57 GMT
- Title: Kinematic-aware Prompting for Generalizable Articulated Object
Manipulation with LLMs
- Authors: Wenke Xia, Dong Wang, Xincheng Pang, Zhigang Wang, Bin Zhao, Di Hu,
Xuelong Li
- Abstract summary: Generalizable articulated object manipulation is essential for home-assistant robots.
We propose a kinematic-aware prompting framework that prompts Large Language Models with kinematic knowledge of objects to generate low-level motion waypoints.
Our framework outperforms traditional methods on 8 seen categories and shows a powerful zero-shot capability on 8 unseen articulated object categories.
- Score: 53.66070434419739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalizable articulated object manipulation is essential for home-assistant
robots. Recent efforts focus on imitation learning from demonstrations or
reinforcement learning in simulation; however, due to the prohibitive costs of
real-world data collection and precise object simulation, it remains
challenging for these works to achieve broad adaptability across diverse
articulated objects. Recently, many works have tried to utilize the strong
in-context learning ability of Large Language Models (LLMs) to achieve
generalizable robotic manipulation, but most of this research focuses on
high-level task planning, sidelining low-level robotic control. In this work,
building on the idea that the kinematic structure of the object determines how
we can manipulate it, we propose a kinematic-aware prompting framework that
prompts LLMs with kinematic knowledge of objects to generate low-level motion
trajectory waypoints, supporting manipulation of various objects. To effectively
prompt LLMs with the kinematic structure of different objects, we design a
unified kinematic knowledge parser, which represents various articulated
objects as a unified textual description containing kinematic joints and
contact location. Building upon this unified description, a kinematic-aware
planner model is proposed to generate precise 3D manipulation waypoints via a
designed kinematic-aware chain-of-thought prompting method. Our evaluation
spanned 48 instances across 16 distinct categories, revealing that our
framework not only outperforms traditional methods on 8 seen categories but
also shows a powerful zero-shot capability for 8 unseen articulated object
categories. Moreover, the real-world experiments on 7 different object
categories prove our framework's adaptability in practical scenarios. Code is
released at
https://github.com/GeWu-Lab/LLM_articulated_object_manipulation/tree/main.
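To make the pipeline described in the abstract concrete, below is a minimal, hypothetical Python sketch of the two pieces it names: a unified textual description of an object's kinematic joints and contact location, and a chain-of-thought style prompt asking an LLM for low-level 3D waypoints. All class names, fields, and prompt wording here are illustrative assumptions, not the authors' implementation; the released code at the GitHub link above is the authoritative reference.

```python
# Hypothetical sketch (not the authors' code): serialize an articulated object's
# kinematic structure into a textual description and build a chain-of-thought
# prompt asking an LLM for 3D manipulation waypoints.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KinematicJoint:
    name: str                        # e.g. "door_hinge"
    joint_type: str                  # "revolute" or "prismatic"
    axis: Tuple[float, float, float]
    origin: Tuple[float, float, float]


@dataclass
class ObjectDescription:
    category: str
    joints: List[KinematicJoint]
    contact_point: Tuple[float, float, float]  # where the gripper touches the part


def to_textual_description(obj: ObjectDescription) -> str:
    """Unified textual description of kinematic joints and contact location."""
    lines = [f"Object category: {obj.category}"]
    for j in obj.joints:
        lines.append(
            f"Joint '{j.name}': type={j.joint_type}, axis={j.axis}, origin={j.origin}"
        )
    lines.append(f"Contact location: {obj.contact_point}")
    return "\n".join(lines)


def build_prompt(obj: ObjectDescription, task: str) -> str:
    """Chain-of-thought style prompt requesting low-level 3D waypoints."""
    return (
        "You are a robot motion planner.\n"
        f"{to_textual_description(obj)}\n"
        f"Task: {task}\n"
        "First reason step by step about how the kinematic joint constrains the "
        "motion of the contact point, then output a list of 3D waypoints "
        "(x, y, z) for the gripper to follow."
    )


if __name__ == "__main__":
    cabinet = ObjectDescription(
        category="cabinet",
        joints=[KinematicJoint("door_hinge", "revolute", (0.0, 0.0, 1.0), (0.4, 0.2, 0.0))],
        contact_point=(0.6, 0.2, 0.3),
    )
    print(build_prompt(cabinet, "open the cabinet door by 60 degrees"))
    # The prompt would then be sent to an LLM; parsing the returned waypoints
    # and executing them on a robot is outside the scope of this sketch.
```

In this sketch the textual description plays the role of the paper's unified kinematic knowledge parser output, so that the same prompt template can be reused across object categories.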
Related papers
- RPMArt: Towards Robust Perception and Manipulation for Articulated Objects [56.73978941406907]
We propose a framework towards Robust Perception and Manipulation for Articulated Objects (RPMArt).
RPMArt learns to estimate the articulation parameters and manipulate the articulated part from noisy point clouds.
We introduce an articulation-aware classification scheme to enhance its ability for sim-to-real transfer.
arXiv Detail & Related papers (2024-03-24T05:55:39Z)
- GAMMA: Generalizable Articulation Modeling and Manipulation for Articulated Objects [53.965581080954905]
We propose a novel framework of Generalizable Articulation Modeling and Manipulating for Articulated Objects (GAMMA)
GAMMA learns both articulation modeling and grasp pose affordance from diverse articulated objects with different categories.
Results show that GAMMA significantly outperforms SOTA articulation modeling and manipulation algorithms in unseen and cross-category articulated objects.
arXiv Detail & Related papers (2023-09-28T08:57:14Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects [14.034256001448574]
We propose a vision-based system that learns to predict the potential motions of the parts of a variety of articulated objects.
We deploy an analytical motion planner based on this vector field to achieve a policy that yields maximum articulation.
Results show that our system achieves state-of-the-art performance in both simulated and real-world experiments.
arXiv Detail & Related papers (2022-05-09T15:35:33Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- You Only Demonstrate Once: Category-Level Manipulation from Single Visual Demonstration [9.245605426105922]
This work proposes a novel, category-level manipulation framework.
It uses an object-centric, category-level representation and model-free 6 DoF motion tracking.
Experiments demonstrate its efficacy in a range of challenging industrial tasks in high-precision assembly.
arXiv Detail & Related papers (2022-01-30T03:59:14Z)
- V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects [51.79035249464852]
We present a framework for learning multi-arm manipulation of articulated objects.
Our framework includes a variational generative model that learns contact point distribution over object rigid parts for each robot arm.
arXiv Detail & Related papers (2021-11-07T02:31:09Z)
- VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects [19.296344218177534]
The space of 3D articulated objects is exceptionally rich in their myriad semantic categories, diverse shape geometry, and complicated part functionality.
Previous works mostly abstract kinematic structure with estimated joint parameters and part poses as the visual representations for manipulating 3D articulated objects.
We propose object-centric actionable visual priors as a novel perception-interaction handshaking point, in which the perception system outputs more actionable guidance than kinematic structure estimation.
arXiv Detail & Related papers (2021-06-28T07:47:31Z)
- Multi-Modal Learning of Keypoint Predictive Models for Visual Object Manipulation [6.853826783413853]
Humans have impressive generalization capabilities when it comes to manipulating objects in novel environments.
How to learn such body schemas for robots remains an open problem.
We develop a self-supervised approach that can extend a robot's kinematic model from visual latent representations when grasping an object.
arXiv Detail & Related papers (2020-11-08T01:04:59Z)
- "What's This?" -- Learning to Segment Unknown Objects from Manipulation Sequences [27.915309216800125]
We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator.
We propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge.
Our method neither depends on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data.
arXiv Detail & Related papers (2020-11-06T10:55:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.