Related papers: ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition

ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition

URL: http://arxiv.org/abs/2403.18062v1
Date: Tue, 26 Mar 2024 19:26:53 GMT
Title: ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition
Authors: Samuel Li, Sarthak Bhagat, Joseph Campbell, Yaqi Xie, Woojun Kim, Katia Sycara, Simon Stepputtis,
Abstract summary: Task-oriented grasping of unfamiliar objects is a necessary skill for robots in dynamic in-home environments. We present a novel zero-shot task-oriented grasping method leveraging a geometric decomposition of the target object into simple convex shapes. Our approach employs minimal essential information - the object's name and the intended task - to facilitate zero-shot task-oriented grasping.
Score: 8.654140442734354
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Task-oriented grasping of unfamiliar objects is a necessary skill for robots in dynamic in-home environments. Inspired by the human capability to grasp such objects through intuition about their shape and structure, we present a novel zero-shot task-oriented grasping method leveraging a geometric decomposition of the target object into simple, convex shapes that we represent in a graph structure, including geometric attributes and spatial relationships. Our approach employs minimal essential information - the object's name and the intended task - to facilitate zero-shot task-oriented grasping. We utilize the commonsense reasoning capabilities of large language models to dynamically assign semantic meaning to each decomposed part and subsequently reason over the utility of each part for the intended task. Through extensive experiments on a real-world robotics platform, we demonstrate that our grasping approach's decomposition and reasoning pipeline is capable of selecting the correct part in 92% of the cases and successfully grasping the object in 82% of the tasks we evaluate. Additional videos, experiments, code, and data are available on our project website: https://shapegrasp.github.io/.

Related papers

Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects [59.51185639557874]
We introduce Kinematify, an automated framework that synthesizes articulated objects directly from arbitrary RGB images or textual descriptions.<n>Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static geometry.
arXiv Detail & Related papers (2025-11-03T07:21:42Z)
Weakly-Supervised Learning of Dense Functional Correspondences [23.794395724229762]
We propose a weakly-supervised learning paradigm to tackle the prediction task.<n>The main insight behind our approach is that we can leverage vision-language models to obtain functional parts.<n>We then integrate this with dense contrastive learning from pixel correspondences to distill both functional and spatial knowledge into a new model.
arXiv Detail & Related papers (2025-09-04T05:39:16Z)
SIGHT: Synthesizing Image-Text Conditioned and Geometry-Guided 3D Hand-Object Trajectories [124.24041272390954]
Modeling hand-object interaction priors holds significant potential to advance robotic and embodied AI systems.<n>We introduce SIGHT, a novel task focused on generating realistic and physically plausible 3D hand-object interaction trajectories from a single image.<n>We propose SIGHT-Fusion, a novel diffusion-based image-text conditioned generative model that tackles this task by retrieving the most similar 3D object mesh from a database.
arXiv Detail & Related papers (2025-03-28T20:53:20Z)
Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization. We introduce a benchmark comprising eight different synthetic and real-world datasets. We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
ICGNet: A Unified Approach for Instance-Centric Grasping [42.92991092305974]
We introduce an end-to-end architecture for object-centric grasping. We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets.
arXiv Detail & Related papers (2024-01-18T12:41:41Z)
Learning Extrinsic Dexterity with Parameterized Manipulation Primitives [8.7221770019454]
We learn a sequence of actions that utilize the environment to change the object's pose. Our approach can control the object's state through exploiting interactions between the object, the gripper, and the environment. We evaluate our approach on picking box-shaped objects of various weight, shape, and friction properties from a constrained table-top workspace.
arXiv Detail & Related papers (2023-10-26T21:28:23Z)
Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches. We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment. Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually. We propose to improve phrase grounding models' ability on localizing the active objects by: learning the role of objects undergoing change and extracting them accurately from the instructions. We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
Efficient Representations of Object Geometry for Reinforcement Learning of Interactive Grasping Policies [29.998917158604694]
We present a reinforcement learning framework that learns the interactive grasping of various geometrically distinct real-world objects. Videos of learned interactive policies are available at https://maltemosbach.org/io/geometry_aware_grasping_policies.
arXiv Detail & Related papers (2022-11-20T11:47:33Z)
DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools [96.38972082580294]
DiffSkill is a novel framework that uses a differentiable physics simulator for skill abstraction to solve deformable object manipulation tasks. In particular, we first obtain short-horizon skills using individual tools from a gradient-based simulator. We then learn a neural skill abstractor from the demonstration trajectories which takes RGBD images as input.
arXiv Detail & Related papers (2022-03-31T17:59:38Z)
Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels. Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions. We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation. The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z)
Predicting Stable Configurations for Semantic Placement of Novel Objects [37.18437299513799]
Our goal is to enable robots to repose previously unseen objects according to learned semantic relationships in novel environments. We build our models and training from the ground up to be tightly integrated with our proposed planning algorithm for semantic placement of unknown objects. Our approach enables motion planning for semantic rearrangement of unknown objects in scenes with varying geometry from only RGB-D sensing.
arXiv Detail & Related papers (2021-08-26T23:05:05Z)
O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning [24.9242853417825]
We propose a unified affordance learning framework to learn object-object interaction for various tasks. We are able to conduct large-scale object-object affordance learning without the need for human annotations or demonstrations. Experiments on large-scale synthetic data and real-world data prove the effectiveness of the proposed approach.
arXiv Detail & Related papers (2021-06-29T04:38:12Z)
Where2Act: From Pixels to Actions for Articulated 3D Objects [54.19638599501286]
We extract highly localized actionable information related to elementary actions such as pushing or pulling for articulated objects with movable parts. We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation. Our learned models even transfer to real-world data.
arXiv Detail & Related papers (2021-01-07T18:56:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.