Understanding Physical Properties of Unseen Deformable Objects by Leveraging Large Language Models and Robot Actions
- URL: http://arxiv.org/abs/2506.03760v1
- Date: Wed, 04 Jun 2025 09:25:12 GMT
- Title: Understanding Physical Properties of Unseen Deformable Objects by Leveraging Large Language Models and Robot Actions
- Authors: Changmin Park, Beomjoon Lee, Haechan Jung, Haejin Jung, Changjoo Nam
- Abstract summary: Handling unseen objects with special properties such as deformability is challenging for traditional task and motion planning approaches. Recent results in Large Language Model (LLM)-based task planning have shown the ability to reason about unseen objects. We propose an LLM-based method for probing the physical properties of unseen deformable objects for the purpose of task planning.
- Score: 4.606734972599561
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we consider the problem of understanding the physical properties of unseen objects through interactions between the objects and a robot. Handling unseen objects with special properties such as deformability is challenging for traditional task and motion planning approaches, as they often rely on the closed-world assumption. Recent results in Large Language Model (LLM)-based task planning have shown the ability to reason about unseen objects. However, most studies assume rigid objects, overlooking their physical properties. We propose an LLM-based method for probing the physical properties of unseen deformable objects for the purpose of task planning. For a given set of object properties (e.g., foldability, bendability), our method uses robot actions to determine the properties by interacting with the objects. Based on the properties examined through the LLM and robot actions, the LLM generates a task plan for a specific domain such as object packing. In the experiment, we show that the proposed method can identify the properties of deformable objects, which are then used for a bin-packing task in which these properties play a crucial role in success.
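To make the probe-then-plan idea concrete, here is a minimal Python sketch of the loop the abstract describes: the robot executes one probing action per candidate property, an LLM interprets the observation, and the collected properties condition a packing plan. All function names (`execute_action`, `query_llm`) and the action-to-property mapping are hypothetical stand-ins, not the authors' implementation.

```python
# A minimal sketch of the probe-then-plan loop described above. The function
# names (`execute_action`, `query_llm`) and the action-to-property mapping
# are hypothetical stand-ins, not the authors' implementation.

PROBING_ACTIONS = {
    "foldability": "fold",  # try folding the object and observe the result
    "bendability": "bend",  # try bending the object and observe the result
}

def execute_action(obj: str, action: str) -> str:
    """Stub: run the robot action and return a textual observation."""
    return f"after the robot tried to {action} it, {obj} kept the new shape"

def query_llm(prompt: str) -> str:
    """Stub: send the prompt to an LLM backend and return its answer."""
    return "yes"

def probe_properties(obj: str) -> dict[str, bool]:
    """Determine each property by acting on the object and asking the LLM."""
    properties = {}
    for prop, action in PROBING_ACTIONS.items():
        observation = execute_action(obj, action)
        answer = query_llm(
            f"Observation: {observation}. Does '{obj}' exhibit {prop}? "
            "Answer yes or no."
        )
        properties[prop] = answer.strip().lower().startswith("yes")
    return properties

def plan_packing(objects: list[str]) -> str:
    """Ask the LLM for a bin-packing plan conditioned on probed properties."""
    probed = {obj: probe_properties(obj) for obj in objects}
    return query_llm(f"Generate a bin-packing plan for these objects: {probed}")

print(plan_packing(["towel", "wire", "box"]))
```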
Related papers
- Keypoint Abstraction using Large Models for Object-Relative Imitation Learning [78.92043196054071]
Generalization to novel object configurations and instances across diverse tasks and environments is a critical challenge in robotics.
Keypoint-based representations have proven effective as a succinct way of capturing essential object features.
We propose KALM, a framework that leverages large pre-trained vision-language models to automatically generate task-relevant and cross-instance consistent keypoints.
arXiv Detail & Related papers (2024-10-30T17:37:31Z)
- Which objects help me to act effectively? Reasoning about physically-grounded affordances [0.6291443816903801]
A key aspect of this understanding lies in detecting an object's affordances.
Our approach leverages a dialogue of large language models (LLMs) and vision-language models (VLMs) to achieve open-world affordance detection.
By grounding our system in the physical world, we account for the robot's embodiment and the intrinsic properties of the objects it encounters.
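As a rough illustration of such an LLM/VLM dialogue, the sketch below stubs both models with placeholder functions; `vlm_describe` and `llm_reason` are hypothetical names, and a real system would route these calls to actual model backends.

```python
# A rough illustration of an LLM/VLM dialogue for open-world affordance
# detection; `vlm_describe` and `llm_reason` are hypothetical stubs, not
# the paper's actual interfaces.

def vlm_describe(image_path: str) -> str:
    """Stub: a VLM returns a textual description of the scene."""
    return "a ceramic mug with a handle on a wooden table"

def llm_reason(description: str, task: str) -> str:
    """Stub: an LLM maps the scene description to task-relevant affordances,
    taking the robot's embodiment into account."""
    return f"for '{task}': the mug affords grasping by its handle"

def detect_affordances(image_path: str, task: str) -> str:
    description = vlm_describe(image_path)  # ground the scene visually
    return llm_reason(description, task)    # reason about what helps the task

print(detect_affordances("scene.png", "pour water"))
```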
arXiv Detail & Related papers (2024-07-18T11:08:57Z)
- Interactive Learning of Physical Object Properties Through Robot Manipulation and Database of Object Measurements [20.301193437161867]
The framework involves exploratory action selection to maximize learning about objects on a table. A robot pipeline integrates with a logging module and an online database of objects, containing over 24,000 measurements of 63 objects with different grippers.
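A common way to realize exploratory action selection is to greedily pick the action whose target property is currently most uncertain; the entropy criterion in the sketch below is an illustrative assumption, not necessarily the paper's exact objective.

```python
# A minimal sketch of uncertainty-driven exploratory action selection:
# greedily pick the probing action whose target property is most uncertain.
# The entropy criterion is an illustrative assumption.
import math

def entropy(dist: dict[str, float]) -> float:
    """Shannon entropy of a discrete belief distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def select_action(beliefs: dict[str, dict[str, float]]) -> str:
    """`beliefs` maps each probing action to the current belief over the
    property outcome it would reveal (e.g. squeeze -> soft/stiff)."""
    return max(beliefs, key=lambda a: entropy(beliefs[a]))

beliefs = {
    "squeeze": {"soft": 0.5, "stiff": 0.5},   # maximally uncertain
    "push":    {"light": 0.9, "heavy": 0.1},  # nearly resolved
}
print(select_action(beliefs))  # -> "squeeze"
```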
arXiv Detail & Related papers (2024-04-10T20:59:59Z)
- ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics [55.85916671269219]
This paper introduces ManiPose, a pioneering benchmark designed to advance the study of pose-varying manipulation tasks.
The comprehensive dataset features geometrically consistent and manipulation-oriented 6D pose labels for 2936 real-world scanned rigid objects and 100 articulated objects.
Our benchmark demonstrates notable advancements in pose estimation, pose-aware manipulation, and real-robot skill transfer.
arXiv Detail & Related papers (2024-03-20T07:48:32Z)
- Exploring Failure Cases in Multimodal Reasoning About Physical Dynamics [5.497036643694402]
We construct a simple simulated environment and demonstrate examples where, in a zero-shot setting, both text-only and multimodal LLMs display atomic world knowledge about various objects but fail to compose this knowledge into correct solutions for an object manipulation and placement task.
We also use BLIP, a vision-language model trained with more sophisticated cross-modal attention, to identify cases relevant to object physical properties that the model fails to ground.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-09-05T20:21:03Z)
- Physically Grounded Vision-Language Models for Robotic Manipulation [59.143640049407104]
We propose PhysObjects, an object-centric dataset of 39.6K crowd-sourced and 417K automated physical concept annotations.
We show that fine-tuning a vision-language model on PhysObjects improves its understanding of physical object concepts.
We incorporate this physically grounded VLM in an interactive framework with a large language model-based robotic planner.
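One plausible shape for such an interactive framework is a planner that queries the physically grounded VLM about a property before choosing an action. The sketch below is hypothetical: `vlm_physical_query` and the fragility vocabulary are assumptions, not the paper's API.

```python
# A hypothetical sketch of the interactive framework: an LLM-based planner
# consults a physically grounded VLM before committing to an action.
# `vlm_physical_query` and the property vocabulary are assumptions.

def vlm_physical_query(image: str, obj: str, concept: str) -> str:
    """Stub: a fine-tuned VLM answers a physical-concept query."""
    return "fragile" if obj == "glass" else "sturdy"

def plan_with_grounding(image: str, objects: list[str]) -> list[str]:
    plan = []
    for obj in objects:
        fragility = vlm_physical_query(image, obj, "fragility")
        # Adapt the manipulation action to the grounded property.
        action = "place gently" if fragility == "fragile" else "place"
        plan.append(f"{action} {obj}")
    return plan

print(plan_with_grounding("scene.png", ["glass", "book"]))
```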
arXiv Detail & Related papers (2023-01-15T09:37:55Z)
- Planning for Learning Object Properties [117.27898922118946]
We formalize the problem of automatically training a neural network to recognize object properties as a symbolic planning problem.
We use planning techniques to produce a strategy for automating the training dataset creation and the learning process.
We provide an experimental evaluation in both a simulated and a real environment.
arXiv Detail & Related papers (2022-11-07T18:55:26Z)
- CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering [50.61988087577871]
We introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene.
CRIPP-VQA contains videos of objects in motion, annotated with questions that involve counterfactual reasoning.
Our experiments reveal a surprising and significant performance gap in terms of answering questions about implicit properties.
arXiv Detail & Related papers (2021-06-29T04:38:12Z)
- O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning [24.9242853417825]
We propose a unified affordance learning framework to learn object-object interaction for various tasks.
We are able to conduct large-scale object-object affordance learning without the need for human annotations or demonstrations.
Experiments on large-scale synthetic data and real-world data prove the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-12-03T09:36:55Z)
- Object-Driven Active Mapping for More Accurate Object Pose Estimation and Robotic Grasping [5.385583891213281]
The framework is built on an object SLAM system integrated with a simultaneous multi-object pose estimation process.
By combining the mapping module and the exploration strategy, an accurate object map that is compatible with robotic grasping can be generated.
arXiv Detail & Related papers (2020-12-03T09:36:55Z)