Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts
- URL: http://arxiv.org/abs/2503.23348v1
- Date: Sun, 30 Mar 2025 08:12:43 GMT
- Title: Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts
- Authors: Jianhua Sun, Jiude Wei, Yuxuan Li, Cewu Lu
- Abstract summary: We introduce analytic concepts, procedurally defined upon mathematical symbolism, that can be directly computed and simulated by machines. We derive knowledge of object structure and functionality with physics-informed representations, and then use this physically grounded knowledge to instruct robot control policies.
- Score: 48.16515416987306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We humans rely on a wide range of commonsense knowledge to interact with the vast number and variety of objects in the physical world. Such commonsense knowledge is likewise crucial for robots to develop generalized object manipulation skills. While recent advancements in Large Language Models (LLMs) have showcased impressive capabilities in acquiring commonsense knowledge and conducting commonsense reasoning, effectively grounding this semantic-level knowledge in the physical world to thoroughly guide robots in generalized articulated object manipulation remains an insufficiently addressed challenge. To this end, we introduce analytic concepts, procedurally defined upon mathematical symbolism, that can be directly computed and simulated by machines. By leveraging analytic concepts as a bridge between the semantic-level knowledge inferred by LLMs and the physical world where real robots operate, we derive knowledge of object structure and functionality with physics-informed representations, and then use this physically grounded knowledge to instruct robot control policies for generalized, interpretable, and accurate articulated object manipulation. Extensive experiments in both simulation and real-world environments demonstrate the superiority of our approach.
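To make the abstract's central idea concrete, here is a minimal sketch of what an "analytic concept" that can be "directly computed and simulated by machines" might look like for a revolute-joint part such as a cabinet door. The class name, fields, and trajectory interface are illustrative assumptions, not the paper's actual formulation; the rotation math is the standard Rodrigues formula.

```python
import math

# Hypothetical sketch: an "analytic concept" for a revolute joint, defined by
# explicit geometry (axis origin, axis direction, angle range) so that machine
# reasoning can compute with it directly instead of relying on semantic labels.
# Names and structure are illustrative, not taken from the paper.
class RevoluteJointConcept:
    def __init__(self, origin, axis, max_angle):
        self.origin = origin        # (x, y, z): a point on the rotation axis
        self.axis = axis            # unit vector along the rotation axis
        self.max_angle = max_angle  # radians the part can swing open

    def waypoint(self, point, angle):
        """Rotate `point` about the joint axis by `angle` (Rodrigues' formula)."""
        px = [p - o for p, o in zip(point, self.origin)]
        k = self.axis
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        dot = sum(ki * pi for ki, pi in zip(k, px))
        cross = [k[1] * px[2] - k[2] * px[1],
                 k[2] * px[0] - k[0] * px[2],
                 k[0] * px[1] - k[1] * px[0]]
        rot = [px[i] * cos_a + cross[i] * sin_a + k[i] * dot * (1 - cos_a)
               for i in range(3)]
        return tuple(r + o for r, o in zip(rot, self.origin))

    def trajectory(self, grasp_point, steps=10):
        """Waypoints a gripper could follow to swing the part fully open."""
        return [self.waypoint(grasp_point, self.max_angle * t / steps)
                for t in range(steps + 1)]

# Example: a door hinged on the z-axis at the origin, opening 90 degrees.
door = RevoluteJointConcept(origin=(0, 0, 0), axis=(0, 0, 1),
                            max_angle=math.pi / 2)
path = door.trajectory(grasp_point=(0.5, 0.0, 1.0), steps=4)
```

Because the concept is purely geometric, the resulting waypoints are interpretable and simulatable, which is the property the abstract attributes to analytic concepts.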
Related papers
- Digital Gene: Learning about the Physical World through Analytic Concepts [54.21005370169846]
AI systems still struggle when it comes to understanding and interacting with the physical world.
This research introduces the idea of analytic concepts.
It provides machine intelligence with a portal to perceive, reason about, and interact with the physical world.
arXiv Detail & Related papers (2025-04-05T13:22:11Z) - Learning Object Properties Using Robot Proprioception via Differentiable Robot-Object Interaction [52.12746368727368]
Differentiable simulation has become a powerful tool for system identification.
Our approach calibrates object properties by using information from the robot, without relying on data from the object itself.
We demonstrate the effectiveness of our method on a low-cost robotic platform.
arXiv Detail & Related papers (2024-10-04T20:48:38Z) - Discovering Conceptual Knowledge with Analytic Ontology Templates for Articulated Objects [42.9186628100765]
We aim to endow machine intelligence with an analogous capability through performing at the conceptual level.
The AOT-driven approach yields benefits from three key perspectives.
arXiv Detail & Related papers (2024-09-18T04:53:38Z) - Human-Object Interaction from Human-Level Instructions [17.10279738828331]
We propose the first complete system for synthesizing human-object interactions for object manipulation in contextual environments.
We leverage large language models (LLMs) to interpret the input instructions into detailed execution plans.
Unlike prior work, our system is capable of generating detailed finger-object interactions in seamless coordination with full-body movements.
arXiv Detail & Related papers (2024-06-25T17:46:28Z) - Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality in Human-Robot Interaction [3.1473798197405953]
This dissertation aims to teach a robot unknown objects in the context of Human-Robot Interaction (HRI).
The combination of eye tracking and Augmented Reality created a powerful synergy that empowered the human teacher to communicate with the robot.
The robot's object detection capabilities exhibited comparable performance to state-of-the-art object detectors trained on extensive datasets.
arXiv Detail & Related papers (2023-12-12T11:34:43Z) - Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs [53.66070434419739]
Generalizable articulated object manipulation is essential for home-assistant robots.
We propose a kinematic-aware prompting framework that prompts Large Language Models with kinematic knowledge of objects to generate low-level motion waypoints.
Our framework outperforms traditional methods on 8 seen categories and shows powerful zero-shot capability on 8 unseen articulated object categories.
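The kinematic-aware prompting idea summarized above can be sketched as serializing an object's joint structure into the prompt text so an LLM can reason about feasible motion. The template, field names, and output instruction below are illustrative assumptions, not the paper's actual prompt format.

```python
# Hypothetical sketch of kinematic-aware prompting: describe the object's
# kinematic structure in plain text before asking the LLM for waypoints.
# The template and field names are illustrative, not the paper's actual format.
def kinematic_prompt(obj_name, joints, task):
    lines = [f"Object: {obj_name}", "Kinematic structure:"]
    for j in joints:
        lines.append(f"- {j['child']} connects to {j['parent']} via a "
                     f"{j['type']} joint along axis {j['axis']}")
    lines.append(f"Task: {task}")
    lines.append("Output a sequence of 3D end-effector waypoints.")
    return "\n".join(lines)

prompt = kinematic_prompt(
    "microwave",
    [{"child": "door", "parent": "body", "type": "revolute", "axis": (0, 0, 1)}],
    "open the door 90 degrees",
)
```

Making the joint type and axis explicit in the prompt is what lets the model produce low-level motion waypoints rather than only high-level action names.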
arXiv Detail & Related papers (2023-11-06T03:26:41Z) - Penetrative AI: Making LLMs Comprehend the Physical World [3.0266193917041306]
Large Language Models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
This paper explores how LLMs can be extended to interact with and reason about the physical world through IoT sensors and actuators.
arXiv Detail & Related papers (2023-10-14T15:48:15Z) - Physically Grounded Vision-Language Models for Robotic Manipulation [59.143640049407104]
We propose PhysObjects, an object-centric dataset of 39.6K crowd-sourced and 417K automated physical concept annotations.
We show that fine-tuning a vision-language model on PhysObjects improves its understanding of physical object concepts.
We incorporate this physically grounded VLM in an interactive framework with a large language model-based robotic planner.
arXiv Detail & Related papers (2023-09-05T20:21:03Z) - Fit to Measure: Reasoning about Sizes for Robust Object Recognition [0.5352699766206808]
We present an approach to integrating knowledge about object sizes into an ML-based architecture.
Our experiments in a real-world robotic scenario show that this combined approach yields a significant performance increase over state-of-the-art Machine Learning methods.
arXiv Detail & Related papers (2020-10-27T13:54:37Z) - A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning [60.335974351919816]
Object perception is a fundamental sub-field of Computer Vision.
Recent works seek to integrate knowledge engineering in order to make the visual interpretation of objects more intelligent.
arXiv Detail & Related papers (2019-12-26T13:26:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.