IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models
- URL: http://arxiv.org/abs/2503.10110v1
- Date: Thu, 13 Mar 2025 07:09:00 GMT
- Title: IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models
- Authors: Yiyang Ling, Karan Owalekar, Oluwatobiloba Adesanya, Erdem Bıyık, Daniel Seita
- Abstract summary: We propose IMPACT, a novel motion planning framework that uses Vision-Language Models (VLMs) to infer environment semantics. We perform experiments using 20 simulation and 10 real-world scenes and assess using task success rate, object displacements, and feedback from human evaluators. Our results over 3620 simulation and 200 real-world trials suggest that IMPACT enables efficient contact-rich motion planning in cluttered settings.
- Score: 2.889915951061306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motion planning involves determining a sequence of robot configurations to reach a desired pose, subject to movement and safety constraints. Traditional motion planning finds collision-free paths, but this is overly restrictive in clutter, where it may not be possible for a robot to accomplish a task without contact. In addition, contacts range from relatively benign (e.g., brushing a soft pillow) to more dangerous (e.g., toppling a glass vase). Due to this diversity, it is difficult to characterize which contacts may be acceptable or unacceptable. In this paper, we propose IMPACT, a novel motion planning framework that uses Vision-Language Models (VLMs) to infer environment semantics, identifying which parts of the environment can best tolerate contact based on object properties and locations. Our approach uses the VLM's outputs to produce a dense 3D "cost map" that encodes contact tolerances and seamlessly integrates with standard motion planners. We perform experiments using 20 simulation and 10 real-world scenes and assess using task success rate, object displacements, and feedback from human evaluators. Our results over 3620 simulation and 200 real-world trials suggest that IMPACT enables efficient contact-rich motion planning in cluttered settings while outperforming alternative methods and ablations. Supplementary material is available at https://impact-planning.github.io/.
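To make the cost-map idea concrete, the sketch below shows one way VLM-assigned contact tolerances could be rasterized into a dense 3D cost map and queried as an edge cost by a standard sampling-based planner. This is a minimal illustration, not the authors' implementation; the object list, tolerance values, grid resolution, and function names are all assumptions.

```python
import numpy as np

# Hypothetical per-object contact tolerances, as a VLM might assign them
# (1.0 = contact is acceptable, 0.0 = contact must be avoided). Object names,
# axis-aligned bounding boxes (min/max corners, metres), and values are illustrative.
OBJECTS = [
    {"name": "soft pillow", "aabb": ((0.2, 0.0, 0.0), (0.5, 0.3, 0.2)), "tolerance": 0.9},
    {"name": "glass vase",  "aabb": ((0.6, 0.1, 0.0), (0.7, 0.2, 0.3)), "tolerance": 0.0},
]

RES = 0.02           # voxel edge length in metres (assumed)
DIMS = (50, 25, 20)  # a 1.0 x 0.5 x 0.4 m workspace at that resolution

def build_cost_map(objects, dims=DIMS, res=RES):
    """Rasterize contact tolerances into a dense 3D cost map: each voxel
    stores 1 - tolerance of the least-tolerant object covering it;
    free space costs 0."""
    cost = np.zeros(dims)
    for obj in objects:
        (x0, y0, z0), (x1, y1, z1) = obj["aabb"]
        lo = (np.array([x0, y0, z0]) / res).astype(int)
        hi = np.ceil(np.array([x1, y1, z1]) / res).astype(int)
        region = cost[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
        np.maximum(region, 1.0 - obj["tolerance"], out=region)
    return cost

def edge_cost(cost, p0, p1, res=RES, n=20):
    """Approximate contact cost of a straight segment by sampling voxels,
    the kind of query a standard motion planner could make per edge."""
    pts = np.linspace(p0, p1, n)
    idx = np.clip((pts / res).astype(int), 0, np.array(cost.shape) - 1)
    return float(cost[idx[:, 0], idx[:, 1], idx[:, 2]].sum() / n)

cost_map = build_cost_map(OBJECTS)
# A segment passing near the vase accrues high cost; one through the pillow stays cheap.
print(edge_cost(cost_map, np.array([0.1, 0.1, 0.1]), np.array([0.65, 0.15, 0.1])))
```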
Related papers
- MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior [51.672193627686]
MotionCom is a training-free, motion-aware, diffusion-based image composition method.
It enables seamless integration of target objects into new scenes with dynamically coherent results.
arXiv Detail & Related papers (2024-09-16T08:44:17Z)
- A Meta-Engine Framework for Interleaved Task and Motion Planning using Topological Refinements [51.54559117314768]
Task and Motion Planning (TAMP) is the problem of jointly finding a symbolic task plan and the continuous robot motions that realize it.
We propose a general and open-source framework for modeling and benchmarking TAMP problems.
We introduce an innovative meta-technique to solve TAMP problems involving moving agents and multiple task-state-dependent obstacles.
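As a rough illustration of what interleaving task and motion planning can look like, here is a generic alternation loop, not the paper's meta-engine: a symbolic planner proposes a plan, each action is checked for motion-level feasibility, and failures are fed back as constraints. All callables and action names are hypothetical stubs.

```python
def interleaved_tamp(task_planner, motion_check, goal, max_rounds=10):
    """Generic interleaving loop: propose a symbolic plan, validate each
    action geometrically, and feed failures back as new constraints."""
    constraints = set()
    for _ in range(max_rounds):
        plan = task_planner(goal, constraints)   # symbolic action sequence
        if plan is None:
            return None                          # goal unreachable under constraints
        failed = next((a for a in plan if not motion_check(a)), None)
        if failed is None:
            return plan                          # every action has a feasible motion
        constraints.add(failed)                  # refine: rule out the failed action
    return None

# Toy demo: actions are strings; "cross_narrow_gap" is geometrically infeasible.
PLANS = (["pick", "cross_narrow_gap", "place"], ["pick", "go_around", "place"])

def task_planner(goal, constraints):
    return next((p for p in PLANS if not set(p) & constraints), None)

def motion_check(action):
    return action != "cross_narrow_gap"

print(interleaved_tamp(task_planner, motion_check, goal="place"))
# -> ['pick', 'go_around', 'place']
```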
arXiv Detail & Related papers (2024-08-11T14:57:57Z)
- Controllable Human-Object Interaction Synthesis [77.56877961681462]
We propose Controllable Human-Object Interaction Synthesis (CHOIS) to generate synchronized object motion and human motion in 3D scenes.
Here, language descriptions inform style and intent, and waypoints, which can be effectively extracted from high-level planning, ground the motion in the scene.
Our module seamlessly integrates with a path planning module, enabling the generation of long-term interactions in 3D environments.
arXiv Detail & Related papers (2023-12-06T21:14:20Z)
- Language-Conditioned Path Planning [68.13248140217222]
Language-Conditioned Collision Functions (LACO) learns a collision function using only a single-view image, language prompt, and robot configuration.
LACO predicts collisions between the robot and the environment, enabling flexible, conditional path planning without the need for object annotations, point cloud data, or ground-truth object meshes.
In both simulation and the real world, we demonstrate that LACO can facilitate complex, nuanced path plans that allow interaction with objects that are safe to collide with, rather than prohibiting all collisions.
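The interface this implies is simple to sketch: a learned function mapping (image, language prompt, robot configuration) to a collision probability, wrapped as the state-validity test of a sampling-based planner so that the prompt decides which contacts count as collisions. The sketch below assumes that interface; the dummy model and threshold are placeholders, not LACO's actual network.

```python
import numpy as np

def learned_collision_fn(image, prompt, q):
    """Stand-in for a learned collision model in the style of LACO: maps a
    single-view image, a language prompt, and a robot configuration q to a
    collision probability. Dummy logic here; in practice, a trained network."""
    return float(np.clip(np.linalg.norm(q) - 1.0, 0.0, 1.0))  # placeholder

def make_validity_checker(image, prompt, threshold=0.5):
    """Wrap the learned model as a planner's state-validity test; changing
    the prompt changes which contacts the planner treats as collisions."""
    def is_valid(q):
        return learned_collision_fn(image, prompt, np.asarray(q)) < threshold
    return is_valid

is_valid = make_validity_checker(image=None, prompt="brushing the pillow is acceptable")
print(is_valid([0.3, 0.2, 0.1]))  # True: predicted collision probability below threshold
```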
arXiv Detail & Related papers (2023-08-31T17:56:13Z)
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models [38.503337052122234]
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation.
We aim to synthesize robot trajectories for a variety of manipulation tasks given an open-set of instructions and an open-set of objects.
We demonstrate how the proposed framework can benefit from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions.
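The composable value-map idea can be illustrated with a toy example, assuming the simplest possible form: an attraction map toward a target combined with a repulsion map around an object to avoid, followed greedily through the voxel grid. Grid size, Gaussian shapes, and positions are all made up for illustration.

```python
import numpy as np

DIMS = (40, 40, 40)  # voxel grid (assumed resolution)

def gaussian_map(center, sigma, dims=DIMS):
    """Dense 3D map peaking at `center`; a building block for value maps."""
    grid = np.stack(np.meshgrid(*[np.arange(d) for d in dims], indexing="ij"), -1)
    return np.exp(-np.sum((grid - np.array(center)) ** 2, axis=-1) / (2 * sigma**2))

# Compose: attraction toward a target voxel minus repulsion around an obstacle.
value = gaussian_map(center=(35, 20, 10), sigma=20.0) \
      - gaussian_map(center=(20, 24, 10), sigma=3.0)

def greedy_step(value, pos):
    """One step of trajectory synthesis: move to the best-valued neighbor."""
    best, best_v = pos, value[pos]
    for dx, dy, dz in ((1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)):
        n = (min(max(pos[0] + dx, 0), DIMS[0] - 1),
             min(max(pos[1] + dy, 0), DIMS[1] - 1),
             min(max(pos[2] + dz, 0), DIMS[2] - 1))
        if value[n] > best_v:
            best, best_v = n, value[n]
    return best

pos = (5, 20, 10)
for _ in range(60):
    pos = greedy_step(value, pos)
print(pos)  # converges near the target voxel while skirting the penalty region
```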
arXiv Detail & Related papers (2023-07-12T07:40:48Z)
- QuestEnvSim: Environment-Aware Simulated Motion Tracking from Sparse Sensors [69.75711933065378]
We show that headset and controller poses alone can be used to generate realistic full-body poses, even in highly constrained environments.
We discuss three features crucial to the method's performance: the environment representation, the contact reward, and scene randomization.
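As a rough illustration of what a contact reward term can look like (not the paper's exact formulation), one common form rewards the character for making contact where the tracked motion implies contact and penalizes penetration; all names and weights below are assumptions.

```python
import numpy as np

def contact_reward(penetration_depths, in_contact, contact_expected, w_pen=10.0):
    """Illustrative contact reward: penalize total penetration depth and
    reward agreement between observed contacts and expected contacts.
    `penetration_depths` in metres; the boolean arrays are per body part."""
    penetration_penalty = w_pen * float(np.sum(penetration_depths))
    contact_match = float(np.mean(in_contact == contact_expected))
    return contact_match - penetration_penalty

# Toy call: two feet, left foot should be planted and is; no penetration.
print(contact_reward(np.array([0.0, 0.0]),
                     in_contact=np.array([True, False]),
                     contact_expected=np.array([True, False])))  # 1.0
```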
arXiv Detail & Related papers (2023-06-09T04:40:38Z)
- Synthesizing Diverse Human Motions in 3D Indoor Scenes [16.948649870341782]
We present a novel method for populating 3D indoor scenes with virtual humans that can navigate in the environment and interact with objects in a realistic manner.
Existing approaches rely on training sequences that contain captured human motions and the 3D scenes they interact with.
We propose a reinforcement learning-based approach that enables virtual humans to navigate in 3D scenes and interact with objects realistically and autonomously.
arXiv Detail & Related papers (2023-05-21T09:22:24Z)
- Robot Active Neural Sensing and Planning in Unknown Cluttered Environments [0.0]
Active sensing and planning in unknown, cluttered environments is an open challenge for robots intending to provide home service, search and rescue, narrow-passage inspection, and medical assistance.
We present an active neural sensing approach that generates kinematically feasible viewpoint sequences for a robot manipulator with an in-hand camera, gathering the minimum number of observations needed to reconstruct the underlying environment.
Our framework actively collects visual RGB-D observations, aggregates them into a scene representation, and performs object shape inference to avoid unnecessary robot interactions with the environment.
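The overall control flow is a next-best-view loop, sketched schematically below; every callable and the `expected_gain` attribute are stand-ins for the paper's learned and geometric components, not its actual API.

```python
from types import SimpleNamespace

def active_sensing_loop(propose_views, capture_rgbd, integrate, coverage,
                        tau=0.95, max_views=10):
    """Schematic next-best-view loop: pick the feasible viewpoint expected
    to reveal the most unseen volume, integrate the observation, and stop
    once the reconstruction is sufficiently complete."""
    scene, views = None, []
    for _ in range(max_views):
        if coverage(scene) >= tau:
            break                      # scene reconstructed well enough
        view = max(propose_views(scene), key=lambda v: v.expected_gain)
        scene = integrate(scene, capture_rgbd(view))
        views.append(view)
    return scene, views

# Toy usage with stub components (illustrative only).
pool = [SimpleNamespace(name=f"v{i}", expected_gain=g)
        for i, g in enumerate([0.2, 0.7, 0.5])]
scene, chosen = active_sensing_loop(
    propose_views=lambda s: pool,
    capture_rgbd=lambda v: v.name,
    integrate=lambda s, obs: (s or []) + [obs],
    coverage=lambda s: 0.0 if s is None else 0.5 * len(s),
)
print([v.name for v in chosen])  # highest-gain view chosen until coverage >= tau
```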
arXiv Detail & Related papers (2022-08-23T16:56:54Z)
- Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform.
We produce a closed-loop controller to reactively push objects in a continuous action space.
We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
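The closed-loop structure described here is a sampling-based predictive-control step: sample candidate push actions, imagine their outcomes in simulation, and execute the first action of the best rollout before re-planning. The sketch below assumes that generic structure; `simulate` and `reward` are stand-ins for the paper's Real-to-Sim components.

```python
import numpy as np

rng = np.random.default_rng(0)

def rmpc_step(state, simulate, reward, n_samples=64, horizon=5):
    """One predictive-control step: sample continuous push actions, score
    imagined rollouts, and return the first action of the best one."""
    best_a, best_r = None, -np.inf
    for _ in range(n_samples):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, 2))  # planar push deltas
        s, total = state, 0.0
        for a in actions:
            s = simulate(s, a)
            total += reward(s)
        if total > best_r:
            best_a, best_r = actions[0], total
    return best_a  # execute, observe, and re-plan (closed loop)

# Toy usage: push a 2D point toward the origin with a linear dummy dynamics model.
best = rmpc_step(np.array([1.0, -0.5]),
                 simulate=lambda s, a: s + 0.1 * a,
                 reward=lambda s: -float(np.linalg.norm(s)))
print(best)
```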
arXiv Detail & Related papers (2021-11-15T18:50:04Z)