DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation
- URL: http://arxiv.org/abs/2510.05662v1
- Date: Tue, 07 Oct 2025 08:18:29 GMT
- Title: DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation
- Authors: Taeyeop Lee, Gyuree Kang, Bowen Wen, Youngho Kim, Seunghyeok Back, In So Kweon, David Hyunchul Shim, Kuk-Jin Yoon
- Abstract summary: DeLTa is a novel framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training.
- Score: 85.60798754284006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the prevalence of transparent object interactions in everyday human life, research on robotic manipulation of transparent objects remains limited to short-horizon tasks and basic grasping capabilities. Although some methods have partially addressed these issues, most have limited generalizability to novel objects and are insufficient for precise long-horizon robot manipulation. To address this limitation, we propose DeLTa (Demonstration and Language-Guided Novel Transparent Object Manipulation), a novel framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training. Additionally, we present a task planner that refines the VLM-generated plan to account for the constraints of a single-arm, eye-in-hand robot in long-horizon object manipulation tasks. Through comprehensive evaluation, we demonstrate that our method significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios requiring precise manipulation capabilities. Project page: https://sites.google.com/view/DeLTa25/
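The abstract describes a pipeline of depth estimation, 6D pose estimation, single-demonstration trajectory transfer, and VLM-based task planning. The sketch below is a minimal, hypothetical illustration of how such a pipeline could be wired together; all names (`Pose6D`, `estimate_depth`, `estimate_pose`, `transfer_trajectory`, `plan_subtasks`) are assumptions for illustration and are not the authors' released code or API.

```python
# Hypothetical sketch of the pipeline outlined in the abstract.
# Interfaces and names are illustrative, not the paper's implementation.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Pose6D:
    """Rigid transform: 3x3 rotation matrix and 3-vector translation."""
    rotation: np.ndarray
    translation: np.ndarray

    def as_matrix(self) -> np.ndarray:
        T = np.eye(4)
        T[:3, :3] = self.rotation
        T[:3, 3] = self.translation
        return T


def estimate_depth(rgb: np.ndarray) -> np.ndarray:
    """Depth estimation for transparent surfaces (placeholder)."""
    raise NotImplementedError


def estimate_pose(rgb: np.ndarray, depth: np.ndarray) -> Pose6D:
    """6D pose of a novel transparent object from RGB-D (placeholder)."""
    raise NotImplementedError


def transfer_trajectory(demo_traj: List[Pose6D],
                        demo_obj_pose: Pose6D,
                        new_obj_pose: Pose6D) -> List[Pose6D]:
    """Re-express a single demonstrated end-effector trajectory relative to
    the demo object's frame, then map it into the new object's frame,
    so no category priors or retraining are needed (assumed mechanism)."""
    new_T = new_obj_pose.as_matrix()
    demo_T_inv = np.linalg.inv(demo_obj_pose.as_matrix())
    transferred = []
    for waypoint in demo_traj:
        T = new_T @ demo_T_inv @ waypoint.as_matrix()
        transferred.append(Pose6D(rotation=T[:3, :3], translation=T[:3, 3]))
    return transferred


def plan_subtasks(instruction: str, scene_objects: List[str]) -> List[str]:
    """Query a VLM for a step-by-step plan, then refine it for a single-arm,
    eye-in-hand robot, e.g. re-observing after each placement (placeholder)."""
    raise NotImplementedError
```

The key idea illustrated by `transfer_trajectory` is that, once a 6D pose is available for both the demonstrated object and the novel object, the demonstrated trajectory can be replayed in the novel object's frame by a single change of reference frame.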
Related papers
- Gondola: Grounded Vision Language Planning for Generalizable Robotic Manipulation [62.711546725154314]
We introduce Gondola, a grounded vision-language planning model based on large language models (LLMs) for generalizable robotic manipulation. Gondola takes multi-view images and history plans to produce the next action plan with interleaved texts and segmentation masks of target objects and locations. Gondola outperforms the state-of-the-art LLM-based method across all four levels of the GemBench dataset.
arXiv Detail & Related papers (2025-06-12T20:04:31Z) - ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration [10.558622685760346]
We present a simple yet effective approach for achieving object generalization through Vision-Language-Action models. Our method provides a lightweight and scalable way to inject knowledge about the target object. We evaluate ObjectVLA on a real robotic platform, demonstrating its ability to generalize across 100 novel objects with a 64% success rate.
arXiv Detail & Related papers (2025-02-26T15:56:36Z) - MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation [62.854649499866774]
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation. We propose a novel multi-agent LLM framework that distributes high-level planning and low-level control code generation across specialized LLM agents. We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting.
arXiv Detail & Related papers (2024-11-26T17:53:44Z) - NOD-TAMP: Generalizable Long-Horizon Planning with Neural Object Descriptors [16.475094344344512]
We propose to combine two paradigms: Neural Object Descriptors (NODs) that produce generalizable object-centric features and Task and Motion Planning (TAMP) frameworks that chain short-horizon skills to solve multi-step tasks.
We introduce NOD-TAMP, a TAMP-based framework that extracts short manipulation trajectories from a handful of human demonstrations, adapts these trajectories using NOD features, and composes them to solve broad long-horizon, contact-rich tasks.
arXiv Detail & Related papers (2023-11-02T18:26:28Z) - Universal Visual Decomposer: Long-Horizon Manipulation Made Easy [54.93745986073738]
Real-world robotic tasks stretch over extended horizons and encompass multiple stages.
Prior task decomposition methods require task-specific knowledge, are computationally intensive, and cannot readily be applied to new tasks.
We propose Universal Visual Decomposer (UVD), an off-the-shelf task decomposition method for visual long-horizon manipulation.
We extensively evaluate UVD on both simulation and real-world tasks, and in all cases, UVD substantially outperforms baselines across imitation and reinforcement learning settings.
arXiv Detail & Related papers (2023-10-12T17:59:41Z) - Generalizable Long-Horizon Manipulations with Large Language Models [91.740084601715]
This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations.
We create a challenging robotic manipulation task suite based on Pybullet for long-horizon task evaluation.
arXiv Detail & Related papers (2023-10-03T17:59:46Z) - Planning with Spatial-Temporal Abstraction from Point Clouds for Deformable Object Manipulation [64.00292856805865]
We propose PlAnning with Spatial-Temporal Abstraction (PASTA), which incorporates both spatial abstraction and temporal abstraction.
Our framework maps high-dimension 3D observations into a set of latent vectors and plans over skill sequences on top of the latent set representation.
We show that our method can effectively perform challenging deformable object manipulation tasks in the real world.
arXiv Detail & Related papers (2022-10-27T19:57:04Z) - ManipulaTHOR: A Framework for Visual Object Manipulation [27.17908758246059]
We propose a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework.
The proposed task extends the popular point navigation task to object manipulation and offers new challenges including 3D obstacle avoidance.
arXiv Detail & Related papers (2021-04-22T17:49:04Z) - A Long Horizon Planning Framework for Manipulating Rigid Pointcloud Objects [25.428781562909606]
We present a framework for solving long-horizon planning problems involving manipulation of rigid objects.
Our method plans in the space of object subgoals and frees the planner from reasoning about robot-object interaction dynamics.
arXiv Detail & Related papers (2020-11-16T18:59:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.