Related papers: Prompt2Auto: From Motion Prompt to Automated Control via Geometry-Invariant One-Shot Gaussian Process Learning

URL: http://arxiv.org/abs/2509.14040v1
Date: Wed, 17 Sep 2025 14:42:18 GMT
Title: Prompt2Auto: From Motion Prompt to Automated Control via Geometry-Invariant One-Shot Gaussian Process Learning
Authors: Zewen Yang, Xiaobing Dai, Dongfa Zhang, Yu Li, Ziyang Meng, Bingkun Huang, Hamid Sadeghian, Sami Haddadin
Abstract summary: We propose a geometry-invariant one-shot Gaussian process (GeoGP) learning framework that enables robots to perform human-guided automated control from a single motion prompt. GeoGP is robust to variations in the user's motion prompt and supports multi-skill autonomy. We validate the proposed approach through numerical simulations with the designed user graphical interface and two real-world robotic experiments.
Score: 25.206607838809887
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning from demonstration allows robots to acquire complex skills from human demonstrations, but conventional approaches often require large datasets and fail to generalize across coordinate transformations. In this paper, we propose Prompt2Auto, a geometry-invariant one-shot Gaussian process (GeoGP) learning framework that enables robots to perform human-guided automated control from a single motion prompt. A dataset-construction strategy based on coordinate transformations is introduced that enforces invariance to translation, rotation, and scaling, while supporting multi-step predictions. Moreover, GeoGP is robust to variations in the user's motion prompt and supports multi-skill autonomy. We validate the proposed approach through numerical simulations with the designed user graphical interface and two real-world robotic experiments, which demonstrate that the proposed method is effective, generalizes across tasks, and significantly reduces the demonstration burden. Project page is available at: https://prompt2auto.github.io
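
The abstract mentions a dataset-construction strategy based on coordinate transformations that enforces invariance to translation, rotation, and scaling. The paper's actual construction is not given on this page, so the following is only a minimal sketch of the general idea, assuming a 2D trajectory and a canonical frame defined by its first and last points. The helper names `normalize_trajectory` and `similarity_transform` are hypothetical and do not come from the paper.

```python
import math


def normalize_trajectory(points):
    """Map a 2D trajectory to a canonical frame (hypothetical helper).

    Assumption: the frame is anchored at the first point, its x-axis points
    toward the last point, and the start-to-end distance is the unit length.
    """
    x0, y0 = points[0]
    # Translation invariance: move the first point to the origin.
    shifted = [(x - x0, y - y0) for x, y in points]
    # Reference direction: from the first point to the last point.
    dx, dy = shifted[-1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("start and end points coincide; no reference direction")
    # Rotation invariance: rotate so the reference direction lies on +x.
    # Scale invariance: divide by the start-to-end distance.
    c, s = dx / length, dy / length
    return [((c * x + s * y) / length, (-s * x + c * y) / length)
            for x, y in shifted]


def similarity_transform(points, angle, scale, tx, ty):
    """Apply rotation, uniform scaling, and translation (for checking only)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
            for x, y in points]


# A made-up motion prompt and a rotated, scaled, shifted copy of it.
prompt = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0), (3.0, 1.0)]
moved = similarity_transform(prompt, angle=0.7, scale=2.5, tx=-4.0, ty=1.5)

a = normalize_trajectory(prompt)
b = normalize_trajectory(moved)

# Largest coordinate difference between the two normalized trajectories.
max_diff = max(abs(p - q) for pa, pb in zip(a, b) for p, q in zip(pa, pb))
print(max_diff < 1e-9)
```

Under these assumptions both trajectories map to the same normalized points, so a model trained on the normalized data would see the same input regardless of where, at what angle, or at what size the prompt was drawn.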

Related papers

Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation [90.90219129619344]
This paper presents a novel R-prior-S, Recurrent Geometric-priormodal Policy with Spiking features. To ground high-level reasoning in physical reality, we leverage lightweight 2D geometric inductive biases. For the data efficiency issue in robotic action generation, we introduce a Recursive Adaptive Spiking Network.
arXiv Detail & Related papers (2026-01-13T23:36:30Z)
Learning Pivoting Manipulation with Force and Vision Feedback Using Optimization-based Demonstrations [20.20969802675097]
We propose a framework for learning closed-loop pivoting manipulation. By leveraging computationally efficient Contact-Implicit Trajectory Optimization, we design demonstration-guided deep Reinforcement Learning. We also present a sim-to-real transfer approach using a privileged training strategy, enabling the robot to perform pivoting manipulation.
arXiv Detail & Related papers (2025-08-01T21:33:46Z)
Trajectory Adaptation using Large Language Models [0.8704964543257245]
Adapting robot trajectories based on human instructions as per new situations is essential for achieving more intuitive and scalable human-robot interactions. This work proposes a flexible language-based framework to adapt generic robotic trajectories produced by off-the-shelf motion planners. We utilize pre-trained LLMs to adapt trajectory waypoints by generating code as a policy for dense robot manipulation.
arXiv Detail & Related papers (2025-04-17T08:48:23Z)
DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving [62.62464518137153]
DriveTransformer is a simplified E2E-AD framework for the ease of scaling up. It is composed of three unified operations: task self-attention, sensor cross-attention, temporal cross-attention. It achieves state-of-the-art performance in both simulated closed-loop benchmark Bench2Drive and real world open-loop benchmark nuScenes with high FPS.
arXiv Detail & Related papers (2025-03-07T11:41:18Z)
Scaling Manipulation Learning with Visual Kinematic Chain Prediction [32.99644520625179]
We propose the visual kinematics chain as a precise and universal representation of quasi-static actions for robot learning over diverse environments. We demonstrate the superior performance of VKT over BC transformers as a general agent on Calvin, RLBench, Open-X, and real robot manipulation tasks.
arXiv Detail & Related papers (2024-06-12T03:10:27Z)
Guided Decoding for Robot On-line Motion Generation and Adaption [44.959409835754634]
We present a novel motion generation approach for robot arms, with high degrees of freedom, in complex settings that can adapt online to obstacles or new via points. We train a transformer architecture, based on conditional variational autoencoder, on a large dataset of simulated trajectories used as demonstrations. We show that our model successfully generates motion from different initial and target points and that is capable of generating trajectories that navigate complex tasks across different robotic platforms.
arXiv Detail & Related papers (2024-03-22T14:32:27Z)
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation. We also present a benchmark for a code generation benchmark for robot manipulation tasks in free-form natural language. We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models. Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
VIMA: General Robot Manipulation with Multimodal Prompts [82.01214865117637]
We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts. We develop a new simulation benchmark that consists of thousands of procedurally-generated tabletop tasks. We design a transformer-based robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively.
arXiv Detail & Related papers (2022-10-06T17:50:11Z)
Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query. Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories. We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.