TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control
- URL: http://arxiv.org/abs/2602.07439v1
- Date: Sat, 07 Feb 2026 08:42:11 GMT
- Title: TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control
- Authors: Weiji Xie, Jiakun Zheng, Jinrui Han, Jiyuan Shi, Weinan Zhang, Chenjia Bai, Xuelong Li
- Abstract summary: TextOp is a real-time text-driven humanoid motion generation and control framework. It supports streaming language commands and on-the-fly instruction modification during execution. By bridging interactive motion generation with robust whole-body control, TextOp unlocks free-form intent expression.
- Score: 62.93681680333618
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in humanoid whole-body motion tracking have enabled the execution of diverse and highly coordinated motions on real hardware. However, existing controllers are commonly driven either by predefined motion trajectories, which offer limited flexibility when user intent changes, or by continuous human teleoperation, which requires constant human involvement and limits autonomy. This work addresses the problem of how to drive a universal humanoid controller in a real-time and interactive manner. We present TextOp, a real-time text-driven humanoid motion generation and control framework that supports streaming language commands and on-the-fly instruction modification during execution. TextOp adopts a two-level architecture in which a high-level autoregressive motion diffusion model continuously generates short-horizon kinematic trajectories conditioned on the current text input, while a low-level motion tracking policy executes these trajectories on a physical humanoid robot. By bridging interactive motion generation with robust whole-body control, TextOp unlocks free-form intent expression and enables smooth transitions across multiple challenging behaviors such as dancing and jumping, within a single continuous motion execution. Extensive real-robot experiments and offline evaluations demonstrate instant responsiveness, smooth whole-body motion, and precise control. The project page and the open-source code are available at https://text-op.github.io/
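For intuition, here is a minimal Python sketch of the two-level loop the abstract describes: a high-level generator emits short-horizon trajectory chunks conditioned on the current text and motion history, while a low-level tracker follows them step by step. All class names, dimensions, and update rules below are invented stand-ins, not the paper's actual models.
```python
# Hedged sketch of a TextOp-style two-level loop (hypothetical interfaces).
import numpy as np

HORIZON = 8        # short-horizon chunk length (assumed value)
DOF = 29           # humanoid joint count (assumed value)

class HighLevelGenerator:
    """Stand-in for the autoregressive motion diffusion model."""
    def generate_chunk(self, history: np.ndarray, text: str) -> np.ndarray:
        # Real model: diffusion sampling conditioned on motion history + text.
        # Here: a random walk from the last pose, just to exercise the loop.
        steps = 0.01 * np.random.randn(HORIZON, DOF)
        return history[-1] + np.cumsum(steps, axis=0)

class LowLevelTracker:
    """Stand-in for the whole-body motion tracking policy."""
    def track(self, target_pose: np.ndarray, state: np.ndarray) -> np.ndarray:
        # Real policy: an RL controller on the physical robot.
        return state + 0.5 * (target_pose - state)   # toy proportional step

def run(commands):
    gen, tracker = HighLevelGenerator(), LowLevelTracker()
    state = np.zeros(DOF)
    history = np.zeros((1, DOF))
    for text in commands:                      # text may change on the fly
        chunk = gen.generate_chunk(history, text)
        for target in chunk:                   # low level runs per control step
            state = tracker.track(target, state)
        history = np.vstack([history, chunk])[-HORIZON:]
    return state

run(["walk forward", "now jump", "dance"])
```
The point of the structure is that the generator can be re-conditioned on new text between chunks without interrupting the tracker, which is what enables streaming commands and mid-execution instruction changes.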
Related papers
- From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance [55.31807046722006]
Existing language-guided humanoid pipelines are cumbersome and untrustworthy. We present RoboGhost, a retargeting-free framework that conditions humanoid policies on language-grounded motion latents. We show that RoboGhost substantially reduces deployment latency, improves success rates and tracking precision, and produces smooth, semantically aligned humanoid motion.
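A rough sketch of what "conditioning a policy on language-grounded motion latents" could look like; the encoder, policy, and all dimensions here are hypothetical placeholders, not RoboGhost's interfaces.
```python
# Latent-conditioned control sketch (all names and sizes are assumptions).
import numpy as np

LATENT_DIM, STATE_DIM, ACTION_DIM = 16, 48, 12

def text_to_latent(text: str) -> np.ndarray:
    # Placeholder for a language-grounded motion latent encoder:
    # hash the text into a deterministic pseudo-embedding.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(LATENT_DIM)

def policy(state: np.ndarray, latent: np.ndarray) -> np.ndarray:
    # Stand-in for a learned policy pi(action | state, latent).
    w = 0.01 * np.ones((ACTION_DIM, STATE_DIM + LATENT_DIM))
    return np.tanh(w @ np.concatenate([state, latent]))

z = text_to_latent("wave with the right hand")
action = policy(np.zeros(STATE_DIM), z)   # no explicit retargeting step
```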
arXiv Detail & Related papers (2025-10-16T17:57:47Z)
- Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories. We translate high-level user requests into detailed, semi-dense motion prompts. We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
- Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes [90.39860012099393]
Sitcom-Crafter is a system for human motion generation in 3D space. Central to the motion generation modules is our novel 3D scene-aware human-human interaction module. Augmentation modules encompass plot comprehension for command generation and motion synchronization for seamless integration of different motion types.
arXiv Detail & Related papers (2024-10-14T17:56:19Z)
- DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control [12.465927271402442]
Text-conditioned human motion generation allows for user interaction through natural language. DartControl is a Diffusion-based Autoregressive motion primitive model for Real-time Text-driven motion control. Our model effectively learns a compact motion primitive space jointly conditioned on motion history and text inputs.
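A toy illustration of the autoregressive primitive rollout described above: each primitive is sampled conditioned on recent motion history and the current text, then appended to the sequence and re-fed. The sampler, sizes, and text handling are assumptions, not DartControl's model.
```python
# Autoregressive motion-primitive rollout sketch (assumed interfaces).
import numpy as np

PRIM_LEN, POSE_DIM, CTX = 4, 22, 2   # assumed primitive/pose/context sizes

def sample_primitive(history, text):
    # Real model: a diffusion sampler over a learned primitive space.
    drift = 0.02 if "fast" in text else 0.005
    return history[-1] + drift * np.random.randn(PRIM_LEN, POSE_DIM)

motion = np.zeros((CTX, POSE_DIM))
for text in ["walk", "walk fast", "stop"]:          # streaming commands
    prim = sample_primitive(motion[-CTX:], text)    # condition on history+text
    motion = np.vstack([motion, prim])              # autoregressive append
print(motion.shape)
```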
arXiv Detail & Related papers (2024-10-07T17:58:22Z)
- TLControl: Trajectory and Language Control for Human Motion Synthesis [68.09806223962323]
We present TLControl, a novel method for realistic human motion synthesis.
It incorporates both low-level Trajectory and high-level Language semantics controls.
It is practical for interactive and high-quality animation generation.
arXiv Detail & Related papers (2023-11-28T18:54:16Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.
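The core objective is easy to sketch: penalize deviation of selected joint-pair distances from targets, which, per the abstract, an off-the-shelf LLM can supply. The joint indices and target distances below are invented examples, not the paper's configuration.
```python
# Joint-pair distance objective sketch (pure numpy; indices are invented).
import numpy as np

def pair_distance_loss(joints, pairs, targets):
    """joints: (J, 3) positions; pairs: list of (i, j); targets: metres."""
    loss = 0.0
    for (i, j), d in zip(pairs, targets):
        loss += (np.linalg.norm(joints[i] - joints[j]) - d) ** 2
    return loss

# e.g. an LLM might emit: "the two people's right hands touch" ->
# pair (right_hand_A, right_hand_B) with target distance 0.0
joints = np.random.rand(44, 3)           # two 22-joint skeletons stacked
pairs, targets = [(9, 31)], [0.0]        # hypothetical joint indices
print(pair_distance_loss(joints, pairs, targets))
```
A loss of this form can serve as guidance during sampling, steering generated poses toward the specified contacts or separations.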
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
- Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text [14.473103773197838]
A new task, Story-to-Motion, arises when characters are required to perform specific motions based on a long text description.
Previous works in character control and text-to-motion have addressed related aspects, yet a comprehensive solution remains elusive.
We propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text.
arXiv Detail & Related papers (2023-11-13T16:22:38Z)
- MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model [35.32967411186489]
MotionDiffuse is a diffusion model-based text-driven motion generation framework.
It excels at modeling complicated data distribution and generating vivid motion sequences.
It supports fine-grained instructions on body parts and arbitrary-length motion synthesis with time-varied text prompts.
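A schematic of how a denoising loop might consume time-varied prompts, attaching different text spans to different frame ranges of one sequence; the denoiser below is a stub, not MotionDiffuse's network or its exact update rule.
```python
# Denoising loop with per-segment prompts (schematic, assumed sizes).
import numpy as np

T_STEPS, FRAMES, POSE_DIM = 50, 60, 22

def denoiser(x, t, prompts):
    # Real model: a transformer predicting noise from x_t, t, and per-segment
    # text embeddings; here a stub that shrinks the sample toward zero.
    return x * (t / T_STEPS)

# different prompts for different temporal segments of the same sequence
prompts = [("a person walks", range(0, 30)), ("then jumps", range(30, 60))]
x = np.random.randn(FRAMES, POSE_DIM)        # start from Gaussian noise
for t in reversed(range(1, T_STEPS + 1)):
    eps = denoiser(x, t, prompts)
    x = x - eps / T_STEPS                    # toy update, not exact DDPM math
print(x.std())
```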
arXiv Detail & Related papers (2022-08-31T17:58:54Z)
- Agile Maneuvers in Legged Robots: a Predictive Control Approach [20.55884151818753]
We present contact-phase predictive and state-feedback controllers that enable legged robots to plan and perform agile locomotion skills.
Our work is the first to show that predictive control can handle actuation limits, generate agile locomotion maneuvers and execute locally optimal feedback policies on hardware without the use of a separate whole-body controller.
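To make the receding-horizon idea concrete, here is a toy predictive controller on a single-mass system: it optimizes a short torque sequence under actuation limits via random shooting, applies the first input, and replans. The dynamics, costs, and optimizer are stand-ins for the paper's legged-robot formulation.
```python
# Receding-horizon control sketch with torque limits (toy single mass).
import numpy as np

TAU_MAX, H, DT = 2.0, 10, 0.05   # assumed limits, horizon, and timestep

def rollout(x, v, taus):
    cost = 0.0
    for tau in taus:             # simulate the candidate torque sequence
        v += DT * tau
        x += DT * v
        cost += x**2 + 0.1 * v**2 + 0.01 * tau**2
    return cost

x, v = 1.0, 0.0
for _ in range(50):                               # receding-horizon loop
    best, best_taus = np.inf, None
    for _ in range(128):                          # random-shooting optimizer
        taus = np.clip(np.random.randn(H), -TAU_MAX, TAU_MAX)  # torque limits
        c = rollout(x, v, taus)
        if c < best:
            best, best_taus = c, taus
    v += DT * best_taus[0]                        # apply first input only
    x += DT * v
print(round(x, 3))
```
Enforcing the actuation limits inside the optimization, rather than clipping afterward, is the essence of handling limits predictively.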
arXiv Detail & Related papers (2022-03-14T23:32:17Z)