MaskedManipulator: Versatile Whole-Body Manipulation
- URL: http://arxiv.org/abs/2505.19086v2
- Date: Sat, 20 Sep 2025 17:43:37 GMT
- Title: MaskedManipulator: Versatile Whole-Body Manipulation
- Authors: Chen Tessler, Yifeng Jiang, Erwin Coumans, Zhengyi Luo, Gal Chechik, Xue Bin Peng,
- Abstract summary: We introduce MaskedManipulator, a generative control policy distilled from a tracking controller trained on large-scale human motion capture data.<n>This two-stage learning process allows the system to perform complex interaction behaviors, while providing intuitive user control over both character and object motions.
- Score: 38.02818493367002
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We tackle the challenges of synthesizing versatile, physically simulated human motions for full-body object manipulation. Unlike prior methods that are focused on detailed motion tracking, trajectory following, or teleoperation, our framework enables users to specify versatile high-level objectives such as target object poses or body poses. To achieve this, we introduce MaskedManipulator, a generative control policy distilled from a tracking controller trained on large-scale human motion capture data. This two-stage learning process allows the system to perform complex interaction behaviors, while providing intuitive user control over both character and object motions. MaskedManipulator produces goal-directed manipulation behaviors that expand the scope of interactive animation systems beyond task-specific solutions.
Related papers
- Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos [56.510263910611684]
We tackle prehensile manipulation, in which tasks involve grasping an object before performing various post-grasp motions.<n>Human videos offer strong signals for learning the post-grasp motions, but they are less useful for learning the prerequisite grasping behaviors.<n>We present Perceive-Simulate-Imitate (PSI), a framework for training a modular manipulation policy using human video motion data.
arXiv Detail & Related papers (2026-02-13T18:59:10Z) - Mask2IV: Interaction-Centric Video Generation via Mask Trajectories [32.04930240447431]
Mask2IV is a novel framework specifically designed for interaction-centric video generation.<n>It adopts a decoupled two-stage pipeline that first predicts plausible motion trajectories for both actor and object, then generates a video conditioned on these trajectories.<n>It supports versatile and intuitive control, allowing users to specify the target object of interaction and guide the motion trajectory through action descriptions or spatial position cues.
arXiv Detail & Related papers (2025-10-03T16:04:33Z) - Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control [72.00655365269]
We present RoboMaster, a novel framework that models inter-object dynamics through a collaborative trajectory formulation.<n>Unlike prior methods that decompose objects, our core is to decompose the interaction process into three sub-stages: pre-interaction, interaction, and post-interaction.<n>Our method outperforms existing approaches, establishing new state-of-the-art performance in trajectory-controlled video generation for robotic manipulation.
arXiv Detail & Related papers (2025-06-02T17:57:06Z) - HOSIG: Full-Body Human-Object-Scene Interaction Generation with Hierarchical Scene Perception [57.37135310143126]
HO SIG is a novel framework for synthesizing full-body interactions through hierarchical scene perception.<n>Our framework supports unlimited motion length through autoregressive generation and requires minimal manual intervention.<n>This work bridges the critical gap between scene-aware navigation and dexterous object manipulation.
arXiv Detail & Related papers (2025-06-02T12:08:08Z) - Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [57.942404069484134]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.<n>Previous research employed interactive perception for manipulating articulated objects, but typically, open-loop approaches often suffer from overlooking the interaction dynamics.<n>We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
arXiv Detail & Related papers (2024-09-24T17:59:56Z) - Whole-Body Teleoperation for Mobile Manipulation at Zero Added Cost [8.71539730969424]
MoMa-Teleop is a novel teleoperation method that infers end-effector motions from existing interfaces.<n>We demonstrate that our approach results in a significant reduction in task completion time across a variety of robots and tasks.
arXiv Detail & Related papers (2024-09-23T15:09:45Z) - MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting [38.15158715821526]
MaskedMimic is a novel approach that formulates physics-based character control as a general motion inpainting problem.
By unifying character control through motion inpainting, MaskedMimic creates versatile virtual characters.
These characters can dynamically adapt to complex scenes and compose diverse motions on demand, enabling more interactive and immersive experiences.
arXiv Detail & Related papers (2024-09-22T11:10:59Z) - Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots [13.229028132036321]
Masked Humanoid Controller (MHC) supports standing, walking, and mimicry of whole and partial-body motions.<n>MHC imitates partially masked motions from a library of behaviors spanning standing, walking, optimized reference trajectories, re-targeted video clips, and human motion capture data.<n>We demonstrate sim-to-real transfer on the real-world Digit V3 humanoid robot.
arXiv Detail & Related papers (2024-07-30T09:10:24Z) - Controllable Human-Object Interaction Synthesis [77.56877961681462]
We propose Controllable Human-Object Interaction Synthesis (CHOIS) to generate synchronized object motion and human motion in 3D scenes.
Here, language descriptions inform style and intent, and waypoints, which can be effectively extracted from high-level planning, ground the motion in the scene.
Our module seamlessly integrates with a path planning module, enabling the generation of long-term interactions in 3D environments.
arXiv Detail & Related papers (2023-12-06T21:14:20Z) - Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
We first learn a motion imitator that can imitate all of human motion from a large, unstructured motion dataset.
We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z) - Object Motion Guided Human Motion Synthesis [22.08240141115053]
We study the problem of full-body human motion synthesis for the manipulation of large-sized objects.
We propose Object MOtion guided human MOtion synthesis (OMOMO), a conditional diffusion framework.
We develop a novel system that captures full-body human manipulation motions by simply attaching a smartphone to the object being manipulated.
arXiv Detail & Related papers (2023-09-28T08:22:00Z) - Task-Oriented Human-Object Interactions Generation with Implicit Neural
Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z) - AMP: Adversarial Motion Priors for Stylized Physics-Based Character
Control [145.61135774698002]
We propose a fully automated approach to selecting motion for a character to track in a given scenario.
High-level task objectives that the character should perform can be specified by relatively simple reward functions.
Low-level style of the character's behaviors can be specified by a dataset of unstructured motion clips.
Our system produces high-quality motions comparable to those achieved by state-of-the-art tracking-based techniques.
arXiv Detail & Related papers (2021-04-05T22:43:14Z) - Articulated Object Interaction in Unknown Scenes with Whole-Body Mobile
Manipulation [16.79185733369416]
We propose a two-stage architecture for autonomous interaction with large articulated objects in unknown environments.
The first stage uses a learned model to estimate the articulated model of a target object from an RGB-D input and predicts an action-conditional sequence of states for interaction.
The second stage comprises of a whole-body motion controller to manipulate the object along the generated kinematic plan.
arXiv Detail & Related papers (2021-03-18T21:32:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.