Controllable Motion Generation via Diffusion Modal Coupling
- URL: http://arxiv.org/abs/2503.02353v1
- Date: Tue, 04 Mar 2025 07:22:34 GMT
- Title: Controllable Motion Generation via Diffusion Modal Coupling
- Authors: Luobin Wang, Hongzhan Yu, Chenning Yu, Sicun Gao, Henrik Christensen
- Abstract summary: We propose a novel framework that enhances controllability in diffusion models by leveraging multi-modal prior distributions. We evaluate our approach on motion prediction using the Waymo dataset and on multi-task control in Maze2D environments.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have recently gained significant attention in robotics due to their ability to generate multi-modal distributions of system states and behaviors. However, a key challenge remains: ensuring precise control over the generated outcomes without compromising realism. This is crucial for applications such as motion planning or trajectory forecasting, where adherence to physical constraints and task-specific objectives is essential. We propose a novel framework that enhances controllability in diffusion models by leveraging multi-modal prior distributions and enforcing strong modal coupling. This allows us to initiate the denoising process directly from distinct prior modes that correspond to different possible system behaviors, ensuring sampling to align with the training distribution. We evaluate our approach on motion prediction using the Waymo dataset and multi-task control in Maze2D environments. Experimental results show that our framework outperforms both guidance-based techniques and conditioned models with unimodal priors, achieving superior fidelity, diversity, and controllability, even in the absence of explicit conditioning. Overall, our approach provides a more reliable and scalable solution for controllable motion generation in robotics.
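A minimal sketch of the sampling idea under stated assumptions: draw the initial noise from a chosen component of a multi-modal (Gaussian-mixture) prior and run a standard reverse-diffusion loop from that mode. The mixture, schedule, and toy score below are illustrative, not the authors' implementation.

```python
# Hypothetical sketch: instead of starting the reverse diffusion from a single
# standard Gaussian, draw the initial sample from one component of a
# Gaussian-mixture prior, coupling each prior mode to one behavior mode.
import numpy as np

rng = np.random.default_rng(0)

# Assumed Gaussian-mixture prior: one component per behavior mode.
mode_means = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 4.0]])  # (K, D)
mode_std = 1.0

def denoise_step(x_t, t, score_fn, betas):
    """One ancestral reverse-diffusion step (DDPM-style, simplified)."""
    beta = betas[t]
    noise = rng.normal(size=x_t.shape) if t > 0 else 0.0
    # Standard DDPM mean written in terms of the (learned) score.
    x_prev = (x_t + beta * score_fn(x_t, t)) / np.sqrt(1.0 - beta)
    return x_prev + np.sqrt(beta) * noise

def sample_from_mode(k, score_fn, n_steps=50):
    """Start denoising from prior mode k instead of N(0, I)."""
    betas = np.linspace(1e-4, 0.05, n_steps)
    x = mode_means[k] + mode_std * rng.normal(size=mode_means.shape[1])
    for t in reversed(range(n_steps)):
        x = denoise_step(x, t, score_fn, betas)
    return x

# Toy score function pulling samples toward the selected mode's target.
toy_score = lambda x, t: -(x - mode_means[1])
print(sample_from_mode(k=1, score_fn=toy_score))
```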
Related papers
- Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy [56.424032454461695]
We present Dita, a scalable framework that leverages Transformer architectures to directly denoise continuous action sequences.
Dita employs in-context conditioning, enabling fine-grained alignment between denoised actions and raw visual tokens from historical observations.
Dita effectively integrates cross-embodiment datasets across diverse camera perspectives, observation scenes, tasks, and action spaces.
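A toy sketch of in-context conditioning, assuming a single-head attention layer and made-up shapes: observation tokens and noisy action tokens share one sequence, so action slots can attend directly to raw visual tokens rather than through a separate cross-attention branch.

```python
# Illustrative only; not Dita's actual architecture.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 16
vis_tokens = rng.normal(size=(8, d))      # tokens from historical observations
noisy_actions = rng.normal(size=(4, d))   # noisy continuous action sequence
t_embed = rng.normal(size=(1, d))         # diffusion-timestep embedding

# In-context conditioning: everything lives in one token sequence.
seq = np.concatenate([t_embed, vis_tokens, noisy_actions], axis=0)
w = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
out = self_attention(seq, *w)
denoised_action_tokens = out[-4:]         # read predictions off action slots
print(denoised_action_tokens.shape)       # (4, 16)
```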
arXiv Detail & Related papers (2025-03-25T15:19:56Z)
- Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control [16.319698848279966]
We present Diffuse-CLoC, a guided diffusion framework for physics-based look-ahead control.
It enables intuitive, steerable, and physically realistic motion generation.
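For intuition, a hedged sketch of cost-gradient guidance: a generic guided-diffusion step in which the learned score is biased by the gradient of a task cost. The waypoint cost and guidance scale are assumptions, not Diffuse-CLoC's actual guidance terms.

```python
import numpy as np

rng = np.random.default_rng(0)
waypoint = np.array([2.0, 1.0])  # assumed steering target

def task_cost_grad(x):
    """Gradient of 0.5 * ||x - waypoint||^2 with respect to x."""
    return x - waypoint

def guided_step(x_t, t, score_fn, betas, guidance_scale=0.3):
    """One reverse step whose score is nudged by the negative cost gradient."""
    beta = betas[t]
    guided_score = score_fn(x_t, t) - guidance_scale * task_cost_grad(x_t)
    noise = rng.normal(size=x_t.shape) if t > 0 else 0.0
    return (x_t + beta * guided_score) / np.sqrt(1.0 - beta) + np.sqrt(beta) * noise

betas = np.linspace(1e-4, 0.05, 10)
print(guided_step(np.zeros(2), t=9, score_fn=lambda x, t: -x, betas=betas))
```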
arXiv Detail & Related papers (2025-03-14T18:42:29Z)
- End-to-End Predictive Planner for Autonomous Driving with Consistency Models [5.966385886363771]
Trajectory prediction and planning are fundamental components for autonomous vehicles to navigate safely and efficiently in dynamic environments.
Traditionally, these components have often been treated as separate modules, limiting the ability to perform interactive planning.
We present a novel unified and data-driven framework that integrates prediction and planning with a single consistency model.
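For context, one-step generation with a consistency model can be sketched with the standard skip parameterization: a noisy trajectory at any noise level maps directly to a clean trajectory in a single function evaluation. The constants and the toy model below are assumptions, not the paper's network.

```python
import numpy as np

sigma_data, sigma_min = 0.5, 0.002  # assumed consistency-model constants

def consistency_sample(f_theta, x_noisy, sigma):
    """One-step sample: x0 = c_skip(sigma) * x + c_out(sigma) * f_theta(x, sigma)."""
    c_skip = sigma_data**2 / ((sigma - sigma_min) ** 2 + sigma_data**2)
    c_out = sigma_data * (sigma - sigma_min) / np.sqrt(sigma**2 + sigma_data**2)
    return c_skip * x_noisy + c_out * f_theta(x_noisy, sigma)

rng = np.random.default_rng(0)
traj = rng.normal(size=(20, 2)) * 80.0      # noisy (T, xy) trajectory
x0 = consistency_sample(lambda x, s: x / s, traj, sigma=80.0)
print(x0.shape)  # (20, 2): clean trajectory in a single evaluation
```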
arXiv Detail & Related papers (2025-02-12T00:26:01Z)
- Diffusion Predictive Control with Constraints [51.91057765703533]
Diffusion predictive control with constraints (DPCC) is an algorithm for diffusion-based control with explicit state and action constraints that can deviate from those in the training data.
We show through simulations of a robot manipulator that DPCC outperforms existing methods in satisfying novel test-time constraints while maintaining performance on the learned control task.
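A hedged sketch of constraint enforcement inside the denoising loop, assuming a simple box projection on actions; DPCC's actual projection and dynamics handling are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
a_min, a_max = -1.0, 1.0   # assumed test-time action bounds

def project(actions):
    """Euclidean projection onto the box constraint [a_min, a_max]."""
    return np.clip(actions, a_min, a_max)

def constrained_denoise(score_fn, n_steps=50, horizon=8, dim=2):
    betas = np.linspace(1e-4, 0.05, n_steps)
    x = rng.normal(size=(horizon, dim))
    for t in reversed(range(n_steps)):
        noise = rng.normal(size=x.shape) if t > 0 else 0.0
        x = (x + betas[t] * score_fn(x, t)) / np.sqrt(1 - betas[t]) + np.sqrt(betas[t]) * noise
        x = project(x)  # enforce constraints at every step, even novel ones
    return x

print(constrained_denoise(lambda x, t: -x).max() <= a_max)  # True
```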
arXiv Detail & Related papers (2024-12-12T15:10:22Z)
- Learning Diffusion Policies from Demonstrations For Compliant Contact-rich Manipulation [5.1245307851495]
This paper introduces Diffusion Policies For Compliant Manipulation (DIPCOM), a novel diffusion-based framework for compliant control tasks.
By leveraging generative diffusion models, we develop a policy that predicts Cartesian end-effector poses and adjusts arm stiffness to maintain the necessary force.
Our approach enhances force control through multimodal distribution modeling, improves the integration of diffusion policies in compliance control, and extends our previous work by demonstrating its effectiveness in real-world tasks.
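A sketch of the pose-plus-stiffness action parameterization under assumed names, shapes, and gains; a real diffusion policy would denoise `action` from noise conditioned on observations, and this is not DIPCOM's real interface.

```python
import numpy as np

def impedance_wrench(x_current, action):
    """F = K (x_target - x_current): stiffness-weighted pose error."""
    x_target = action[:3]                     # predicted end-effector position
    stiffness = np.exp(action[3:6])           # per-axis gains, kept positive
    return stiffness * (x_target - x_current)

# A (hypothetical) diffusion policy would produce `action` by denoising;
# here we just use a fixed sample for illustration.
action = np.array([0.4, 0.0, 0.2, np.log(300.0), np.log(300.0), np.log(50.0)])
x_now = np.array([0.35, 0.02, 0.25])
print(impedance_wrench(x_now, action))  # softer vertical axis maintains force
```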
arXiv Detail & Related papers (2024-10-25T00:56:15Z)
- Diffusion-Based Environment-Aware Trajectory Prediction [3.1406146587437904]
The ability to predict the future trajectories of traffic participants is crucial for the safe and efficient operation of autonomous vehicles.
In this paper, a diffusion-based generative model for multi-agent trajectory prediction is proposed.
The model is capable of capturing the complex interactions between traffic participants and the environment, accurately learning the multimodal nature of the data.
arXiv Detail & Related papers (2024-03-18T10:35:15Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
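A minimal sketch of learned implicit queries aggregating global context within each modality, with assumed dimensions and a single cross-attention layer; the actual IMQ design may differ.

```python
import numpy as np

def cross_attend(queries, tokens):
    """Queries aggregate token content via scaled dot-product attention."""
    scores = queries @ tokens.T / np.sqrt(tokens.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ tokens

rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(49, 32))   # modality-specific features
text_tokens = rng.normal(size=(12, 32))
learned_queries = rng.normal(size=(4, 32)) # shared implicit queries

img_ctx = cross_attend(learned_queries, image_tokens)  # (4, 32) global cues
txt_ctx = cross_attend(learned_queries, text_tokens)   # aggregated per modality
print(img_ctx.shape, txt_ctx.shape)
```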
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Interactive Character Control with Auto-Regressive Motion Diffusion Models [18.727066177880708]
We propose A-MDM (Auto-regressive Motion Diffusion Model) for real-time motion synthesis.
Our conditional diffusion model takes an initial pose as input and auto-regressively generates successive motion frames conditioned on the previous frame.
We introduce a suite of techniques for incorporating interactive controls into A-MDM, such as task-oriented sampling, in-painting, and hierarchical reinforcement learning.
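A toy sketch of the auto-regressive loop, with an assumed conditional score and step counts: each frame is produced by a short denoising loop conditioned on the previous frame, then fed back in.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_frame(prev_frame, score_fn, n_steps=20):
    betas = np.linspace(1e-4, 0.05, n_steps)
    x = rng.normal(size=prev_frame.shape)
    for t in reversed(range(n_steps)):
        s = score_fn(x, prev_frame, t)      # score conditioned on prev frame
        noise = rng.normal(size=x.shape) if t > 0 else 0.0
        x = (x + betas[t] * s) / np.sqrt(1 - betas[t]) + np.sqrt(betas[t]) * noise
    return x

# Toy conditional score: pull the new frame toward the previous one.
toy_score = lambda x, prev, t: -(x - prev)
pose = np.zeros(8)                           # initial pose
frames = []
for _ in range(30):                          # real-time loop, frame by frame
    pose = denoise_frame(pose, toy_score)
    frames.append(pose)
print(np.stack(frames).shape)                # (30, 8)
```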
arXiv Detail & Related papers (2023-06-01T07:48:34Z)
- Collaborative Diffusion for Multi-Modal Face Generation and Editing [34.16906110777047]
We present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training.
Specifically, we propose dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model.
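A hedged sketch of the fusion rule, assuming a softmax over per-model spatial influence maps; the stand-in meta-network and toy uni-modal models below are illustrative only, not the paper's networks.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8

def dynamic_diffuser(x_t, t, n_models=2):
    """Stand-in meta-network: returns per-model spatial influence logits."""
    return rng.normal(size=(n_models, H, W))

def collaborative_step_noise(x_t, t, eps_fns):
    logits = dynamic_diffuser(x_t, t, len(eps_fns))
    weights = np.exp(logits) / np.exp(logits).sum(axis=0)  # softmax over models
    eps = np.stack([f(x_t, t) for f in eps_fns])           # (M, H, W) predictions
    return (weights * eps).sum(axis=0)                     # fused denoising signal

x = rng.normal(size=(H, W))
uni_modal_eps = [lambda x, t: -x, lambda x, t: 0.5 * x]    # frozen toy models
print(collaborative_step_noise(x, t=10, eps_fns=uni_modal_eps).shape)  # (8, 8)
```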
arXiv Detail & Related papers (2023-04-20T17:59:02Z)
- Model-Based Reinforcement Learning with Isolated Imaginations [61.67183143982074]
We propose Iso-Dream++, a model-based reinforcement learning approach.
We perform policy optimization based on the decoupled latent imaginations.
This enables long-horizon visuomotor control tasks to benefit from isolating mixed dynamics sources in the wild.
arXiv Detail & Related papers (2023-03-27T02:55:56Z)
- Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
The empowerment principle enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
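For intuition, a linear-Gaussian special case where the channel view gives a closed form: if the action-to-state map is s' = A a + noise, the mutual information (and hence an empowerment estimate) is 0.5 * logdet(I + A Σ Aᵀ / σ²). The linear map and fixed input covariance are simplifying assumptions.

```python
import numpy as np

def gaussian_empowerment(A, input_cov, noise_var=1.0):
    """Mutual information of a linear Gaussian channel, in nats."""
    n = A.shape[0]
    sig = A @ input_cov @ A.T / noise_var
    return 0.5 * np.linalg.slogdet(np.eye(n) + sig)[1]

A = np.array([[1.0, 0.2], [0.0, 0.5]])     # assumed local action->state Jacobian
print(gaussian_empowerment(A, np.eye(2)))  # higher near controllable states
```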
arXiv Detail & Related papers (2020-07-14T21:10:16Z)
- Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives [89.34229413345541]
We propose a conditioning scheme that avoids common pitfalls by learning the controller and its conditioning in an end-to-end manner.
Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion.
We report significant improvements in task success over representative MPC and IL baselines.
arXiv Detail & Related papers (2020-03-19T15:04:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.