Efficient Text-driven Motion Generation via Latent Consistency Training
- URL: http://arxiv.org/abs/2405.02791v3
- Date: Fri, 29 Nov 2024 16:03:59 GMT
- Title: Efficient Text-driven Motion Generation via Latent Consistency Training
- Authors: Mengxian Hu, Minghao Zhu, Xun Zhou, Qingqing Yan, Shu Li, Chengju Liu, Qijun Chen
- Abstract summary: We propose a motion latent consistency training framework (MLCT) to solve nonlinear reverse diffusion trajectories. By combining these enhancements, we achieve stable and consistent training in non-pixel modality and latent representation spaces.
- Score: 21.348658259929053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-driven human motion generation based on diffusion strategies establishes a reliable foundation for multimodal applications in human-computer interaction. However, existing advances face significant efficiency challenges due to the substantial computational overhead of iteratively solving for nonlinear reverse diffusion trajectories during the inference phase. To this end, we propose the motion latent consistency training framework (MLCT), which precomputes reverse diffusion trajectories from raw data in the training phase and enables few-step or single-step inference via self-consistency constraints in the inference phase. Specifically, a motion autoencoder with quantization constraints is first proposed to construct concise and bounded solution distributions for motion diffusion processes. Subsequently, a classifier-free guidance format is constructed via an additional unconditional loss function to accomplish the precomputation of conditional diffusion trajectories in the training phase. Finally, a clustering guidance module based on the K-nearest-neighbor algorithm is developed for the chain-conduction optimization mechanism of self-consistency constraints, which provides additional references for the solution distribution at a small query cost. By combining these enhancements, we achieve stable and consistent training in non-pixel modality and latent representation spaces. Benchmark experiments demonstrate that our method significantly outperforms traditional consistency distillation methods at reduced training cost and enables the consistency model to perform comparably to state-of-the-art models at lower inference cost.
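To make the training recipe concrete, here is a minimal sketch of one latent consistency training step in the spirit described above, with classifier-free guidance folded into the target computation. The network names, noise parameterization, and guidance handling are illustrative assumptions rather than the authors' released code, and the K-nearest-neighbor clustering guidance module is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def consistency_training_step(consistency_net, ema_net, z0, cond, t_n, t_np1, w=2.0):
    """One latent consistency training step (hypothetical sketch).

    z0        : clean motion latents from the quantized autoencoder, (B, D)
    cond      : text-condition embeddings, (B, D_c)
    t_n/t_np1 : adjacent noise levels on the discretized trajectory, (B, 1)
    w         : classifier-free guidance scale used when forming targets
    """
    eps = torch.randn_like(z0)                     # shared noise for both points
    z_tn   = z0 + t_n   * eps                      # point at the earlier noise level
    z_tnp1 = z0 + t_np1 * eps                      # point at the later noise level

    # Student prediction at the noisier point, with condition.
    pred = consistency_net(z_tnp1, t_np1, cond)

    # Target from the EMA copy at the less-noisy point; classifier-free
    # guidance mixes conditional and unconditional branches (the
    # unconditional branch is assumed trained via an extra loss with a
    # dropped condition, per the abstract).
    with torch.no_grad():
        c_out = ema_net(z_tn, t_n, cond)
        u_out = ema_net(z_tn, t_n, None)           # unconditional branch
        target = u_out + w * (c_out - u_out)

    # Self-consistency constraint: both points map to the same solution.
    return F.mse_loss(pred, target)
```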
Related papers
- Aligning Diffusion Model with Problem Constraints for Trajectory Optimization [0.6629765271909505]
We propose a novel approach that aligns diffusion models explicitly with problem-specific constraints.
Our approach is well-suited for integration into the Dynamic Data-driven Application Systems (DDDAS) framework.
arXiv Detail & Related papers (2025-04-01T01:46:05Z) - A First-order Generative Bilevel Optimization Framework for Diffusion Models [57.40597004445473]
Diffusion models iteratively denoise data samples to synthesize high-quality outputs.
Traditional bilevel methods fail due to infinite-dimensional probability space and prohibitive sampling costs.
We formalize this challenge as a generative bilevel optimization problem.
Our first-order bilevel framework overcomes the incompatibility of conventional bilevel methods with diffusion processes.
arXiv Detail & Related papers (2025-02-12T21:44:06Z) - Decentralized Inference for Spatial Data Using Low-Rank Models [4.168323530566095]
This paper presents a decentralized framework tailored for parameter inference in spatial low-rank models.
A key obstacle arises from the spatial dependence among observations, which prevents the log-likelihood from being expressed as a summation.
Our approach employs a block descent method integrated with multi-consensus and dynamic consensus averaging for effective parameter optimization.
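For readers unfamiliar with the consensus ingredient, the sketch below shows plain consensus averaging with a doubly stochastic mixing matrix, the basic operation that multi-consensus and dynamic consensus averaging refine; the 4-node ring topology is a made-up example, not the paper's setup.

```python
import numpy as np

def consensus_average(local_estimates, W, num_rounds=50):
    """Each node repeatedly replaces its estimate with a weighted average
    of its neighbors'; with a doubly stochastic W on a connected graph,
    all nodes converge to the global mean of the initial estimates."""
    x = np.array(local_estimates, dtype=float)   # (num_nodes, dim)
    for _ in range(num_rounds):
        x = W @ x                                # one gossip round
    return x

# Hypothetical 4-node ring: each node mixes with its two neighbors.
W = np.array([[0.5 , 0.25, 0.  , 0.25],
              [0.25, 0.5 , 0.25, 0.  ],
              [0.  , 0.25, 0.5 , 0.25],
              [0.25, 0.  , 0.25, 0.5 ]])
local = np.random.randn(4, 3)                    # per-node parameter estimates
print(consensus_average(local, W))               # rows approach the column means
```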
arXiv Detail & Related papers (2025-02-01T04:17:01Z) - FlowDAS: A Flow-Based Framework for Data Assimilation [15.64941169350615]
FlowDAS is a novel generative model-based framework that uses stochastic interpolants to unify the learning of state transition dynamics and generative priors.
Our experiments demonstrate FlowDAS's superior performance on various benchmarks, from the Lorenz system to high-dimensional fluid superresolution tasks.
arXiv Detail & Related papers (2025-01-13T05:03:41Z) - Diffusion Predictive Control with Constraints [51.91057765703533]
Diffusion predictive control with constraints (DPCC) is an algorithm for diffusion-based control with explicit state and action constraints that can deviate from those in the training data.
We show through simulations of a robot manipulator that DPCC outperforms existing methods in satisfying novel test-time constraints while maintaining performance on the learned control task.
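One generic way to realize explicit state and action constraints in diffusion-based control is to project each denoised iterate back onto the feasible set. The box-constraint sketch below illustrates that general pattern only; the paper's actual constraint handling may differ.

```python
import torch

def constrained_denoise_step(denoiser, x_t, t, lo, hi):
    """Generic constrained sampling step (sketch, not DPCC itself):
    denoise as usual, then project the iterate onto box constraints
    lo <= x <= hi so every intermediate trajectory stays feasible."""
    x_next = denoiser(x_t, t)           # one reverse-diffusion update
    return torch.clamp(x_next, lo, hi)  # Euclidean projection onto the box
```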
arXiv Detail & Related papers (2024-12-12T15:10:22Z) - Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation [34.529280562470746]
We introduce a novel self-supervised loss combining the Contrast Maximization framework with a non-linear motion prior in the form of pixel-level trajectories.
Effectiveness is demonstrated in two scenarios: in dense continuous-time motion estimation, our method improves the zero-shot performance of a synthetically trained model by 29%.
arXiv Detail & Related papers (2024-07-15T15:18:28Z) - On the Trajectory Regularity of ODE-based Diffusion Sampling [79.17334230868693]
Diffusion-based generative models use differential equations to establish a smooth connection between a complex data distribution and a tractable prior distribution.
In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models.
arXiv Detail & Related papers (2024-05-18T15:59:41Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
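As background for the splitting idea, the sketch below shows consensus ADMM on a distributed ridge-regression objective, the deterministic backbone that ADMM-based samplers extend; the data layout and penalty parameters are invented for illustration, and this is not the paper's sampling scheme.

```python
import numpy as np

def consensus_admm_ridge(As, bs, lam=0.1, rho=1.0, iters=100):
    """Consensus ADMM for distributed ridge regression (sketch).
    Worker i holds its own data (As[i], bs[i]); all agree on a shared z."""
    d, n = As[0].shape[1], len(As)
    xs = [np.zeros(d) for _ in range(n)]
    us = [np.zeros(d) for _ in range(n)]
    z = np.zeros(d)
    for _ in range(iters):
        # Local x-updates: each worker solves its own regularized least squares.
        for i in range(n):
            A, b = As[i], bs[i]
            xs[i] = np.linalg.solve(A.T @ A + rho * np.eye(d),
                                    A.T @ b + rho * (z - us[i]))
        # Global z-update: average of x_i + u_i, shrunk by the ridge penalty.
        z = rho * sum(x + u for x, u in zip(xs, us)) / (2 * lam + n * rho)
        # Dual updates keep local and global variables in agreement.
        for i in range(n):
            us[i] += xs[i] - z
    return z
```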
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose Motion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces the sampling complexity from a thousand steps in previous diffusion models to just ten steps, while achieving comparable performance on text-to-motion and action-to-motion generation benchmarks.
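The few-step regime can be pictured as Euler integration of a learned velocity field over roughly ten uniform time steps. The sketch below is a generic flow-matching sampler under that assumption, not the paper's implementation; `velocity_net` and the shape conventions are hypothetical.

```python
import torch

@torch.no_grad()
def flow_matching_sample(velocity_net, text_emb, shape, num_steps=10):
    """Generic flow-matching sampler (sketch): integrate dx/dt = v(x, t, c)
    from Gaussian noise at t=0 to a motion sample at t=1 with Euler steps."""
    x = torch.randn(shape)                         # start from noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = torch.full((shape[0],), k * dt)
        x = x + dt * velocity_net(x, t, text_emb)  # one Euler step
    return x
```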
arXiv Detail & Related papers (2023-12-14T12:57:35Z) - EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation [57.539634387672656]
Current state-of-the-art generative diffusion models have produced impressive results but struggle to achieve fast generation without sacrificing quality.
We introduce Efficient Motion Diffusion Model (EMDM) for fast and high-quality human motion generation.
arXiv Detail & Related papers (2023-12-04T18:58:38Z) - Non-Cross Diffusion for Semantic Consistency [12.645444338043934]
We introduce 'Non-Cross Diffusion', an innovative approach in generative modeling for learning ordinary differential equation (ODE) models.
Our methodology strategically incorporates an ascending dimension of input to effectively connect points sampled from two distributions with uncrossed paths.
arXiv Detail & Related papers (2023-11-30T05:53:39Z) - Generative Modeling with Phase Stochastic Bridges [49.4474628881673]
Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs.
We introduce a novel generative modeling framework grounded in phase space dynamics.
Our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation.
arXiv Detail & Related papers (2023-10-11T18:38:28Z) - DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
arXiv Detail & Related papers (2023-10-09T15:29:10Z) - Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM).
Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain.
We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines.
arXiv Detail & Related papers (2023-10-06T06:29:06Z) - Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment at deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
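The maximum-variance-reduction idea reduces, in its simplest form, to querying where the Gaussian Process posterior is most uncertain. The sketch below uses scikit-learn's single-output GP as a stand-in for the paper's multi-output dynamics model; the acquisition rule shown is a simplification.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def pick_most_uncertain(gp: GaussianProcessRegressor, candidates: np.ndarray):
    """Max-variance acquisition (sketch): among candidate state-action
    pairs, query the one with the largest GP posterior standard deviation.
    `gp` is assumed already fitted on observed transitions."""
    _, std = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(std)]
```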
arXiv Detail & Related papers (2023-09-05T13:42:11Z) - Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
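The reflection operation itself is elementary: after each sampler update, coordinates that leave the data support are mirrored back across the boundary. The sketch below does this for the unit box and is a generic illustration, not the paper's implementation.

```python
import numpy as np

def reflect_into_unit_box(x: np.ndarray) -> np.ndarray:
    """Fold coordinates back into [0, 1] by mirroring at the boundary,
    e.g. 1.2 -> 0.8 and -0.3 -> 0.3 (handles arbitrary excursions)."""
    x = np.mod(x, 2.0)                  # period-2 sawtooth
    return np.where(x > 1.0, 2.0 - x, x)
```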
arXiv Detail & Related papers (2023-04-10T17:54:38Z) - Evolve Smoothly, Fit Consistently: Learning Smooth Latent Dynamics For Advection-Dominated Systems [14.553972457854517]
We present a data-driven, space-time continuous framework to learn surrogate models for complex physical systems.
We leverage the expressive power of the network and a specially designed consistency-inducing regularization to obtain latent trajectories that are both low-dimensional and smooth.
arXiv Detail & Related papers (2023-01-25T03:06:03Z) - ProgressiveMotionSeg: Mutually Reinforced Framework for Event-Based Motion Segmentation [101.19290845597918]
This paper presents a Motion Estimation (ME) module and an Event Denoising (ED) module jointly optimized in a mutually reinforced manner.
Taking temporal correlation as guidance, the ED module calculates the confidence that each event belongs to real activity, and transmits it to the ME module to update the energy function of motion segmentation for noise suppression.
arXiv Detail & Related papers (2022-03-22T13:40:26Z) - Motion Deblurring with Real Events [50.441934496692376]
We propose an end-to-end learning framework for event-based motion deblurring in a self-supervised manner.
Real-world events are exploited to alleviate the performance degradation caused by data inconsistency.
arXiv Detail & Related papers (2021-09-28T13:11:44Z) - Learning a Generative Motion Model from Image Sequences based on a Latent Motion Matrix [8.774604259603302]
We learn a probabilistic motion model by simulating temporal registration in a sequence of images.
We show improved registration accuracy and temporally smoother consistency compared to three state-of-the-art registration algorithms.
We also demonstrate the model's applicability to motion analysis, simulation and super-resolution through improved motion reconstruction from sequences with missing frames.
arXiv Detail & Related papers (2020-11-03T14:44:09Z)