TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis
- URL: http://arxiv.org/abs/2307.15042v2
- Date: Sat, 29 Jul 2023 05:26:37 GMT
- Title: TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis
- Authors: Zihan Zhang, Richard Liu, Kfir Aberman, Rana Hanocka
- Abstract summary: We propose to adapt the gradual diffusion concept into the temporal-axis of the motion sequence.
Our key idea is to extend the DDPM framework to support temporally varying denoising, thereby entangling the two axes.
This new mechanism paves the way towards a new framework for long-term motion synthesis with applications to character animation and other domains.
- Score: 27.23431793291876
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The gradual nature of a diffusion process that synthesizes samples in small
increments constitutes a key ingredient of Denoising Diffusion Probabilistic
Models (DDPM), which have presented unprecedented quality in image synthesis
and been recently explored in the motion domain. In this work, we propose to
adapt the gradual diffusion concept (operating along a diffusion time-axis)
into the temporal-axis of the motion sequence. Our key idea is to extend the
DDPM framework to support temporally varying denoising, thereby entangling the
two axes. Using our special formulation, we iteratively denoise a motion buffer
that contains a set of increasingly-noised poses, which auto-regressively
produces an arbitrarily long stream of frames. With a stationary diffusion
time-axis, in each diffusion step we increment only the temporal-axis of the
motion such that the framework produces a new, clean frame which is removed
from the beginning of the buffer, followed by a newly drawn noise vector that
is appended to it. This new mechanism paves the way towards a new framework for
long-term motion synthesis with applications to character animation and other
domains.
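The buffer mechanism described in the abstract can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: `toy_denoise_step`, `stream_frames`, the integer noise levels, and the pure-noise initialization of the buffer are all assumptions made for the example, and a trained network would take the place of the placeholder denoiser.

```python
import random
from typing import Callable, List, Tuple

Frame = List[float]  # one pose, stored as a flat list of joint values


def toy_denoise_step(frame: Frame, level: int) -> Frame:
    """Hypothetical stand-in for a trained denoiser.

    Shrinks the frame toward zero as if removing one of `level` remaining
    noise levels, so a frame at level 1 becomes all zeros ("clean").
    """
    scale = (level - 1) / level
    return [x * scale for x in frame]


def stream_frames(
    num_frames: int,
    buffer_len: int,
    pose_dim: int,
    denoise_step: Callable[[Frame, int], Frame] = toy_denoise_step,
    seed: int = 0,
) -> Tuple[List[Frame], List[int]]:
    """Emit `num_frames` frames; also return the buffer's final noise levels."""
    rng = random.Random(seed)

    def fresh_noise() -> Frame:
        return [rng.gauss(0.0, 1.0) for _ in range(pose_dim)]

    # Slot i holds a frame at noise level i + 1 (assumed initialization:
    # plain noise tagged with a level, rather than a noised motion clip).
    buffer = [(fresh_noise(), i + 1) for i in range(buffer_len)]
    frames: List[Frame] = []
    while len(frames) < num_frames:
        # One step denoises every slot by one level at once.
        buffer = [(denoise_step(f, lvl), lvl - 1) for f, lvl in buffer]
        # The first slot is now clean: remove it from the front and emit it.
        clean, level = buffer.pop(0)
        assert level == 0
        frames.append(clean)
        # Append a newly drawn noise vector at the highest noise level.
        buffer.append((fresh_noise(), buffer_len))
    return frames, [lvl for _, lvl in buffer]
```

Calling `stream_frames(5, 4, 3)` yields five frames from a four-slot buffer. After every step the slots are again at noise levels 1 through 4, which is the sense in which the diffusion time-axis stays stationary while the motion advances by one frame.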
Related papers
- Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation [36.098738197088124]
This work presents a Diffusion Reuse MOtion (Dr. Mo) network to accelerate latent video generation.
Coarse-grained noises in earlier denoising steps have demonstrated high motion consistency across consecutive video frames.
Dr. Mo propagates those coarse-grained noises onto the next frame by incorporating carefully designed, lightweight inter-frame motions.
arXiv Detail & Related papers (2024-09-19T07:50:34Z)
- RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation [5.535590461577558]
RecMoDiffuse is a new recurrent diffusion formulation for temporal modelling.
We demonstrate the effectiveness of RecMoDiffuse in the temporal modelling of human motion.
arXiv Detail & Related papers (2024-06-11T11:25:37Z)
- Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
Motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity.
We propose a novel diffusion framework, motion-aware latent diffusion models (MADiff).
Our method achieves state-of-the-art performance, significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-04-21T05:09:56Z)
- DiffusionPhase: Motion Diffusion in Frequency Domain [69.811762407278]
We introduce a learning-based method for generating high-quality human motion sequences from text descriptions.
Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences.
We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space.
arXiv Detail & Related papers (2023-12-07T04:39:22Z)
- Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z)
- Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling [74.62570964142063]
Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions.
We propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods.
Our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream.
arXiv Detail & Related papers (2023-08-03T16:18:32Z)
- Fast Diffusion Model [122.36693015093041]
Diffusion models (DMs) have been adopted across diverse fields with their abilities in capturing intricate data distributions.
In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a DM optimization perspective.
arXiv Detail & Related papers (2023-06-12T09:38:04Z)
- BoDiffusion: Diffusing Sparse Observations for Full-Body Human Motion Synthesis [14.331548412833513]
Mixed reality applications require tracking the user's full-body motion to enable an immersive experience.
We propose BoDiffusion -- a generative diffusion model for motion synthesis to tackle this under-constrained reconstruction problem.
We present a time and space conditioning scheme that allows BoDiffusion to leverage sparse tracking inputs while generating smooth and realistic full-body motion sequences.
arXiv Detail & Related papers (2023-04-21T16:39:05Z)
- Human Motion Diffusion as a Generative Prior [20.004837564647367]
We introduce three forms of composition based on diffusion priors.
We tackle the challenge of long sequence generation.
Using parallel composition, we show promising steps toward two-person generation.
arXiv Detail & Related papers (2023-03-02T17:09:27Z)
- TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)
- TSI: Temporal Saliency Integration for Video Action Recognition [32.18535820790586]
We propose a Temporal Saliency Integration (TSI) block, which mainly contains a Salient Motion Excitation (SME) module and a Cross-scale Temporal Integration (CTI) module.
SME aims to highlight the motion-sensitive area through local-global motion modeling.
CTI is designed to perform multi-scale temporal modeling through a group of separate 1D convolutions.
arXiv Detail & Related papers (2021-06-02T11:43:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.