TPDiff: Temporal Pyramid Video Diffusion Model
- URL: http://arxiv.org/abs/2503.09566v1
- Date: Wed, 12 Mar 2025 17:33:22 GMT
- Title: TPDiff: Temporal Pyramid Video Diffusion Model
- Authors: Lingmin Ran, Mike Zheng Shou
- Abstract summary: We propose TPDiff, a unified framework to enhance training and inference efficiency. By dividing diffusion into several stages, our framework progressively increases the frame rate along the diffusion process. By solving the partitioned probability flow ordinary differential equations (ODEs) of diffusion under aligned data and noise, our training strategy is applicable to various diffusion forms.
- Score: 16.48006100084994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The development of video diffusion models unveils a significant challenge: their substantial computational demands. To mitigate this challenge, we note that the reverse process of diffusion exhibits an inherent entropy-reducing nature. Given the inter-frame redundancy in the video modality, maintaining full frame rates in the high-entropy stages is unnecessary. Based on this insight, we propose TPDiff, a unified framework to enhance training and inference efficiency. By dividing diffusion into several stages, our framework progressively increases the frame rate along the diffusion process, with only the last stage operating at the full frame rate, thereby optimizing computational efficiency. To train the multi-stage diffusion model, we introduce a dedicated training framework: stage-wise diffusion. By solving the partitioned probability flow ordinary differential equations (ODEs) of diffusion under aligned data and noise, our training strategy is applicable to various diffusion forms and further enhances training efficiency. Comprehensive experimental evaluations validate the generality of our method, demonstrating a 50% reduction in training cost and a 1.5x improvement in inference efficiency.
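The abstract describes the scheduling idea concretely enough to sketch. The toy code below (hypothetical names; not the authors' implementation) illustrates one way such a temporal pyramid could be laid out: the reverse trajectory is split into equal-width stages, each stage operates on a fraction of the frames, and only the final, low-noise stage sees the full frame rate.

```python
# Toy sketch of a temporal-pyramid stage schedule (illustrative only;
# not the TPDiff implementation). Stage 0 is the noisiest stage and
# runs at the lowest frame rate; the last stage runs at full rate.

def stage_frame_counts(num_frames: int, num_stages: int) -> list[int]:
    """Frames processed per stage, doubling toward the final stage."""
    return [max(1, num_frames // 2 ** (num_stages - 1 - s))
            for s in range(num_stages)]

def stage_for_timestep(t: float, num_stages: int) -> int:
    """Map diffusion time t in [0, 1] (t = 1 is pure noise) to a stage."""
    t = min(max(t, 0.0), 1.0)
    return min(num_stages - 1, int((1.0 - t) * num_stages))

print(stage_frame_counts(16, 3))   # -> [4, 8, 16]
print(stage_for_timestep(0.9, 3))  # high noise -> stage 0 (4 frames)
print(stage_for_timestep(0.1, 3))  # low noise  -> stage 2 (16 frames)
```

Under a schedule like this, most denoising steps touch only a quarter or half of the frames, which is where the claimed training and inference savings would come from; the stage-wise ODE training described in the abstract is what keeps the hand-off between stages consistent.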
Related papers
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full- and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
- Accelerating Video Diffusion Models via Distribution Matching [26.475459912686986]
This work introduces a novel framework for diffusion distillation and distribution matching. Our approach focuses on distilling pre-trained diffusion models into a more efficient few-step generator. By leveraging a combination of a video GAN loss and a novel 2D score distribution matching loss, we demonstrate the potential to generate high-quality video frames.
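The summary names a score distribution matching loss without stating it. For orientation, score-based distribution matching is commonly written as the following KL gradient (a generic form, not necessarily this paper's exact loss), where s_real is the frozen teacher's score, s_fake tracks the few-step generator's output distribution, and G_θ is the generator:

```latex
\nabla_\theta \, \mathrm{KL}\!\left(p_{\mathrm{fake}} \,\|\, p_{\mathrm{real}}\right)
  \;\approx\; \mathbb{E}_{t,\,x_t}\!\left[
    \big(s_{\mathrm{fake}}(x_t, t) - s_{\mathrm{real}}(x_t, t)\big)
    \,\frac{\partial x_t}{\partial \theta}
  \right],
\qquad x_t = \alpha_t\, G_\theta(z) + \sigma_t\, \epsilon .
```

The video GAN loss mentioned in the summary would presumably be added on top of a matching term of this kind.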
arXiv Detail & Related papers (2024-12-08T11:36:32Z)
- Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs [36.65594293655289]
DoSSR is a Domain Shift diffusion-based SR model that capitalizes on the generative powers of pretrained diffusion models. At the core of our approach is a domain shift equation that integrates seamlessly with existing diffusion models. Our proposed method achieves state-of-the-art performance on synthetic and real-world datasets, while notably requiring only 5 sampling steps.
arXiv Detail & Related papers (2024-09-26T12:16:11Z)
- A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training [53.93563224892207]
We introduce SpeeD, a novel speed-up method for diffusion model training based on a closer look at time steps.
As a plug-and-play and architecture-agnostic approach, SpeeD consistently achieves a threefold acceleration across various diffusion architectures, datasets, and tasks.
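SpeeD's exact re-weighting is not reproduced in this summary; the sketch below shows the general mechanism such methods rely on, namely sampling training timesteps non-uniformly so that a band of low-information steps is visited less often (the band location and weights here are illustrative, not SpeeD's).

```python
import numpy as np

# Illustrative non-uniform time-step sampling (not SpeeD's exact
# scheme): down-weight a "trivial" band of timesteps near pure noise
# so training batches concentrate on more informative steps.

def timestep_probs(num_steps: int, trivial_frac: float = 0.3,
                   suppress: float = 0.2) -> np.ndarray:
    """Sampling probabilities over timesteps 0 .. num_steps - 1."""
    weights = np.ones(num_steps)
    cut = int((1.0 - trivial_frac) * num_steps)  # start of trivial band
    weights[cut:] = suppress                     # visit it less often
    return weights / weights.sum()

probs = timestep_probs(1000)
batch_t = np.random.choice(len(probs), size=64, p=probs)
```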
arXiv Detail & Related papers (2024-05-27T17:51:36Z)
- Training-Free Semantic Video Composition via Pre-trained Diffusion Model [96.0168609879295]
Current approaches, predominantly trained on videos with adjusted foreground color and lighting, struggle to address deep semantic disparities beyond superficial adjustments.
We propose a training-free pipeline employing a pre-trained diffusion model imbued with semantic prior knowledge.
Experimental results reveal that our pipeline successfully ensures the visual harmony and inter-frame coherence of the outputs.
arXiv Detail & Related papers (2024-01-17T13:07:22Z)
- Non-Denoising Forward-Time Diffusions [4.831663144935879]
We show that the time-reversal argument, common to all denoising diffusion probabilistic modeling proposals, is not necessary.
We obtain diffusion processes targeting the desired data distribution by taking appropriate mixtures of diffusion bridges.
We develop a unifying view of the drift adjustments corresponding to our approach and to time-reversal approaches.
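A standard construction in this spirit (a sketch of the general idea, not necessarily the paper's exact process) is a Brownian bridge pinned at a data sample; mixing the bridge endpoint over the data distribution gives a forward-time diffusion that lands on the data at t = 1, with no time-reversal step:

```latex
\mathrm{d}X_t = \frac{y - X_t}{1 - t}\,\mathrm{d}t + \mathrm{d}W_t,
\qquad X_0 \sim p_0, \quad y \sim p_{\mathrm{data}}, \quad t \in [0, 1).
```

Each bridge satisfies X_1 = y almost surely, so the t = 1 marginal of the mixture is exactly the data distribution.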
arXiv Detail & Related papers (2023-12-22T10:26:31Z)
- Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models [52.1809084559048]
We propose a novel two-stage divide-and-conquer training strategy termed TDC Training. It groups timesteps based on task similarity and difficulty, assigning highly customized denoising models to each group, thereby enhancing the performance of diffusion models. While two-stage training avoids the need to train each model separately, the total training cost is even lower than training a single unified denoising model.
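As a rough illustration of the divide-and-conquer routing (the paper groups by task similarity and difficulty; the even split and names below are placeholders):

```python
# Illustrative timestep-to-expert routing for a divide-and-conquer
# training strategy (even split; the paper's grouping criterion is
# task similarity and difficulty, not shown here).

def group_of(t: int, num_steps: int = 1000, num_groups: int = 4) -> int:
    """Assign timestep t to one of `num_groups` contiguous groups."""
    return min(num_groups - 1, t * num_groups // num_steps)

# One customized denoiser per group (placeholder factory):
# denoisers = [make_denoiser(g) for g in range(4)]
# eps_pred = denoisers[group_of(t)](x_t, t)
```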
arXiv Detail & Related papers (2023-12-20T03:32:58Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Simultaneous Image-to-Zero and Zero-to-Noise: Diffusion Models with Analytical Image Attenuation [53.04220377034574]
We propose incorporating an analytical image attenuation process into the forward diffusion process for high-quality (un)conditioned image generation. Our method represents the forward image-to-noise mapping as a simultaneous image-to-zero mapping and zero-to-noise mapping. We have conducted experiments on unconditioned image generation, e.g., CIFAR-10 and CelebA-HQ-256, and on image-conditioned downstream tasks such as super-resolution, saliency detection, edge detection, and image inpainting.
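One reading consistent with this summary (the concrete attenuation schedule is the paper's contribution and is not reproduced here) writes the forward map with an analytical coefficient that drives the image term to zero while the noise term grows:

```latex
x_t = \gamma(t)\, x_0 \;+\; \sigma(t)\,\epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
```

with γ(t) decreasing analytically from γ(0) = 1 to γ(1) = 0 (the image-to-zero mapping) and σ(t) increasing from σ(0) = 0 (the zero-to-noise mapping).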
arXiv Detail & Related papers (2023-06-23T18:08:00Z)
- Lipschitz Singularities in Diffusion Models [64.28196620345808]
Diffusion models often exhibit an infinite Lipschitz constant of the network with respect to the time variable near the zero point. We propose a novel approach, dubbed E-TSDM, which alleviates the Lipschitz singularities of the diffusion model near the zero point. Our work may advance the understanding of the general diffusion process, and also provides insights for the design of diffusion models.
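The summarized fix lends itself to a small sketch: if nearby timesteps in a short interval near zero share one conditioning value, the network no longer has to change with t inside that interval, which caps the offending derivative (the boundary and bin count below are illustrative; this is in the spirit of E-TSDM as summarized, not its exact configuration).

```python
# Illustrative timestep sharing near t = 0 (in the spirit of E-TSDM as
# summarized above; the boundary and bin count are made-up values).

def shared_timestep(t: float, boundary: float = 0.1, bins: int = 8) -> float:
    """Quantize t within [0, boundary) into `bins` shared values."""
    if t >= boundary:
        return t                       # later timesteps are untouched
    width = boundary / bins
    k = min(bins - 1, int(t / width))  # index of the shared interval
    return (k + 0.5) * width           # one value for the whole interval
```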
arXiv Detail & Related papers (2023-06-20T03:05:28Z)
- Fast Diffusion Model [122.36693015093041]
Diffusion models (DMs) have been adopted across diverse fields for their ability to capture intricate data distributions.
In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a DM optimization perspective.
arXiv Detail & Related papers (2023-06-12T09:38:04Z)