Related papers: Diffusion Probabilistic Modeling for Video Generation

URL: http://arxiv.org/abs/2203.09481v1
Date: Wed, 16 Mar 2022 03:52:45 GMT
Title: Diffusion Probabilistic Modeling for Video Generation
Authors: Ruihan Yang, Prakhar Srivastava, Stephan Mandt
Abstract summary: Denoising diffusion probabilistic models are a promising new class of generative models that are competitive with GANs on perceptual metrics. Inspired by recent advances in neural video compression, we use denoising diffusion models to stochastically generate a residual to a deterministic next-frame prediction. We find significant improvements in terms of perceptual quality on all data and improvements in terms of frame forecasting for complex high-resolution videos.
Score: 17.48026395867434
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Denoising diffusion probabilistic models are a promising new class of generative models that are competitive with GANs on perceptual metrics. In this paper, we explore their potential for sequentially generating video. Inspired by recent advances in neural video compression, we use denoising diffusion models to stochastically generate a residual to a deterministic next-frame prediction. We compare this approach to two sequential VAE and two GAN baselines on four datasets, where we test the generated frames for perceptual quality and forecasting accuracy against ground truth frames. We find significant improvements in terms of perceptual quality on all data and improvements in terms of frame forecasting for complex high-resolution videos.
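
The residual idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the copy-last-frame predictor `predict_next_frame`, the linear noise schedule, and all function names are assumptions made for illustration. The sketch only shows how a next frame splits into a deterministic prediction plus a residual, and how that residual would be noised by the standard closed-form DDPM forward process. A trained denoising network, which is omitted here, would estimate the noise instead of receiving it.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_n) for a linear DDPM noise schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def predict_next_frame(prev_frame):
    """Hypothetical deterministic predictor: simply repeat the previous frame."""
    return prev_frame.copy()

def noise_residual(residual, step, alpha_bar, rng):
    """Sample the noised residual at `step` from the closed-form forward process."""
    eps = rng.standard_normal(residual.shape)
    noisy = np.sqrt(alpha_bar[step]) * residual + np.sqrt(1.0 - alpha_bar[step]) * eps
    return noisy, eps

rng = np.random.default_rng(0)
prev_frame = rng.random((8, 8))  # toy 8x8 grayscale frame
next_frame = np.clip(prev_frame + 0.05 * rng.standard_normal((8, 8)), 0.0, 1.0)

alpha_bar = linear_alpha_bar()
prediction = predict_next_frame(prev_frame)
residual = next_frame - prediction  # the quantity a diffusion model would generate

step = 500
noisy_residual, eps = noise_residual(residual, step, alpha_bar, rng)

# With the true noise known, the clean residual is recovered exactly;
# a trained denoiser would supply an estimate of eps here instead.
recovered = (noisy_residual - np.sqrt(1.0 - alpha_bar[step]) * eps) / np.sqrt(alpha_bar[step])
reconstructed = prediction + recovered
```

Modeling only the residual keeps the stochastic part of the problem small: the deterministic predictor accounts for most of the frame, and the diffusion model only needs to capture what the prediction misses.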

Related papers

Autoregressive Video Generation without Vector Quantization [90.87907377618747]
We reformulate the video generation problem as a non-quantized autoregressive modeling of temporal frame-by-frame prediction. With the proposed approach, we train a novel video autoregressive model without vector quantization, termed NOVA. Our results demonstrate that NOVA surpasses prior autoregressive video models in data efficiency, inference speed, visual fidelity, and video fluency, even with a much smaller model capacity.
arXiv Detail & Related papers (2024-12-18T18:59:53Z)
Progressive Compression with Universally Quantized Diffusion Models [35.199627388957566]
We explore the potential of diffusion models for progressive coding, resulting in a sequence of bits that can be incrementally transmitted and decoded. Unlike prior work based on Gaussian diffusion or conditional diffusion models, we propose a new form of diffusion model with uniform noise in the forward process. We obtain promising first results on image compression, achieving competitive rate-distortion and rate-realism results on a wide range of bit-rates with a single model.
arXiv Detail & Related papers (2024-12-14T19:06:01Z)
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI). In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion). Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS [0.0]
The diffusion model is capable of generating high-quality data through a probabilistic approach. It suffers from the drawback of slow generation speed due to the requirement of a large number of time steps. We propose a speech synthesis model with two discriminators: a diffusion discriminator for learning the distribution of the reverse process and a spectrogram discriminator for learning the distribution of the generated data.
arXiv Detail & Related papers (2023-08-03T07:22:04Z)
Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models are Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs). GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations. We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation [88.49030739715701]
This work presents a decomposed diffusion process that resolves the per-frame noise into a base noise shared among all frames and a residual noise that varies along the time axis. Experiments on various datasets confirm that our approach, termed VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation.
arXiv Detail & Related papers (2023-03-15T02:16:39Z)
Insights from Generative Modeling for Neural Video Compression [31.59496634465347]
We present newly proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We propose several architectures that yield state-of-the-art video compression performance on high-resolution video. We provide further evidence that the generative modeling viewpoint can advance the neural video coding field.
arXiv Detail & Related papers (2021-07-28T02:19:39Z)
Variational Diffusion Models [33.0719137062396]
We introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on image density estimation benchmarks. We show that the variational lower bound (VLB) simplifies to a remarkably short expression in terms of the signal-to-noise ratio of the diffused data.
arXiv Detail & Related papers (2021-07-01T17:43:20Z)
Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series. Our model parameterizes mean and variance for each time-stamp with flexible neural networks. We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
Score-Based Generative Modeling through Stochastic Differential Equations [114.39209003111723]
We present a stochastic differential equation (SDE) that transforms a complex data distribution to a known prior distribution by injecting noise. A corresponding reverse-time SDE transforms the prior distribution back into the data distribution by slowly removing the noise. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks. We demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.
arXiv Detail & Related papers (2020-11-26T19:39:10Z)
Denoising Diffusion Probabilistic Models [91.94962645056896]
We present high quality image synthesis results using diffusion probabilistic models. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics.
arXiv Detail & Related papers (2020-06-19T17:24:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.