Training Consistency Models with Variational Noise Coupling
- URL: http://arxiv.org/abs/2502.18197v1
- Date: Tue, 25 Feb 2025 13:38:04 GMT
- Title: Training Consistency Models with Variational Noise Coupling
- Authors: Gianluigi Silvestri, Luca Ambrogioni, Chieh-Hsin Lai, Yuhta Takida, Yuki Mitsufuji
- Abstract summary: We propose a novel CT training approach based on the Flow Matching framework. Our main contribution is a trained noise-coupling scheme inspired by the architecture of Variational Autoencoders (VAE). Empirical results across diverse image datasets show significant generative improvements.
- Score: 21.978942601947026
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Consistency Training (CT) has recently emerged as a promising alternative to diffusion models, achieving competitive performance in image generation tasks. However, non-distillation consistency training often suffers from high variance and instability, and analyzing and improving its training dynamics is an active area of research. In this work, we propose a novel CT training approach based on the Flow Matching framework. Our main contribution is a trained noise-coupling scheme inspired by the architecture of Variational Autoencoders (VAE). By training a data-dependent noise emission model implemented as an encoder architecture, our method can indirectly learn the geometry of the noise-to-data mapping, which is instead fixed by the choice of the forward process in classical CT. Empirical results across diverse image datasets show significant generative improvements, with our model outperforming baselines and achieving the state-of-the-art (SoTA) non-distillation CT FID on CIFAR-10, and attaining FID on par with SoTA on ImageNet at $64 \times 64$ resolution in 2-step generation. Our code is available at https://github.com/sony/vct .
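The coupling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes a diagonal-Gaussian encoder, a linear Flow Matching interpolant between data and noise, and a KL regularizer toward a standard normal prior, as in a standard VAE. The names `toy_encoder`, `coupled_noise`, `interpolate`, and `kl_to_standard_normal` are hypothetical and chosen only for illustration.

```python
import numpy as np

def toy_encoder(x, weight):
    # Hypothetical linear "encoder": maps data to the mean and log-variance
    # of a diagonal Gaussian over noise. A real model would use a network.
    h = x @ weight
    mu, log_var = np.split(h, 2, axis=-1)
    return mu, log_var

def coupled_noise(mu, log_var, rng):
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so the sampled noise depends on the data through (mu, log_var).
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def interpolate(x, z, t):
    # Linear interpolant (assumed convention): t = 0 gives data, t = 1 gives noise.
    t = t[:, None]
    return (1.0 - t) * x + t * z

def kl_to_standard_normal(mu, log_var):
    # KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over features per sample.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))           # toy "data" batch
weight = np.zeros((3, 6))                 # zero weights give mu = 0, log_var = 0
mu, log_var = toy_encoder(x, weight)
z = coupled_noise(mu, log_var, rng)       # data-dependent noise sample
t = rng.uniform(size=4)                   # one time per sample
x_t = interpolate(x, z, t)                # noisy input for the consistency model
kl = kl_to_standard_normal(mu, log_var)   # regularizer keeping q(z|x) near the prior
```

With the zero-weight encoder the noise is independent of the data and the KL term vanishes, which corresponds to the fixed forward process of classical CT. A trained encoder would move away from this point, trading a nonzero KL penalty for a coupling that is easier for the consistency model to fit.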
Related papers
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step.
To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration.
Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - Enhancing Low Dose Computed Tomography Images Using Consistency Training Techniques [7.694256285730863]
In this paper, we introduce the beta noise distribution, which provides flexibility in adjusting noise levels.
High Noise Improved Consistency Training (HN-iCT) is trained in a supervised fashion.
Our results indicate that unconditional image generation using HN-iCT significantly outperforms basic CT and iCT training techniques with NFE=1.
arXiv Detail & Related papers (2024-11-19T02:48:36Z) - Stable Consistency Tuning: Understanding and Improving Consistency Models [40.2712218203989]
Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising. Consistency models, a new generative family, achieve competitive performance with significantly faster sampling. We propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as value estimation through Temporal Difference (TD) Learning.
arXiv Detail & Related papers (2024-10-24T17:55:52Z) - SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder [13.453138169497903]
SeNM-VAE is a semi-supervised noise modeling method that leverages both paired and unpaired datasets to generate realistic degraded data.
We employ our method to generate paired training samples for real-world image denoising and super-resolution tasks.
Our approach excels in the quality of synthetic degraded images compared to other unpaired and paired noise modeling methods.
arXiv Detail & Related papers (2024-03-26T09:03:40Z) - Blue noise for diffusion models [50.99852321110366]
We introduce a novel and general class of diffusion models taking correlated noise within and across images into account.
Our framework allows introducing correlation across images within a single mini-batch to improve gradient flow.
We perform both qualitative and quantitative evaluations on a variety of datasets using our method.
arXiv Detail & Related papers (2024-02-07T14:59:25Z) - DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
Previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features.
Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
arXiv Detail & Related papers (2023-12-12T06:07:21Z) - DiffiT: Diffusion Vision Transformers for Image Generation [88.08529836125399]
Vision Transformer (ViT) has demonstrated strong modeling capabilities and scalability, especially for recognition tasks.
We study the effectiveness of ViTs in diffusion-based generative learning and propose a new model denoted as Diffusion Vision Transformers (DiffiT).
DiffiT is surprisingly effective in generating high-fidelity images with significantly better parameter efficiency.
arXiv Detail & Related papers (2023-12-04T18:57:01Z) - Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using processes.
For many applications such as image editing, the model input comes from a distribution that is not random noise.
In our work, we propose Denoising Diffusion Bridge Models (DDBMs).
arXiv Detail & Related papers (2023-09-29T03:24:24Z) - DOLCE: A Model-Based Probabilistic Diffusion Framework for Limited-Angle CT Reconstruction [42.028139152832466]
Limited-Angle Computed Tomography (LACT) is a non-destructive evaluation technique used in a variety of applications ranging from security to medicine.
We present DOLCE, a new deep model-based framework for LACT that uses a conditional diffusion model as an image prior.
arXiv Detail & Related papers (2022-11-22T15:30:38Z) - Semantic Image Synthesis via Diffusion Models [174.24523061460704]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches.
We propose a novel framework based on DDPM for semantic image synthesis.
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis [148.16279746287452]
We propose a swin-conv block to incorporate the local modeling ability of residual convolutional layer and non-local modeling ability of swin transformer block.
For the training data synthesis, we design a practical noise degradation model which takes into consideration different kinds of noise.
Experiments on AGWN removal and real image denoising demonstrate that the new network architecture design achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-03-24T18:11:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.