Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint
Method
- URL: http://arxiv.org/abs/2312.12030v1
- Date: Tue, 19 Dec 2023 10:30:31 GMT
- Title: Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint
Method
- Authors: Jiachun Pan, Hanshu Yan, Jun Hao Liew, Jiashi Feng, Vincent Y. F. Tan
- Abstract summary: We propose Symplectic Adjoint Guidance (SAG), which calculates the gradient guidance in two inner stages.
SAG generates images with higher qualities compared to the baselines in both guided image and video generation tasks.
- Score: 110.9458914721516
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training-free guided sampling in diffusion models leverages off-the-shelf
pre-trained networks, such as an aesthetic evaluation model, to guide the
generation process. Current training-free guided sampling algorithms obtain the
guidance energy function based on a one-step estimate of the clean image.
However, since the off-the-shelf pre-trained networks are trained on clean
images, the one-step estimation procedure of the clean image may be inaccurate,
especially in the early stages of the generation process in diffusion models.
This causes the guidance in the early time steps to be inaccurate. To overcome
this problem, we propose Symplectic Adjoint Guidance (SAG), which calculates
the gradient guidance in two inner stages. Firstly, SAG estimates the clean
image via $n$ function calls, where $n$ serves as a flexible hyperparameter
that can be tailored to meet specific image quality requirements. Secondly, SAG
uses the symplectic adjoint method to obtain the gradients accurately and
efficiently in terms of the memory requirements. Extensive experiments
demonstrate that SAG generates images with higher qualities compared to the
baselines in both guided image and video generation tasks.
Related papers
- Gradient-Free Classifier Guidance for Diffusion Model Sampling [4.450496470631169]
Gradient-free Guidance (GFCG) method consistently improves class prediction accuracy.
For ImageNet 512$times$512, we achieve a record $FD_textDINOv2$ 23.09, while simultaneously attaining a higher classification Precision (94.3%) compared to ATG (90.2%)
arXiv Detail & Related papers (2024-11-23T00:22:21Z) - Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
Diffusion models have dominated the field of large, generative image models.
We propose an algorithm for fast-constrained sampling in large pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-24T14:52:38Z) - Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations.
We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders.
The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
arXiv Detail & Related papers (2024-10-09T14:34:53Z) - Forgery-aware Adaptive Transformer for Generalizable Synthetic Image
Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z) - Iterative Token Evaluation and Refinement for Real-World
Super-Resolution [77.74289677520508]
Real-world image super-resolution (RWSR) is a long-standing problem as low-quality (LQ) images often have complex and unidentified degradations.
We propose an Iterative Token Evaluation and Refinement framework for RWSR.
We show that ITER is easier to train than Generative Adversarial Networks (GANs) and more efficient than continuous diffusion models.
arXiv Detail & Related papers (2023-12-09T17:07:32Z) - Manifold Preserving Guided Diffusion [121.97907811212123]
Conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training.
We propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework.
arXiv Detail & Related papers (2023-11-28T02:08:06Z) - Deep Learning Adapted Acceleration for Limited-view Photoacoustic
Computed Tomography [1.8830359888767887]
Photoacoustic computed tomography (PACT) uses unfocused large-area light to illuminate the target with ultrasound transducer array for PA signal detection.
Limited-view issue could cause a low-quality image in PACT due to the limitation of geometric condition.
A model-based method that combines the mathematical variational model with deep learning is proposed to speed up and regularize the unrolled procedure of reconstruction.
arXiv Detail & Related papers (2021-11-08T02:05:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.