Bring Metric Functions into Diffusion Models
- URL: http://arxiv.org/abs/2401.02414v1
- Date: Thu, 4 Jan 2024 18:55:01 GMT
- Title: Bring Metric Functions into Diffusion Models
- Authors: Jie An, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Zicheng Liu, Lijuan Wang, Jiebo Luo
- Abstract summary: We introduce a Cascaded Diffusion Model (Cas-DM) that improves a Denoising Diffusion Probabilistic Model (DDPM) by effectively incorporating additional metric functions in training.
Experimental results show that the proposed diffusion model backbone enables the effective use of the LPIPS loss, leading to state-of-the-art image quality (FID, sFID, IS) on various established benchmarks.
- Score: 145.71911023514252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a Cascaded Diffusion Model (Cas-DM) that improves a Denoising
Diffusion Probabilistic Model (DDPM) by effectively incorporating additional
metric functions in training. Metric functions such as the LPIPS loss have
proven highly effective in consistency models derived from score matching.
However, for the diffusion counterparts, the methodology and efficacy of adding
extra metric functions remain unclear. One major challenge is the mismatch
between the noise predicted by a DDPM at each step and the desired clean image
that the metric function works well on. To address this problem, we propose
Cas-DM, a network architecture that cascades two network modules to effectively
apply metric functions to the diffusion model training. The first module,
similar to a standard DDPM, learns to predict the added noise and is unaffected
by the metric function. The second cascaded module learns to predict the clean
image, thereby facilitating the metric function computation. Experimental results
show that the proposed diffusion model backbone enables the effective use of
the LPIPS loss, leading to state-of-the-art image quality (FID, sFID, IS) on
various established benchmarks.
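
The two-module training signal described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the linear beta schedule, the function names (`noise_module`, `image_module`, `metric_fn`, `cascade_losses`), the toy stand-in "networks", and the use of plain MSE in place of LPIPS are all assumptions made for illustration. Only the standard DDPM relation between a noisy sample, the added noise, and the clean image is taken as given.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, eps, alpha_bar):
    """Standard DDPM forward process: noise a clean image x0 to step t."""
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps


def x0_from_eps(x_t, t, eps_hat, alpha_bar):
    """Solve the forward-process equation for x0, given a noise estimate."""
    a = alpha_bar[t]
    return (x_t - np.sqrt(1.0 - a) * eps_hat) / np.sqrt(a)


def noise_module(x_t, t, alpha_bar):
    """Hypothetical toy stand-in for the first module (noise prediction)."""
    return np.sqrt(1.0 - alpha_bar[t]) * x_t


def image_module(x_t, eps_hat, t, alpha_bar):
    """Hypothetical toy stand-in for the second module (clean-image prediction)."""
    return np.clip(x0_from_eps(x_t, t, eps_hat, alpha_bar), -1.0, 1.0)


def metric_fn(pred, target):
    """Placeholder for a perceptual metric such as LPIPS; here it is plain MSE."""
    return float(np.mean((pred - target) ** 2))


def cascade_losses(x0, t, eps, alpha_bar):
    """Return (noise loss for module 1, metric loss for module 2)."""
    x_t = q_sample(x0, t, eps, alpha_bar)

    # Module 1 is trained only with the usual noise-prediction loss.
    eps_hat = noise_module(x_t, t, alpha_bar)
    noise_loss = float(np.mean((eps_hat - eps) ** 2))

    # Assumption: in an autodiff framework, eps_hat would be detached here so
    # that gradients of the metric loss never reach module 1.
    x0_hat = image_module(x_t, eps_hat, t, alpha_bar)

    # The metric is computed on a clean-image estimate, not on predicted noise.
    metric_loss = metric_fn(x0_hat, x0)
    return noise_loss, metric_loss


rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.uniform(-1.0, 1.0, size=(2, 3, 8, 8))  # fake image batch in [-1, 1]
eps = rng.standard_normal(x0.shape)
noise_loss, metric_loss = cascade_losses(x0, 500, eps, alpha_bar)
print("noise loss:", noise_loss, "metric loss:", metric_loss)
```

The point of the sketch is the separation of the two losses: the noise loss is computed on the first module's output, while the metric sees only the second module's image-space output, which is the kind of input a perceptual metric is designed for.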
Related papers
- Bring the Power of Diffusion Model to Defect Detection [0.0]
A denoising diffusion probabilistic model (DDPM) is pre-trained, and features from its denoising process are extracted to build a feature repository.
The queried latent features are reconstructed and filtered to obtain high-dimensional DDPM features.
Experiment results demonstrate that our method achieves competitive results on several industrial datasets.
arXiv Detail & Related papers (2024-08-25T14:28:49Z)
- DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and FLOPs.
arXiv Detail & Related papers (2024-06-05T06:18:03Z)
- Diffusion Model Patching via Mixture-of-Prompts [17.04227271007777]
Diffusion Model Patching (DMP) is a simple method to boost the performance of pre-trained diffusion models.
DMP inserts a small, learnable set of prompts into the model's input space while keeping the original model frozen.
arXiv Detail & Related papers (2024-05-28T04:47:54Z)
- SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation [96.11061713135385]
This work presents a new score-decomposed diffusion model to explicitly optimize the tangled distributions during image generation.
We equalize the refinement parts of the score function and energy guidance, which permits multi-objective optimization on the manifold.
SDDM outperforms existing SBDM-based methods with far fewer diffusion steps on several I2I benchmarks.
arXiv Detail & Related papers (2023-08-04T06:21:57Z)
- An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization [58.88327181933151]
In this paper, we propose an efficient query-based membership inference attack (MIA).
Experimental results indicate that the proposed method can achieve competitive performance with only two queries on both discrete-time and continuous-time diffusion models.
To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the text-to-speech task.
arXiv Detail & Related papers (2023-05-26T16:38:48Z)
- An Adaptive Plug-and-Play Network for Few-Shot Learning [12.023266104119289]
Few-shot learning requires a model to classify new samples after learning from only a few samples.
Deep networks and complex metrics tend to induce overfitting, making it difficult to further improve the performance.
We propose a plug-and-play model-adaptive resizer (MAR) and an adaptive similarity metric (ASM) without any other losses.
arXiv Detail & Related papers (2023-02-18T13:25:04Z)
- Feature Re-calibration based MIL for Whole Slide Image Classification [7.92885032436243]
Whole slide image (WSI) classification is a fundamental task for the diagnosis and treatment of diseases.
We propose to re-calibrate the distribution of a WSI bag (instances) by using the statistics of the max-instance (critical) feature.
We employ a position encoding module (PEM) to model spatial/morphological information, and perform pooling by multi-head self-attention (PSMA) with a Transformer encoder.
arXiv Detail & Related papers (2022-06-22T07:00:39Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- Learning to Perform Downlink Channel Estimation in Massive MIMO Systems [72.76968022465469]
We study downlink (DL) channel estimation in a massive multiple-input multiple-output (MIMO) system.
A common approach is to use the mean value as the estimate, motivated by channel hardening.
We propose two novel estimation methods.
arXiv Detail & Related papers (2021-09-06T13:42:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.