Understanding Flatness in Generative Models: Its Role and Benefits
- URL: http://arxiv.org/abs/2503.11078v1
- Date: Fri, 14 Mar 2025 04:38:53 GMT
- Title: Understanding Flatness in Generative Models: Its Role and Benefits
- Authors: Taehwan Lee, Kyeongkook Seo, Jaejun Yoo, Sung Whan Yoon
- Abstract summary: We investigate the role of loss surface flatness in generative models, both theoretically and empirically. We establish a theoretical claim that flatter minima improve robustness against perturbations in target prior distributions. We demonstrate that flat minima in diffusion models improve not only generative performance but also robustness.
- Score: 9.775257597631244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Flat minima, known to enhance generalization and robustness in supervised learning, remain largely unexplored in generative models. In this work, we systematically investigate the role of loss surface flatness in generative models, both theoretically and empirically, with a particular focus on diffusion models. We establish a theoretical claim that flatter minima improve robustness against perturbations in target prior distributions, leading to benefits such as reduced exposure bias -- where errors in noise estimation accumulate over iterations -- and significantly improved resilience to model quantization, preserving generative performance even under strong quantization constraints. We further observe that Sharpness-Aware Minimization (SAM), which explicitly controls the degree of flatness, effectively enhances flatness in diffusion models, whereas other well-known methods such as Stochastic Weight Averaging (SWA) and Exponential Moving Average (EMA), which promote flatness indirectly via ensembling, are less effective. Through extensive experiments on CIFAR-10, LSUN Tower, and FFHQ, we demonstrate that flat minima in diffusion models improve not only generative performance but also robustness.
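The abstract contrasts SAM, which explicitly seeks flat minima, with averaging-based methods like EMA and SWA. A minimal sketch of the SAM update on a hypothetical toy quadratic loss (not the paper's diffusion-model training code): ascend to the worst-case weight perturbation within an L2 ball of radius rho, then descend using the gradient evaluated there.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w with one sharp direction.
# Hypothetical setup for illustration only.

def loss(w, A):
    return 0.5 * w @ A @ w

def grad(w, A):
    return A @ w

def sam_step(w, A, lr=0.1, rho=0.05):
    """One SAM update: perturb toward the worst-case neighbor within an
    L2 ball of radius rho, then descend with the gradient taken there."""
    g = grad(w, A)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation (first-order)
    g_sharp = grad(w + eps, A)                   # gradient at the perturbed point
    return w - lr * g_sharp

A = np.diag([1.0, 10.0])        # anisotropic curvature: second direction is sharp
w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, A)
# The iterate settles in a small neighborhood of the minimum; a small
# residual loss remains because eps keeps probing the rho-ball.
print(loss(w, A))
```

By contrast, EMA and SWA never modify the gradient step; they average weights across training iterates, which only indirectly biases the result toward flatter regions.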
Related papers
- Adversarial Transferability in Deep Denoising Models: Theoretical Insights and Robustness Enhancement via Out-of-Distribution Typical Set Sampling [6.189440665620872]
Deep learning-based image denoising models demonstrate remarkable performance, but their lack of robustness analysis remains a significant concern. A major issue is that these models are susceptible to adversarial attacks, where small, carefully crafted perturbations to input data can cause them to fail. We propose a novel adversarial defense method: the Out-of-Distribution Typical Set Sampling training strategy.
arXiv Detail & Related papers (2024-12-08T13:47:57Z) - Generalized Diffusion Model with Adjusted Offset Noise [1.7767466724342067]
We propose a generalized diffusion model that naturally incorporates additional noise within a rigorous probabilistic framework.
We derive a loss function based on the evidence lower bound, establishing its theoretical equivalence to offset noise with certain adjustments.
Experiments on synthetic datasets demonstrate that our model effectively addresses brightness-related challenges and outperforms conventional methods in high-dimensional scenarios.
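The entry above builds on the offset-noise idea for diffusion training. A minimal sketch of what offset noise looks like, with hypothetical shapes and a hypothetical `strength` parameter (not the paper's generalized formulation): standard training draws per-pixel Gaussian noise, while offset noise adds a shared per-channel component so the model can also learn global brightness shifts.

```python
import numpy as np

rng = np.random.default_rng(0)

def offset_noise(shape, strength=0.1, rng=rng):
    """shape = (batch, channels, height, width).
    Returns per-pixel noise plus a per-channel offset broadcast over H, W."""
    eps = rng.standard_normal(shape)                  # standard per-pixel noise
    offset = rng.standard_normal(shape[:2] + (1, 1))  # one value per (sample, channel)
    return eps + strength * offset                    # broadcasts over H and W

noise = offset_noise((4, 3, 8, 8))
print(noise.shape)  # (4, 3, 8, 8)
```

Because the offset is constant across all pixels of a channel, it shifts the channel mean, which is the degree of freedom the brightness-related experiments target.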
arXiv Detail & Related papers (2024-12-04T08:57:03Z) - Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z) - Physics-Informed Diffusion Models [0.0]
We present a framework that unifies generative modeling and partial differential equation fulfillment. Our approach reduces the residual error by up to two orders of magnitude compared to previous work in a fluid flow case study.
arXiv Detail & Related papers (2024-03-21T13:52:55Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Unmasking Bias in Diffusion Model Training [40.90066994983719]
Denoising diffusion models have emerged as a dominant approach for image generation.
They still suffer from slow convergence in training and color shift issues in sampling.
In this paper, we identify that these obstacles can be largely attributed to bias and suboptimality inherent in the default training paradigm.
arXiv Detail & Related papers (2023-10-12T16:04:41Z) - Soft Mixture Denoising: Beyond the Expressive Bottleneck of Diffusion Models [76.46246743508651]
We show that current diffusion models actually have an expressive bottleneck in backward denoising.
We introduce soft mixture denoising (SMD), an expressive and efficient model for backward denoising.
arXiv Detail & Related papers (2023-09-25T12:03:32Z) - DiffLLE: Diffusion-guided Domain Calibration for Unsupervised Low-light Image Enhancement [21.356254176992937]
Existing unsupervised low-light image enhancement methods lack enough effectiveness and generalization in practical applications.
We develop diffusion-based domain calibration, called DiffLLE, to realize more robust and effective unsupervised low-light enhancement.
Our approach even outperforms some supervised methods by using only a simple unsupervised baseline.
arXiv Detail & Related papers (2023-08-18T03:40:40Z) - Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from slow inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z) - Perimeter Control Using Deep Reinforcement Learning: A Model-free Approach towards Homogeneous Flow Rate Optimization [28.851432612392436]
Perimeter control maintains high traffic efficiency within protected regions by controlling transfer flows among regions to ensure that their traffic densities are below critical values.
Existing approaches can be categorized as either model-based or model-free, depending on whether they rely on network transmission models (NTMs) and macroscopic fundamental diagrams (MFDs).
arXiv Detail & Related papers (2023-05-29T21:22:08Z) - Diffusion Models are Minimax Optimal Distribution Estimators [49.47503258639454]
We provide the first rigorous analysis on approximation and generalization abilities of diffusion modeling.
We show that when the true density function belongs to the Besov space and the empirical score matching loss is properly minimized, the generated data distribution achieves the nearly minimax optimal estimation rates.
arXiv Detail & Related papers (2023-03-03T11:31:55Z) - How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)