A First-order Generative Bilevel Optimization Framework for Diffusion Models
- URL: http://arxiv.org/abs/2502.08808v1
- Date: Wed, 12 Feb 2025 21:44:06 GMT
- Title: A First-order Generative Bilevel Optimization Framework for Diffusion Models
- Authors: Quan Xiao, Hui Yuan, A F M Saif, Gaowen Liu, Ramana Kompella, Mengdi Wang, Tianyi Chen
- Abstract summary: Diffusion models iteratively denoise data samples to synthesize high-quality outputs.
Traditional bilevel methods fail due to the infinite-dimensional probability space and prohibitive sampling costs.
We formalize this challenge as a generative bilevel optimization problem.
Our first-order bilevel framework overcomes the incompatibility of conventional bilevel methods with diffusion processes.
- Score: 57.40597004445473
- Abstract: Diffusion models, which iteratively denoise data samples to synthesize high-quality outputs, have achieved empirical success across domains. However, optimizing these models for downstream tasks often involves nested bilevel structures, such as tuning hyperparameters for fine-tuning tasks or noise schedules in training dynamics, where traditional bilevel methods fail due to the infinite-dimensional probability space and prohibitive sampling costs. We formalize this challenge as a generative bilevel optimization problem and address two key scenarios: (1) fine-tuning pre-trained models via an inference-only lower-level solver paired with a sample-efficient gradient estimator for the upper level, and (2) training diffusion models from scratch with noise schedule optimization by reparameterizing the lower-level problem and designing a computationally tractable gradient estimator. Our first-order bilevel framework overcomes the incompatibility of conventional bilevel methods with diffusion processes, offering theoretical grounding and computational practicality. Experiments demonstrate that our method outperforms existing fine-tuning and hyperparameter search baselines.
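- Formulation sketch (generic notation, not the paper's own): the nested structure described in the abstract can be read as the bilevel program
  $\min_{\lambda} \; F\big(\lambda,\, p_{\theta^*(\lambda)}\big) \quad \text{s.t.} \quad \theta^*(\lambda) \in \arg\min_{\theta} L(\theta; \lambda),$
  where $\lambda$ collects the upper-level hyperparameters (e.g., a fine-tuning objective weight or the noise schedule), $\theta$ denotes the diffusion-model parameters, $p_\theta$ is the generative distribution induced by the model, $L$ is the lower-level training or fine-tuning loss, and $F$ is the downstream objective evaluated on generated samples. In this reading, a first-order method estimates the upper-level gradient $\nabla_\lambda F$ from samples rather than via second-order implicit differentiation, which is what lets it sidestep the infinite-dimensional lower level and the cost of differentiating through the sampling chain.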
Related papers
- Fast T2T: Optimization Consistency Speeds Up Diffusion-Based Training-to-Testing Solving for Combinatorial Optimization [83.65278205301576]
We propose to learn direct mappings from different noise levels to the optimal solution for a given instance, facilitating high-quality generation with minimal shots.
This is achieved through an optimization consistency training protocol, which minimizes the difference among samples.
Experiments on two popular tasks, the Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS), demonstrate the superiority of Fast T2T regarding both solution quality and efficiency.
arXiv Detail & Related papers (2025-02-05T07:13:43Z)
- Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms [31.42317398879432]
Current inference approaches mainly fall into two categories: exact simulation and approximate methods such as $\tau$-leaping.
In this work, we advance the latter category by tailoring the first extension of high-order numerical inference schemes to discrete diffusion models.
We rigorously analyze the proposed schemes and establish the second-order accuracy of the $\theta$-trapezoidal method in KL divergence.
arXiv Detail & Related papers (2025-02-01T00:25:21Z)
- Beyond Fixed Horizons: A Theoretical Framework for Adaptive Denoising Diffusions [1.9116784879310031]
We introduce a new class of generative diffusion models that achieve a time-homogeneous structure for both the noising and denoising processes.
A key feature of the model is its adaptability to the target data, enabling a variety of downstream tasks using a pre-trained unconditional generative model.
arXiv Detail & Related papers (2025-01-31T18:23:27Z)
- Bilevel Learning with Inexact Stochastic Gradients [2.247833425312671]
Bilevel learning has gained prominence in machine learning, inverse problems, and imaging applications.
The large-scale nature of these problems has led to the development of inexact and computationally efficient methods.
arXiv Detail & Related papers (2024-12-16T18:18:47Z)
- CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems [3.3969056208620128]
We propose to push the boundary of inference steps to 1-2 NFEs while still maintaining high reconstruction quality.
Our method achieves new state-of-the-art in diffusion-based inverse problem solving.
arXiv Detail & Related papers (2024-07-17T15:57:50Z)
- GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models [9.377424534371727]
We study distributed training of deep models in time-constrained environments.
We propose a new algorithm that periodically pulls the workers towards a center variable computed as an average of the workers.
arXiv Detail & Related papers (2024-03-07T04:22:34Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), each have notable drawbacks.
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.