Ctrl-Z Sampling: Diffusion Sampling with Controlled Random Zigzag Explorations
- URL: http://arxiv.org/abs/2506.20294v2
- Date: Wed, 06 Aug 2025 13:36:10 GMT
- Title: Ctrl-Z Sampling: Diffusion Sampling with Controlled Random Zigzag Explorations
- Authors: Shunqi Mao, Wei Guo, Chaoyi Zhang, Jieting Long, Ke Xie, Weidong Cai
- Abstract summary: We propose a novel sampling strategy that adaptively detects and escapes steep local maxima. Ctrl-Z Sampling substantially improves generation quality with only around a 6.72x increase in the number of function evaluations.
- Score: 14.543484922782751
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have shown strong performance in conditional generation by progressively denoising Gaussian samples toward a target data distribution. This denoising process can be interpreted as a form of hill climbing in a learned latent space, where the model iteratively refines a sample toward regions of higher probability. However, this learned climbing often converges to local optima with plausible but suboptimal generations due to latent space complexity and suboptimal initialization. While prior efforts often strengthen guidance signals or introduce fixed exploration strategies to address this, they exhibit limited capacity to escape steep local maxima. In contrast, we propose Controlled Random Zigzag Sampling (Ctrl-Z Sampling), a novel sampling strategy that adaptively detects and escapes such traps through controlled exploration. In each diffusion step, we first identify potential local maxima using a reward model. Upon such detection, we inject noise and revert to a previous, noisier state to escape the current plateau. The reward model then evaluates candidate trajectories, accepting only those that offer improvement, otherwise scheduling progressively deeper explorations when nearby alternatives fail. This controlled zigzag process allows dynamic alternation between forward refinement and backward exploration, enhancing both alignment and visual quality in the generated outputs. The proposed method is model-agnostic and compatible with existing diffusion frameworks. Experimental results show that Ctrl-Z Sampling substantially improves generation quality with only around a 6.72x increase in the number of function evaluations.
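The zigzag loop described in the abstract can be summarized in code. The sketch below is one possible reading, not the authors' released implementation: `denoise_step` (one reverse step), `add_noise` (the forward-noising operator), and `reward_fn` (the reward model) are hypothetical stand-ins, and the paper's actual detection thresholds and depth schedules are omitted.

```python
# Minimal sketch of the Ctrl-Z loop. Assumptions: `denoise_step(x, t)` maps
# x_t -> x_{t-1}, `add_noise(x, t, k)` re-noises x_t back to x_{t+k}, and
# `reward_fn` scores a latent; none of these are the paper's exact interface.
def ctrl_z_sample(x_T, T, denoise_step, add_noise, reward_fn,
                  max_depth=4, eps=1e-3):
    x, prev_reward = x_T, float("-inf")
    for t in range(T, 0, -1):
        x_next = denoise_step(x, t)              # forward refinement step
        r = reward_fn(x_next)
        depth = 1
        # Stagnating reward suggests a steep local maximum: explore backward.
        while r <= prev_reward + eps and depth <= max_depth:
            y = add_noise(x_next, t - 1, depth)  # revert to a noisier state
            for s in range(t - 1 + depth, t - 1, -1):
                y = denoise_step(y, s)           # re-denoise down to step t-1
            r_new = reward_fn(y)
            if r_new > r:                        # accept only improvements
                x_next, r = y, r_new
            depth += 1                           # otherwise go progressively deeper
        x, prev_reward = x_next, r
    return x
```

Because the exploration depth is capped, the extra denoiser and reward calls stay within a constant factor of plain sampling, consistent with the roughly 6.72x increase in function evaluations reported above.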
Related papers
- Noise Conditional Variational Score Distillation [60.38982038894823]
Noise Conditional Variational Score Distillation (NCVSD) is a novel method for distilling pretrained diffusion models into generative denoisers. By integrating this insight into the Variational Score Distillation framework, we enable scalable learning of generative denoisers.
arXiv Detail & Related papers (2025-06-11T06:01:39Z)
- Adaptive Destruction Processes for Diffusion Samplers [12.446080077998834]
This paper explores the challenges and benefits of a trainable destruction process in diffusion samplers. We show that, when the number of steps is limited, training both generation and destruction processes results in faster convergence and improved sampling quality.
arXiv Detail & Related papers (2025-06-02T11:07:27Z)
- A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models [3.8623569699070357]
Noise PPO is a minimalist reinforcement learning algorithm that learns a prompt-conditioned initial noise generator. Experiments show that Noise PPO consistently improves alignment and sample quality over the original model. These findings reinforce the practical value of minimalist RL fine-tuning for diffusion models.
arXiv Detail & Related papers (2025-05-23T00:01:52Z)
- Quantizing Diffusion Models from a Sampling-Aware Perspective [43.95032520555463]
We propose a sampling-aware quantization strategy, wherein a Mixed-Order Trajectory Alignment technique is devised. Experiments on sparse-step fast sampling across multiple datasets demonstrate that our approach preserves the rapid convergence characteristics of high-speed samplers.
arXiv Detail & Related papers (2025-05-04T20:50:44Z)
- Distributional Diffusion Models with Scoring Rules [83.38210785728994]
Diffusion models generate high-quality synthetic data, but producing high-quality outputs requires many discretization steps. We propose to accomplish sample generation by learning the posterior distribution of clean data samples.
arXiv Detail & Related papers (2025-02-04T16:59:03Z)
- Arbitrary-steps Image Super-resolution via Diffusion Inversion [68.78628844966019]
This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming at harnessing the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance. We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point. Once trained, this noise predictor can be used to initialize the sampling process partially along the diffusion trajectory, generating the desirable high-resolution result.
arXiv Detail & Related papers (2024-12-12T07:24:13Z)
- Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements [45.70011319850862]
Diffusion models have emerged as powerful foundation models for visual generation.
Current posterior sampling methods incorporate the measurement into the sampling process to infer the distribution of the target data.
We show that high-frequency information can be prematurely introduced during the early stages, which could induce larger posterior estimate errors.
We propose a novel diffusion posterior sampling method, DPS-CM, which incorporates a Crafted Measurement (a generic sketch of the underlying DPS update follows this entry).
arXiv Detail & Related papers (2024-11-15T00:06:57Z)
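For context, DPS-CM builds on the standard diffusion posterior sampling (DPS) update, which steers each reverse step with the gradient of a measurement residual. Below is a generic sketch of that baseline update; `predict_x0`, `reverse_step`, `forward_op` (the measurement operator), and the step size `zeta` are assumed interfaces, and the paper's crafted measurement is not reproduced here.

```python
import torch

# Generic sketch of the baseline DPS update (not DPS-CM itself); all function
# arguments are hypothetical stand-ins for illustration.
def dps_guided_step(x_t, t, y, predict_x0, reverse_step, forward_op, zeta=1.0):
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = predict_x0(x_t, t)                       # estimate of clean data
    residual = torch.linalg.vector_norm(y - forward_op(x0_hat))
    grad = torch.autograd.grad(residual, x_t)[0]      # d||y - A(x0_hat)|| / d x_t
    x_prev = reverse_step(x_t.detach(), t)            # unconditional reverse step
    return x_prev - zeta * grad                       # pull toward the measurement
```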
- Posterior sampling via Langevin dynamics based on generative priors [31.84543941736757]
Posterior sampling in high-dimensional spaces using generative models holds significant promise for various applications.
Existing methods require restarting the entire generative process for each new sample, making the procedure computationally expensive.
We propose efficient posterior sampling by simulating Langevin dynamics in the noise space of a pre-trained generative model (a generic sketch follows this entry).
arXiv Detail & Related papers (2024-10-02T22:57:47Z)
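The noise-space idea above can be illustrated with a generic unadjusted Langevin sampler over the latent of a pretrained generator. This is a sketch under assumptions (a standard-normal prior on z, plus hypothetical `generator` and `log_likelihood` interfaces), not the paper's exact algorithm.

```python
import torch

# Generic unadjusted Langevin ascent on log p(z | y) in the noise space of a
# pretrained generator G; assumes p(z) = N(0, I) and a differentiable
# `log_likelihood(x)` for the measurement model. Illustrative only.
def langevin_noise_space(z, generator, log_likelihood, steps=200, eta=1e-3):
    for _ in range(steps):
        z = z.detach().requires_grad_(True)
        # log posterior up to a constant: log p(y | G(z)) + log p(z)
        log_post = log_likelihood(generator(z)) - 0.5 * (z ** 2).sum()
        grad = torch.autograd.grad(log_post, z)[0]
        # Langevin update: half-step gradient ascent plus Gaussian noise.
        z = z + 0.5 * eta * grad + (eta ** 0.5) * torch.randn_like(z)
    return generator(z).detach()
```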
- Adaptive teachers for amortized samplers [76.88721198565861]
We propose an adaptive training distribution (the teacher) to guide the training of the primary amortized sampler (the student). We validate the effectiveness of this approach in a synthetic environment designed to present an exploration challenge.
arXiv Detail & Related papers (2024-10-02T11:33:13Z)
- Score-based Generative Models with Adaptive Momentum [40.84399531998246]
We propose an adaptive momentum sampling method to accelerate the transformation process.
We show that our method can produce more faithful images/graphs in fewer sampling steps, with a 2 to 5 times speedup.
arXiv Detail & Related papers (2024-05-22T15:20:27Z)
- DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
arXiv Detail & Related papers (2023-10-09T15:29:10Z)
- PCB-RandNet: Rethinking Random Sampling for LIDAR Semantic Segmentation in Autonomous Driving Scene [15.516687293651795]
We propose a new Polar Cylinder Balanced Random Sampling method for semantic segmentation of large-scale LiDAR point clouds.
In addition, a sampling consistency loss is introduced to further improve the segmentation performance and reduce the model's variance under different sampling methods.
Our approach produces excellent performance on both SemanticKITTI and SemanticPOSS benchmarks, achieving a 2.8% and 4.0% improvement, respectively.
arXiv Detail & Related papers (2022-09-28T02:59:36Z)
- Diverse Human Motion Prediction via Gumbel-Softmax Sampling from an Auxiliary Space [34.83587750498361]
Diverse human motion prediction aims at predicting multiple possible future pose sequences from a sequence of observed poses.
Previous approaches usually employ deep generative networks to model the conditional distribution of data, and then randomly sample outcomes from the distribution.
We propose a novel sampling strategy for drawing very diverse results from an imbalanced multimodal distribution (standard Gumbel-Softmax sampling is sketched after this entry).
arXiv Detail & Related papers (2022-07-15T09:03:57Z)
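Standard Gumbel-Softmax sampling, which the entry above builds on, draws a differentiable relaxation of a categorical sample; a minimal sketch follows. The paper's auxiliary-space construction is not reproduced here.

```python
import torch

# Standard Gumbel-Softmax: perturb logits with Gumbel(0, 1) noise, then apply
# a temperature-controlled softmax to get a differentiable categorical sample.
def gumbel_softmax_sample(logits, tau=1.0):
    u = torch.rand_like(logits).clamp_min(1e-9)   # U ~ Uniform(0, 1), avoid log(0)
    g = -torch.log(-torch.log(u))                 # Gumbel(0, 1) noise
    return torch.softmax((logits + g) / tau, dim=-1)
```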
- Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme creates an augmented training sample by mixing a pair of samples.
We present a novel yet simple Mixup variant that captures the best of both worlds (vanilla Mixup is sketched after this entry).
arXiv Detail & Related papers (2021-12-16T11:27:48Z)
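For reference, the vanilla Mixup augmentation that Saliency Grafting extends is a convex combination of paired inputs and labels with a Beta-distributed coefficient. The sketch below shows that standard form, not the paper's saliency-guided, calibrated variant.

```python
import torch

# Vanilla Mixup (the baseline that Saliency Grafting improves on): mix each
# sample with a randomly paired one using a Beta-distributed coefficient.
def mixup(x, y, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))           # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]      # convex combination of inputs
    y_mix = lam * y + (1 - lam) * y[perm]      # matching combination of labels
    return x_mix, y_mix
```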
This list is automatically generated from the titles and abstracts of the papers on this site.