Ctrl-Z Sampling: Diffusion Sampling with Controlled Random Zigzag Explorations
- URL: http://arxiv.org/abs/2506.20294v3
- Date: Fri, 26 Sep 2025 05:52:12 GMT
- Title: Ctrl-Z Sampling: Diffusion Sampling with Controlled Random Zigzag Explorations
- Authors: Shunqi Mao, Wei Guo, Chaoyi Zhang, Jieting Long, Ke Xie, Weidong Cai
- Abstract summary: We propose a novel sampling strategy that adaptively detects and escapes steep local maxima. We show that Ctrl-Z Sampling substantially improves generation quality while requiring only about 7.72 times the NFEs of the original.
- Score: 17.357140159249496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have shown strong performance in conditional generation by progressively denoising Gaussian samples toward a target data distribution. This denoising process can be interpreted as a form of hill climbing in a learned representation space, where the model iteratively refines a sample toward regions of higher probability. However, this learned climbing often converges to local optima with plausible but suboptimal generations due to latent space complexity and suboptimal initialization. While prior efforts often strengthen guidance signals or introduce fixed exploration strategies to address this, they exhibit limited capacity to escape steep local maxima. In contrast, we propose Controlled Random Zigzag Sampling (Ctrl-Z Sampling), a novel sampling strategy that adaptively detects and escapes such traps through controlled exploration. In each diffusion step, we first identify potential local maxima using a reward model. Upon such detection, we inject noise and revert to a previous, noisier state to escape the current plateau. The reward model then evaluates candidate trajectories, accepting only those that offer improvement, otherwise scheduling progressively deeper explorations when nearby alternatives fail. This controlled zigzag process allows dynamic alternation between forward refinement and backward exploration, enhancing both alignment and visual quality in the generated outputs. The proposed method is model-agnostic and compatible with existing diffusion frameworks. Experimental results show that Ctrl-Z Sampling substantially improves generation quality while requiring only about 7.72 times the NFEs of the original.
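The zigzag loop described in the abstract can be sketched in a few lines. This is a minimal toy illustration, not the authors' implementation: `denoise_step`, `renoise`, and `reward` are hypothetical stand-ins for the diffusion denoiser, the noise-injection rollback, and the reward model, with a simple Gaussian contraction playing the role of denoising.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x):
    """Toy one-step denoiser: contracts the sample toward the mode at 0."""
    return 0.9 * x + 0.05 * rng.standard_normal(x.shape)

def renoise(x, sigma):
    """Revert to a noisier state by injecting Gaussian noise."""
    return x + sigma * rng.standard_normal(x.shape)

def reward(x):
    """Toy reward model: higher is better (here, proximity to the mode)."""
    return -float(np.linalg.norm(x))

def ctrl_z_sample(x, steps=10, max_depth=3, sigma0=0.1):
    """Alternate forward refinement with progressively deeper rollbacks."""
    for _ in range(steps):
        cand = denoise_step(x)
        if reward(cand) >= reward(x):
            x = cand  # forward refinement improved the reward: accept it
            continue
        # Suspected local maximum: back up to noisier states, deepening
        # the exploration each time a nearby alternative fails.
        for depth in range(1, max_depth + 1):
            probe = denoise_step(renoise(x, sigma0 * depth))
            if reward(probe) > reward(cand):
                cand = probe
                break
        x = cand  # keep the best candidate found at this step
    return x

out = ctrl_z_sample(rng.standard_normal(4))
```

The key design point from the paper survives even in this sketch: rollback depth grows only when shallower explorations fail, so extra NFEs are spent adaptively rather than on a fixed exploration schedule.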
Related papers
- Rethinking Refinement: Correcting Generative Bias without Noise Injection [7.28668585578288]
Generative models, including diffusion and flow-based models, often exhibit systematic biases that degrade sample quality. We show that effective bias correction can be achieved as a post-hoc procedure, without noise injection or multi-step resampling.
arXiv Detail & Related papers (2026-01-29T02:34:08Z) - Inference-Time Scaling for Visual AutoRegressive modeling by Searching Representative Samples [8.364449021192016]
We introduce VAR-Scaling, the first general framework for inference-time scaling in visual autoregressive (VAR) modeling. We map sampling spaces to quasi-continuous feature spaces via kernel density estimation (KDE), where high-density samples approximate stable, high-quality solutions. Experiments on class-conditional and text-to-image evaluations demonstrate significant improvements in the inference process.
arXiv Detail & Related papers (2026-01-12T08:00:55Z) - Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling [82.52485740425321]
Adversarial attacks present a critical challenge to deep neural networks' robustness. The transferability of adversarial attacks faces a dilemma between Exploitation (maximizing attack potency) and Exploration (enhancing cross-model generalization).
arXiv Detail & Related papers (2025-11-01T05:43:47Z) - G$^2$RPO: Granular GRPO for Precise Reward in Flow Models [74.21206048155669]
We propose a novel Granular-GRPO (G$^2$RPO) framework that achieves precise and comprehensive reward assessments of sampling directions. We introduce a Multi-Granularity Advantage Integration module that aggregates advantages computed at multiple diffusion scales. Our G$^2$RPO significantly outperforms existing flow-based GRPO baselines.
arXiv Detail & Related papers (2025-10-02T12:57:12Z) - Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z) - Noise Conditional Variational Score Distillation [60.38982038894823]
Noise Conditional Variational Score Distillation (NCVSD) is a novel method for distilling pretrained diffusion models into generative denoisers. By integrating this insight into the Variational Score Distillation framework, we enable scalable learning of generative denoisers.
arXiv Detail & Related papers (2025-06-11T06:01:39Z) - Adaptive Destruction Processes for Diffusion Samplers [12.446080077998834]
This paper explores the challenges and benefits of a trainable destruction process in diffusion samplers. We show that, when the number of steps is limited, training both generation and destruction processes results in faster convergence and improved sampling quality.
arXiv Detail & Related papers (2025-06-02T11:07:27Z) - A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models [3.8623569699070357]
Noise PPO is a minimalist reinforcement learning algorithm that learns a prompt-conditioned initial noise generator. Experiments show that Noise PPO consistently improves alignment and sample quality over the original model. These findings reinforce the practical value of minimalist RL fine-tuning for diffusion models.
arXiv Detail & Related papers (2025-05-23T00:01:52Z) - Quantizing Diffusion Models from a Sampling-Aware Perspective [43.95032520555463]
We propose a sampling-aware quantization strategy, wherein a Mixed-Order Trajectory Alignment technique is devised. Experiments on sparse-step fast sampling across multiple datasets demonstrate that our approach preserves the rapid convergence characteristics of high-speed samplers.
arXiv Detail & Related papers (2025-05-04T20:50:44Z) - Distributional Diffusion Models with Scoring Rules [83.38210785728994]
Diffusion models generate high-quality synthetic data, but generating high-quality outputs requires many discretization steps. We propose to accomplish sample generation by learning the posterior distribution of clean data samples.
arXiv Detail & Related papers (2025-02-04T16:59:03Z) - Arbitrary-steps Image Super-resolution via Diffusion Inversion [68.78628844966019]
This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming to harness the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance. We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point. Once trained, this noise predictor can be used to initialize the sampling process partially along the diffusion trajectory, generating the desirable high-resolution result.
arXiv Detail & Related papers (2024-12-12T07:24:13Z) - Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements [45.70011319850862]
Diffusion models have emerged as a powerful foundation model for visual generation.
Current posterior sampling based methods incorporate the measurement into the posterior sampling process to infer the distribution of the target data.
We show that high-frequency information can be prematurely introduced during the early stages, which could induce larger posterior estimate errors.
We propose a novel diffusion posterior sampling method DPS-CM, which incorporates a Crafted Measurement.
arXiv Detail & Related papers (2024-11-15T00:06:57Z) - Posterior sampling via Langevin dynamics based on generative priors [31.84543941736757]
Posterior sampling in high-dimensional spaces using generative models holds significant promise for various applications.
Existing methods require restarting the entire generative process for each new sample, making the procedure computationally expensive.
We propose efficient posterior sampling by simulating Langevin dynamics in the noise space of a pre-trained generative model.
arXiv Detail & Related papers (2024-10-02T22:57:47Z) - Adaptive teachers for amortized samplers [76.88721198565861]
We propose an adaptive training distribution (the teacher) to guide the training of the primary amortized sampler (the student). We validate the effectiveness of this approach in a synthetic environment designed to present an exploration challenge.
arXiv Detail & Related papers (2024-10-02T11:33:13Z) - Score-based Generative Models with Adaptive Momentum [40.84399531998246]
We propose an adaptive momentum sampling method to accelerate the transformation process.
We show that our method can produce more faithful images/graphs in fewer sampling steps with a 2 to 5 times speedup.
arXiv Detail & Related papers (2024-05-22T15:20:27Z) - DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
arXiv Detail & Related papers (2023-10-09T15:29:10Z) - PCB-RandNet: Rethinking Random Sampling for LIDAR Semantic Segmentation in Autonomous Driving Scene [15.516687293651795]
We propose a new Polar Cylinder Balanced Random Sampling method for semantic segmentation of large-scale LiDAR point clouds.
In addition, a sampling consistency loss is introduced to further improve the segmentation performance and reduce the model's variance under different sampling methods.
Our approach produces excellent performance on both SemanticKITTI and SemanticPOSS benchmarks, achieving a 2.8% and 4.0% improvement, respectively.
arXiv Detail & Related papers (2022-09-28T02:59:36Z) - Diverse Human Motion Prediction via Gumbel-Softmax Sampling from an Auxiliary Space [34.83587750498361]
Diverse human motion prediction aims at predicting multiple possible future pose sequences from a sequence of observed poses.
Previous approaches usually employ deep generative networks to model the conditional distribution of data, and then randomly sample outcomes from the distribution.
We propose a novel sampling strategy for sampling very diverse results from an imbalanced multimodal distribution.
arXiv Detail & Related papers (2022-07-15T09:03:57Z) - Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme suggests mixing a pair of samples to create an augmented training sample.
We present a novel yet simple Mixup variant that captures the best of both worlds.
arXiv Detail & Related papers (2021-12-16T11:27:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.