Parallel Sampling via Autospeculation
- URL: http://arxiv.org/abs/2511.07869v1
- Date: Wed, 12 Nov 2025 01:25:00 GMT
- Title: Parallel Sampling via Autospeculation
- Authors: Nima Anari, Carlo Baronio, CJ Chen, Alireza Haqi, Frederic Koehler, Anqi Li, Thuy-Duong Vuong
- Abstract summary: We present algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models. We show that, by issuing oracle calls in parallel, the expected sampling time can be reduced to $\widetilde{O}(n^{1/2})$. We introduce a novel technique to obtain our results: speculative rejection sampling.
- Score: 13.643401888306398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present parallel algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models. An any-order autoregressive model accesses a target distribution $μ$ on $[q]^n$ through an oracle that provides conditional marginals, while a denoising diffusion model accesses a target distribution $μ$ on $\mathbb{R}^n$ through an oracle that provides conditional means under Gaussian noise. Standard sequential sampling algorithms require $\widetilde{O}(n)$ time to produce a sample from $μ$ in either setting. We show that, by issuing oracle calls in parallel, the expected sampling time can be reduced to $\widetilde{O}(n^{1/2})$. This improves the previous $\widetilde{O}(n^{2/3})$ bound for any-order autoregressive models and yields the first parallel speedup for diffusion models in the high-accuracy regime, under the relatively mild assumption that the support of $μ$ is bounded. We introduce a novel technique to obtain our results: speculative rejection sampling. This technique leverages an auxiliary ``speculative'' distribution~$ν$ that approximates~$μ$ to accelerate sampling. Our technique is inspired by the well-studied ``speculative decoding'' techniques popular in large language models, but differs in key ways. Firstly, we use ``autospeculation,'' namely we build the speculation $ν$ out of the same oracle that defines~$μ$. In contrast, speculative decoding typically requires a separate, faster, but potentially less accurate ``draft'' model $ν$. Secondly, the key differentiating factor in our technique is that we make and accept speculations at a ``sequence'' level rather than at the level of single (or a few) steps. This last fact is key to unlocking our parallel runtime of $\widetilde{O}(n^{1/2})$.
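The sequence-level accept/reject step at the heart of speculative rejection sampling can be illustrated with a minimal toy. This sketch assumes explicit finite distributions `mu` and `nu` given as dicts over whole sequences (tuples); it shows only the classical rejection test with a speculative proposal and does not model the conditional-marginal oracle, the autospeculative construction of $ν$, or the parallel scheduling that yield the paper's $\widetilde{O}(n^{1/2})$ bound:

```python
import random

def speculative_rejection_sample(mu, nu, max_tries=1000):
    """Toy sequence-level speculative rejection sampling.

    mu: target distribution, dict mapping sequences (tuples) to probabilities.
    nu: speculative ("draft") distribution approximating mu, same keys.
    A whole sequence x ~ nu is proposed in one shot and accepted with
    probability mu(x) / (M * nu(x)), where M >= max_x mu(x)/nu(x),
    so accepted samples are exactly distributed according to mu.
    """
    M = max(mu[x] / nu[x] for x in mu)  # rejection constant
    seqs = list(nu)
    weights = [nu[x] for x in seqs]
    for _ in range(max_tries):
        x = random.choices(seqs, weights=weights)[0]  # speculate a whole sequence
        if random.random() < mu[x] / (M * nu[x]):     # sequence-level accept test
            return x
    raise RuntimeError("no sample accepted within max_tries")
```

The closer `nu` is to `mu`, the smaller `M` and the higher the acceptance rate, which is the sense in which a good speculation accelerates sampling.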
Related papers
- Constrained and Composite Sampling via Proximal Sampler [2.087898608419977]
We study two log-concave sampling problems: constrained sampling and composite sampling. The main challenge is enforcing feasibility without degrading mixing. In composite sampling, the target is proportional to $\exp(-f(x)-h(x))$ with closed and convex $f$ and $h$.
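As a rough illustration of the proximal sampler this entry builds on, here is a minimal 1D sketch for a standard Gaussian target ($f(x)=x^2/2$, with no constraint or composite term $h$), where the backward step has a closed form; the function name and parameter values are illustrative, not from the paper:

```python
import math
import random

def proximal_sampler_gaussian(n_steps=20000, eta=0.5, seed=0):
    """Toy proximal sampler for the 1D target pi(x) ∝ exp(-f(x)) with
    f(x) = x^2/2 (standard normal). Alternates:
      y_k     ~ N(x_k, eta)                                  (forward step)
      x_{k+1} ~ pi(x | y_k) ∝ exp(-f(x) - (x - y_k)^2/(2*eta))  (backward step)
    For quadratic f the backward conditional is Gaussian, so it can be
    sampled exactly; the stationary marginal of x is the target pi.
    """
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        y = x + math.sqrt(eta) * rng.gauss(0.0, 1.0)
        # exact backward step for f(x) = x^2 / 2:
        mean = y / (1.0 + eta)
        var = eta / (1.0 + eta)
        x = mean + math.sqrt(var) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples
```

For general convex $f$ (or with constraints and a composite term $h$) the backward step is no longer available in closed form, which is where the paper's contribution lies.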
arXiv Detail & Related papers (2026-02-16T05:36:36Z) - Posterior Sampling by Combining Diffusion Models with Annealed Langevin Dynamics [6.987640034932562]
Posterior sampling provides an accurate and fair framework for tasks such as inpainting, deblurring, and MRI reconstruction. We prove that combining diffusion models with a variant of Langevin dynamics achieves conditional sampling using merely an $L^4$ bound on the score error.
arXiv Detail & Related papers (2025-10-30T10:17:27Z) - DISCO: Diversifying Sample Condensation for Efficient Model Evaluation [59.01400190971061]
Costly evaluation reduces inclusivity, slows the cycle of innovation, and worsens environmental impact. We argue that promoting diversity among samples is not essential; what matters is to select samples that maximise diversity in model responses. Our method, Diversifying Sample Condensation (DISCO), selects the top-$k$ samples with the greatest model disagreements.
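The selection rule in the summary (pick the top-$k$ samples by model disagreement) can be sketched as follows; the prediction matrix, the disagreement measure (number of distinct predicted labels per sample), and the function name are assumptions for illustration, not the paper's exact scoring:

```python
import numpy as np

def disco_select(pred_matrix, k):
    """Toy disagreement-based sample selection (DISCO-style sketch).

    pred_matrix: array of shape (n_models, n_samples), where entry (i, j)
    is model i's predicted label for sample j. Disagreement for sample j
    is measured here as the number of distinct labels the models predict
    for it. Returns the indices of the k most-disagreed-upon samples.
    """
    n_models, n_samples = pred_matrix.shape
    disagreement = np.array([
        len(set(pred_matrix[:, j])) for j in range(n_samples)
    ])
    return np.argsort(-disagreement)[:k]  # descending by disagreement
```

Evaluating models only on such a condensed, high-disagreement subset is what makes the evaluation cheaper while preserving the signal that separates models.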
arXiv Detail & Related papers (2025-10-09T08:53:59Z) - Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [72.69498649272347]
Learning conditional distributions is a central problem in machine learning. We propose a new paradigm that integrates both paired and unpaired data. We show that our approach can theoretically recover true conditional distributions with arbitrarily small error.
arXiv Detail & Related papers (2024-10-03T16:12:59Z) - Parallel Sampling via Counting [3.6854430278041264]
We show how to use parallelization to speed up sampling from an arbitrary distribution.
Our algorithm takes $O(n^{2/3}\cdot \operatorname{polylog}(n,q))$ parallel time.
Our results have implications for sampling in autoregressive models.
arXiv Detail & Related papers (2024-08-18T11:42:54Z) - Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel [10.840582511203024]
We show that our algorithm can be parallelized to run in only $\widetilde{O}(\log^2 d)$ parallel rounds.
arXiv Detail & Related papers (2024-06-03T01:34:34Z) - Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions.
To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts and the other executes an online algorithm for non-oblivious multi-armed bandits.
In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of distributions.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained nonlinear optimization problems. We apply a Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
arXiv Detail & Related papers (2022-05-27T00:34:03Z) - Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence [3.9586758145580014]
We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions.
For a graph $G=(V, E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V\rvert)$ time per sample.
For a determinantal point process on subsets of size $k$ of a ground set of $n$ elements, we show how to approximately sample in $\widetilde{O}(k^{\omega})$ time after an initial $\widetilde{O}(nk\cdots)$
arXiv Detail & Related papers (2022-04-06T04:11:26Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$
We propose estimators for this problem under two settings: (i) $X$ is $L^4$-$L^2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.