Learnable Chernoff Baselines for Inference-Time Alignment
- URL: http://arxiv.org/abs/2602.07738v2
- Date: Fri, 13 Feb 2026 18:15:21 GMT
- Title: Learnable Chernoff Baselines for Inference-Time Alignment
- Authors: Sunil Madhow, Yuchen Liang, Ness Shroff, Yingbin Liang, Yu-Xiang Wang
- Abstract summary: We introduce Learnable Chernoff Baselines (LCBs) as a method for efficiently and approximately sampling from exponentially tilted kernels. We establish total-variation guarantees with respect to the ideal aligned model, and demonstrate in both continuous and discrete diffusion settings that LCB sampling closely matches ideal rejection sampling.
- Score: 64.81256817158851
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study inference-time reward-guided alignment for generative models. Existing methods often rely on either architecture-specific adaptations or computationally costly inference procedures. We introduce Learnable Chernoff Baselines (LCBs) as a method for efficiently and approximately sampling from the exponentially tilted kernels that arise from KL-regularized reward alignment. Using only black-box sampling access to the pretrained model, LCBs implement a form of rejection sampling with adaptively selected acceptance probabilities, which allows fine-grained control over inference-compute scaling. We establish total-variation guarantees with respect to the ideal aligned model, and demonstrate in both continuous and discrete diffusion settings that LCB sampling closely matches ideal rejection sampling while using substantially fewer queries to the pretrained model.
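The tilted target referred to in the abstract has the standard KL-regularized form pi*(x) ∝ pi_ref(x) * exp(r(x) / beta), where pi_ref is the pretrained model, r is the reward, and beta is the regularization temperature. As a rough illustration of the rejection-sampling mechanism described above (a minimal sketch under our own assumptions, not the paper's actual LCB algorithm: the function name `tilted_rejection_sample`, the scalar `baseline`, and the toy Gaussian setup are all ours), the code below accepts black-box proposals with probability min(1, exp((r(x) - b) / beta)); the choice of the baseline b is what controls the query/accuracy trade-off.

```python
import numpy as np

def tilted_rejection_sample(sample_ref, reward, beta, baseline, rng, max_queries=1000):
    """Draw one approximate sample from pi*(x) ∝ pi_ref(x) * exp(r(x) / beta),
    using only black-box draws from pi_ref.

    `baseline` stands in for a learned Chernoff-style constant b: each
    proposal x ~ pi_ref is accepted with probability
    min(1, exp((r(x) - b) / beta)). Raising b lowers the acceptance rate
    (more pretrained-model queries, less clipping bias); lowering it
    trades accuracy for compute.
    """
    x = None
    for _ in range(max_queries):
        x = sample_ref()  # one black-box query to the pretrained model
        accept_prob = min(1.0, float(np.exp((reward(x) - baseline) / beta)))
        if rng.random() < accept_prob:
            return x
    return x  # budget exhausted: fall back to the last proposal

# Toy check: pi_ref = N(0, 1) with r(x) = x and beta = 1 tilts the
# target to N(1, 1), so the sample mean should be close to 1.
rng = np.random.default_rng(0)
samples = [
    tilted_rejection_sample(
        sample_ref=rng.standard_normal,  # pi_ref = N(0, 1)
        reward=lambda x: x,              # r(x) = x
        beta=1.0,
        baseline=3.0,  # e.g. an estimated high quantile of r under pi_ref
        rng=rng,
    )
    for _ in range(2000)
]
print(np.mean(samples))
```

In this toy setting, a larger `baseline` pushes the clipped acceptance probability toward exact rejection sampling at the cost of more proposals per accepted sample, which mirrors the inference-compute scaling knob the abstract describes.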
Related papers
- PDAC: Efficient Coreset Selection for Continual Learning via Probability Density Awareness [19.191960069245354]
Rehearsal-based Continual Learning (CL) maintains a limited memory buffer to store replay samples for knowledge retention. Current rehearsal-based CL methods typically construct the memory buffer by selecting a representative subset. We propose the Probability Density-Aware Coreset (PDAC) method to estimate each sample's joint density, enabling efficient density-prioritized buffer selection.
arXiv Detail & Related papers (2025-11-12T17:00:21Z) - Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z) - Training-Free Stein Diffusion Guidance: Posterior Correction for Sampling Beyond High-Density Regions [46.59494117137471]
Training-free diffusion guidance provides a flexible way to leverage off-the-shelf classifiers without additional training. We introduce Stein Diffusion Guidance (SDG), a novel training-free framework grounded in a surrogate stochastic optimal control (SOC) objective. Experiments on molecular low-density sampling tasks suggest that SDG consistently surpasses standard training-free guidance methods.
arXiv Detail & Related papers (2025-07-07T21:14:27Z) - Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts [64.34482582690927]
We provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality.
arXiv Detail & Related papers (2025-03-04T17:46:51Z) - A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization [23.972397132797116]
Current deep learning approaches rely on generative models that yield exact sample likelihoods. This work introduces a method that lifts this restriction and opens the possibility to employ highly expressive latent variable models. We experimentally validate our approach in data-free Combinatorial Optimization and demonstrate that our method achieves a new state-of-the-art on a wide range of benchmark problems.
arXiv Detail & Related papers (2024-06-03T17:55:02Z) - Entropy-MCMC: Sampling from Flat Basins with Ease [10.764160559530849]
We introduce an auxiliary guiding variable, the stationary distribution of which resembles a smoothed posterior free from sharp modes, to lead the MCMC sampler to flat basins.
By integrating this guiding variable with the model parameter, we create a simple joint distribution that enables efficient sampling with minimal computational overhead.
Empirical results demonstrate that our method can successfully sample from flat basins of the posterior, and outperforms all compared baselines on multiple benchmarks.
arXiv Detail & Related papers (2023-10-09T04:40:20Z) - Protein Design with Guided Discrete Diffusion [67.06148688398677]
A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling.
We propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models.
NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods.
arXiv Detail & Related papers (2023-05-31T16:31:24Z) - Bayesian Pseudo-Coresets via Contrastive Divergence [5.479797073162603]
We introduce a novel approach for constructing pseudo-coresets by utilizing contrastive divergence.
It eliminates the need for approximations in the pseudo-coreset construction process.
We conduct extensive experiments on multiple datasets, demonstrating its superiority over existing BPC techniques.
arXiv Detail & Related papers (2023-03-20T17:13:50Z) - Simplex Clustering via sBeta with Applications to Online Adjustment of Black-Box Predictions [16.876111500144667]
We introduce a novel probabilistic clustering method, referred to as k-sBetas.
We provide a general maximum a posteriori (MAP) perspective of clustering distributions.
Our code and comparisons with the existing simplex-clustering approaches and our introduced softmax-prediction benchmarks are publicly available.
arXiv Detail & Related papers (2022-07-30T18:29:11Z) - Posterior Coreset Construction with Kernelized Stein Discrepancy for
Model-Based Reinforcement Learning [78.30395044401321]
We develop a novel model-based approach to reinforcement learning (MBRL). It relaxes the assumptions on the target transition model, requiring only that it belong to a generic family of mixture models. It can achieve up to a 50 percent reduction in wall-clock time in some continuous control environments.
arXiv Detail & Related papers (2022-06-02T17:27:49Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.