Psi-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models
- URL: http://arxiv.org/abs/2506.01320v3
- Date: Mon, 27 Oct 2025 15:22:41 GMT
- Title: Psi-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models
- Authors: Taehoon Yoon, Yunhong Min, Kyeongmin Yeo, Minhyuk Sung
- Abstract summary: $\Psi$-Sampler is an SMC-based framework incorporating pCNL-based initial particle sampling. Inference-time reward alignment with score-based generative models has gained significant traction.
- Score: 26.211711150915203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce $\Psi$-Sampler, an SMC-based framework incorporating pCNL-based initial particle sampling for effective inference-time reward alignment with a score-based generative model. Inference-time reward alignment with score-based generative models has recently gained significant traction, following a broader paradigm shift from pre-training to post-training optimization. At the core of this trend is the application of Sequential Monte Carlo (SMC) to the denoising process. However, existing methods typically initialize particles from the Gaussian prior, which inadequately captures reward-relevant regions and results in reduced sampling efficiency. We demonstrate that initializing from the reward-aware posterior significantly improves alignment performance. To enable posterior sampling in high-dimensional latent spaces, we introduce the preconditioned Crank-Nicolson Langevin (pCNL) algorithm, which combines dimension-robust proposals with gradient-informed dynamics. This approach enables efficient and scalable posterior sampling and consistently improves performance across various reward alignment tasks, including layout-to-image generation, quantity-aware generation, and aesthetic-preference generation, as demonstrated in our experiments. Project Webpage: https://psi-sampler.github.io/
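The pCNL update itself is standard in the function-space MCMC literature (Cotter et al., 2013). As a hedged illustration of the posterior-initialization idea, the sketch below implements one pCNL step for a target pi(x) proportional to exp(-phi(x)) N(x; 0, I), where phi would encode the negative log-reward; the step size delta and the callables phi and grad_phi are placeholders, and this is the textbook parameterization rather than the paper's exact implementation.

```python
import numpy as np

def pcnl_step(u, phi, grad_phi, delta, rng):
    """One preconditioned Crank-Nicolson Langevin (pCNL) step.

    Targets pi(x) ~ exp(-phi(x)) * N(x; 0, I): the score model's latent
    prior tilted by a reward potential phi. Standard form from
    function-space MCMC (Cotter et al., 2013), prior covariance = identity.
    """
    g_u = grad_phi(u)
    w = rng.standard_normal(u.shape)
    # Crank-Nicolson (semi-implicit) discretization of preconditioned
    # Langevin: dimension-robust proposal plus gradient-informed drift.
    v = ((2.0 - delta) / (2.0 + delta)) * u \
        - (2.0 * delta / (2.0 + delta)) * g_u \
        + (np.sqrt(8.0 * delta) / (2.0 + delta)) * w

    def rho(a, b, g_a):
        # Log numerator of the Metropolis-Hastings ratio for pCNL.
        return (phi(a)
                + 0.5 * np.dot(b - a, g_a)
                + 0.25 * delta * np.dot(a + b, g_a)
                + 0.25 * delta * np.dot(g_a, g_a))

    log_accept = rho(u, v, g_u) - rho(v, u, grad_phi(v))
    if np.log(rng.uniform()) < log_accept:
        return v, True   # accepted
    return u, False      # rejected
```

Usage-wise, one would run a short pCNL chain per particle before handing the particles to SMC, so that they start from the reward-aware posterior rather than the Gaussian prior.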
Related papers
- Learnable Chernoff Baselines for Inference-Time Alignment [64.81256817158851]
We introduce Learnable Chernoff Baselines (LCB) as a method for efficiently and approximately sampling from exponentially tilted kernels. We establish total-variation guarantees to the ideal aligned model, and demonstrate in both continuous and discrete diffusion settings that LCB sampling closely matches ideal rejection sampling.
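For orientation, "ideal rejection sampling" from an exponentially tilted distribution p(x) proportional to p0(x) exp(r(x)/alpha) fits in a few lines; the fixed baseline below plays the role that the learnable Chernoff baselines approximate. This is a generic sketch with hypothetical names, not the paper's method.

```python
import numpy as np

def tilted_rejection_sample(base_sampler, reward, alpha, baseline, rng,
                            max_tries=100_000):
    """Exact sampling from p(x) ~ p0(x) * exp(reward(x) / alpha).

    Valid whenever baseline >= sup_x reward(x): then the acceptance
    probability exp((reward(x) - baseline) / alpha) lies in [0, 1].
    A tighter baseline means a higher acceptance rate, which is what a
    learned (Chernoff-type) baseline is after.
    """
    for _ in range(max_tries):
        x = base_sampler(rng)                       # draw from the base model
        if rng.uniform() < np.exp((reward(x) - baseline) / alpha):
            return x                                # accept w/ tilted prob.
    raise RuntimeError("acceptance rate too low; baseline may be too loose")
```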
arXiv Detail & Related papers (2026-02-08T00:09:40Z) - DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment [49.45064510462232]
GRPO-based approaches for text-to-image generation suffer from the sparse reward problem. We introduce DenseGRPO, a novel framework that aligns human preference with dense rewards.
arXiv Detail & Related papers (2026-01-28T03:39:05Z) - Sublinear iterations can suffice even for DDPMs [20.394390032998036]
We show that a "shifted composition rule" algorithm enjoys favorable discretization properties under appropriate assumptions.<n>This is the first sublinear complexity bound for pure DDPM sampling.<n>We also provide experimental validation of the advantages of our method, showing that it performs well in practice with pre-trained image synthesis models.
arXiv Detail & Related papers (2025-11-06T22:15:19Z) - Test-Time Alignment of LLMs via Sampling-Based Optimal Control in pre-logit space [15.104280833614157]
We propose a new test-time alignment method called adaptive importance sampling on pre-logits (AISP). AISP applies a perturbation to the pre-logits, the outputs of the penultimate layer, and maximizes the expected reward with respect to the mean of the perturbation. AISP outperforms best-of-n sampling in terms of reward per number of samples used, and achieves higher rewards than other reward-based test-time alignment methods.
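One way to read this is as adaptive importance sampling over a Gaussian perturbation of the pre-logits. The sketch below shows a generic importance-weighted update of the perturbation mean; the function names, the softmax-style weighting, and the temperature are illustrative assumptions, not AISP's actual estimator.

```python
import numpy as np

def perturbation_mean_update(h, reward_of, mu, sigma, n_samples, rng,
                             temp=1.0):
    """Importance-weighted update of a Gaussian perturbation mean.

    h: pre-logit (penultimate-layer) activations, shape (d,).
    reward_of: maps perturbed pre-logits to a scalar reward.
    Draws eps ~ N(0, sigma^2 I), scores h + mu + eps, and shifts mu toward
    high-reward perturbations. Generic sketch, not the paper's estimator.
    """
    eps = rng.standard_normal((n_samples, h.shape[0])) * sigma
    rewards = np.array([reward_of(h + mu + e) for e in eps])
    w = np.exp((rewards - rewards.max()) / temp)   # softmax-style weights
    w /= w.sum()
    return mu + w @ eps                            # weighted mean shift
```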
arXiv Detail & Related papers (2025-10-30T07:52:14Z) - Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
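Particle Gibbs alternates conditional-SMC sweeps in which one "reference" trajectory is kept intact while the remaining particles are resampled. A generic skeleton follows; the transition kernel, the log-weight twist, and the pinning scheme are placeholders, not PG-DLM's specifics.

```python
import numpy as np

def conditional_smc(ref_traj, init_state, transition, log_weight, n, rng):
    """One conditional-SMC sweep, the inner loop of particle Gibbs.

    ref_traj: length-T list of states (the retained reference trajectory).
    transition(x, t, rng): samples the next state, e.g. one denoising step.
    log_weight(x, t): log potential at step t, e.g. a reward-based twist.
    Particle n-1 is pinned to the reference path throughout.
    """
    T = len(ref_traj)
    trajs = [[init_state(rng)] for _ in range(n - 1)] + [[ref_traj[0]]]
    for t in range(1, T):
        logw = np.array([log_weight(tr[-1], t - 1) for tr in trajs])
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Resample ancestors for the n-1 free particles; keep the reference.
        idx = rng.choice(n, size=n - 1, p=w)
        new_trajs = [trajs[i] + [transition(trajs[i][-1], t, rng)]
                     for i in idx]
        new_trajs.append(trajs[-1] + [ref_traj[t]])
        trajs = new_trajs
    # Draw the new reference trajectory from the final weights.
    logw = np.array([log_weight(tr[-1], T - 1) for tr in trajs])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return trajs[rng.choice(n, p=w)]
```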
arXiv Detail & Related papers (2025-07-11T08:00:47Z) - Test-Time Alignment of Discrete Diffusion Models with Sequential Monte Carlo [19.81513273510523]
We propose a training-free method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution at test time. Our approach leverages twisted SMC with an approximate locally optimal proposal, obtained via a first-order Taylor expansion of the reward function. To address the challenge of ill-defined gradients in discrete spaces, we incorporate a Gumbel-Softmax relaxation, enabling efficient gradient-based approximation within the discrete generative framework.
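The Gumbel-Softmax trick itself is standard. As a hedged sketch of how it yields reward gradients through discrete token sampling (the twisted-SMC proposal built on top is not shown, and reward_fn is a placeholder), one can relax the one-hot samples as follows:

```python
import torch
import torch.nn.functional as F

def relaxed_reward_grad(logits, reward_fn, tau=1.0):
    """Approximate d(reward)/d(logits) through a discrete sampling step.

    logits: (seq_len, vocab_size) unnormalized token scores.
    reward_fn: maps a (seq_len, vocab_size) relaxed one-hot sample to a
    scalar reward. Standard Gumbel-Softmax relaxation; smaller tau gives
    samples closer to one-hot but noisier gradients.
    """
    logits = logits.detach().requires_grad_(True)
    # Soft, differentiable stand-in for categorical sampling.
    x_soft = F.gumbel_softmax(logits, tau=tau, hard=False, dim=-1)
    reward_fn(x_soft).backward()
    return logits.grad
```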
arXiv Detail & Related papers (2025-05-28T16:12:03Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization [66.67988187816185]
We aim to scale up the number of on-policy samples via repeated random sampling to improve alignment performance. Our experiments reveal that this strategy leads to a decline in performance as the sample size increases. We introduce a scalable preference data construction strategy that consistently enhances model performance as the sample scale increases.
arXiv Detail & Related papers (2025-02-24T04:22:57Z) - Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design [87.58981407469977]
We propose a novel framework for inference-time reward optimization with diffusion models inspired by evolutionary algorithms. Our approach employs an iterative refinement process consisting of two steps in each iteration: noising and reward-guided denoising.
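A hedged sketch of that two-step iteration is below, using DDIM-style deterministic reverse steps with a gradient nudge on the clean-sample estimate; the schedule, the guidance scale, and the exact update rule are simplifying assumptions, not the paper's procedure.

```python
import torch

def noise_denoise_refine(x0, eps_model, reward_grad, alphas_bar,
                         t_mid, n_rounds=8, guidance=0.1):
    """Iterative refinement: partially re-noise, then denoise with guidance.

    eps_model(x, t): pretrained noise predictor.
    reward_grad(x): gradient of the reward w.r.t. a clean sample.
    alphas_bar: cumulative alpha schedule (1-D tensor). Illustrative only.
    """
    for _ in range(n_rounds):
        # Step 1: forward-noise the current sample to intermediate time t_mid.
        ab = alphas_bar[t_mid]
        x = ab.sqrt() * x0 + (1.0 - ab).sqrt() * torch.randn_like(x0)
        # Step 2: reward-guided reverse steps t_mid -> 0.
        for t in range(t_mid, 0, -1):
            ab_t, ab_prev = alphas_bar[t], alphas_bar[t - 1]
            eps = eps_model(x, t)
            x0_hat = (x - (1.0 - ab_t).sqrt() * eps) / ab_t.sqrt()
            x0_hat = x0_hat + guidance * reward_grad(x0_hat)  # reward nudge
            x = ab_prev.sqrt() * x0_hat + (1.0 - ab_prev).sqrt() * eps
        x0 = x0_hat
    return x0
```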
arXiv Detail & Related papers (2025-02-20T17:48:45Z) - Preference Optimization via Contrastive Divergence: Your Reward Model is Secretly an NLL Estimator [32.05337749590184]
We develop a novel PO framework that provides theoretical guidance to effectively sample dispreferred completions. We then select contrastive divergence (CD) as the sampling strategy and propose a novel MC-PO algorithm. MC-PO outperforms existing SOTA baselines, and its online extension, OnMC-PO, leads to further improvement.
arXiv Detail & Related papers (2025-02-06T23:45:08Z) - InfAlign: Inference-aware language model alignment [58.66389179049758]
Language model alignment is a critical step in training modern generative language models. We show that this train/test mismatch makes the standard RLHF framework sub-optimal in view of inference-time methods. We propose a framework for inference-aware alignment (InfAlign), which aims to optimize the inference-time win rate of the aligned policy against the base model.
arXiv Detail & Related papers (2024-12-27T18:45:36Z) - Scaling up Dynamic Edge Partition Models via Stochastic Gradient MCMC [33.08344607564694]
The edge partition model (EPM) is a generative model for extracting an overlapping community structure from static graph-structured data.
Despite having many attractive properties, inference in the EPM is typically performed using Markov chain Monte Carlo (MCMC) methods that prevent it from being applied to massive network data.
arXiv Detail & Related papers (2024-02-29T15:19:35Z) - Sample as You Infer: Predictive Coding With Langevin Dynamics [11.515490109360012]
We present a novel algorithm for parameter learning in generic deep generative models.
Our approach modifies the standard PC algorithm to bring performance on par with, and exceeding, that obtained from standard variational autoencoder training.
arXiv Detail & Related papers (2023-11-22T19:36:47Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), each have drawbacks: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Plug-and-Play split Gibbs sampler: embedding deep generative priors in
Bayesian inference [12.91637880428221]
This paper introduces a plug-and-play sampling algorithm that leverages variable splitting to efficiently sample from a posterior distribution.
It divides the challenging task of posterior sampling into two simpler sampling problems.
Its performance is compared to recent state-of-the-art optimization and sampling methods.
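The splitting introduces an auxiliary variable z with a quadratic coupling, p(x, z | y) proportional to p(y | x) exp(-||x - z||^2 / (2 rho^2)) p(z), so that each conditional is easy on its own. A minimal skeleton of the resulting two-block sampler follows; the two conditional samplers are placeholder callables, not the paper's implementation.

```python
def split_gibbs(x_init, sample_x_given_zy, sample_z_given_x, n_iters):
    """Plug-and-play split Gibbs skeleton.

    sample_x_given_zy(z): draws x | z, y  (likelihood + quadratic coupling).
    sample_z_given_x(x): draws z | x      (generative prior + coupling),
    e.g. via a deep denoiser standing in for the prior. Sketch only.
    """
    x, samples = x_init, []
    for _ in range(n_iters):
        z = sample_z_given_x(x)      # prior / denoising block
        x = sample_x_given_zy(z)     # data-fidelity block
        samples.append(x)
    return samples
```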
arXiv Detail & Related papers (2023-04-21T17:17:51Z) - Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for
Inverse Problem [97.64313409741614]
We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators.
We propose to do posterior sampling in the latent space of a pre-trained generative model.
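As a hedged sketch of latent-space posterior sampling (the paper's contribution is the mixing analysis of such a chain; the names below are placeholders), unadjusted Langevin dynamics under a standard-normal latent prior looks like:

```python
import torch

def latent_langevin(z, generator, log_likelihood, n_steps, eta):
    """Unadjusted Langevin on log p(z|y) ~ log p(y | G(z)) - ||z||^2 / 2.

    generator: pretrained G mapping latents to data space.
    log_likelihood: measurement log-density log p(y | .).
    eta: step size; sqrt(2*eta) scales the injected noise.
    """
    for _ in range(n_steps):
        z = z.detach().requires_grad_(True)
        logp = log_likelihood(generator(z)) - 0.5 * (z ** 2).sum()
        (grad,) = torch.autograd.grad(logp, z)   # score of the posterior
        z = z + eta * grad + (2.0 * eta) ** 0.5 * torch.randn_like(z)
    return z.detach()
```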
arXiv Detail & Related papers (2022-06-18T03:47:37Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)