Reinforced sequential Monte Carlo for amortised sampling
- URL: http://arxiv.org/abs/2510.11711v1
- Date: Mon, 13 Oct 2025 17:59:11 GMT
- Title: Reinforced sequential Monte Carlo for amortised sampling
- Authors: Sanghyeok Choi, Sarthak Mittal, Víctor Elvira, Jinkyoo Park, Nikolay Malkin
- Abstract summary: We state a connection between sequential Monte Carlo (SMC) and neural sequential samplers trained by maximum-entropy reinforcement learning (MaxEnt RL). We describe techniques for stable joint training of proposals and twist functions and an adaptive weight tempering scheme to reduce training signal variance.
- Score: 49.92678178064033
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a synergy of amortised and particle-based methods for sampling from distributions defined by unnormalised density functions. We state a connection between sequential Monte Carlo (SMC) and neural sequential samplers trained by maximum-entropy reinforcement learning (MaxEnt RL), wherein learnt sampling policies and value functions define proposal kernels and twist functions. Exploiting this connection, we introduce an off-policy RL training procedure for the sampler that uses samples from SMC -- using the learnt sampler as a proposal -- as a behaviour policy that better explores the target distribution. We describe techniques for stable joint training of proposals and twist functions and an adaptive weight tempering scheme to reduce training signal variance. Furthermore, building upon past attempts to use experience replay to guide the training of neural samplers, we derive a way to combine historical samples with annealed importance sampling weights within a replay buffer. On synthetic multi-modal targets (in both continuous and discrete spaces) and the Boltzmann distribution of alanine dipeptide conformations, we demonstrate improvements in approximating the true distribution as well as training stability compared to both amortised and Monte Carlo methods.
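The SMC loop the abstract builds on (reweighting particles along an annealed path, resampling when weights degenerate, then moving particles with a Markov kernel) can be sketched minimally. In the sketch below, the learnt neural proposal is replaced by a fixed Gaussian and the target is a hypothetical 1-D bimodal density; this illustrates generic annealed SMC, not the paper's joint training of proposals and twist functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Toy unnormalised bimodal density (a stand-in for the paper's
    # multi-modal benchmarks, not one of the authors' actual targets).
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def log_init(x):
    # Fixed N(0, 2^2) initial distribution; in the paper this role is
    # played by the learnt neural sampler acting as the proposal.
    return -0.5 * (x / 2.0) ** 2

def log_anneal(x, beta):
    # Geometric path between the initial distribution and the target.
    return (1.0 - beta) * log_init(x) + beta * log_target(x)

def smc_sample(n=2000, n_steps=20, step=0.7):
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.normal(0.0, 2.0, n)
    log_w = np.zeros(n)
    for b0, b1 in zip(betas[:-1], betas[1:]):
        # Incremental importance weight between consecutive annealed targets.
        log_w += log_anneal(x, b1) - log_anneal(x, b0)
        # Multinomial resampling when the effective sample size halves.
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n / 2:
            x = x[rng.choice(n, size=n, p=w)]
            log_w[:] = 0.0
        # One random-walk Metropolis move leaving the current level invariant.
        prop = x + step * rng.normal(size=n)
        accept = np.log(rng.uniform(size=n)) < log_anneal(prop, b1) - log_anneal(x, b1)
        x = np.where(accept, prop, x)
    return x

samples = smc_sample()
```

The resampled particles serve the role the paper assigns to the SMC behaviour policy: they land in both modes even when the initial proposal is poorly matched to the target.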
Related papers
- TFTF: Training-Free Targeted Flow for Conditional Sampling [1.4151684142137693]
We propose a training-free conditional sampling method for flow matching models based on importance sampling. Because a naive application of importance sampling suffers from weight degeneracy in high-dimensional settings, we modify and incorporate a resampling technique from sequential Monte Carlo. Our framework requires no additional training, while providing theoretical guarantees of accuracy.
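The weight degeneracy that motivates resampling here can be demonstrated with plain importance sampling: even a mild per-dimension mismatch between proposal and target collapses the effective sample size as the dimension grows. The Gaussian proposal and target below are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def ess(log_w):
    # Effective sample size of self-normalised importance weights.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

n = 5000
ess_by_dim = {}
for d in (1, 10, 50):
    # Proposal N(0, I), target N(0.5, I): the log weight is the
    # log-density difference, summed over dimensions.
    x = rng.normal(size=(n, d))
    log_w = np.sum(-0.5 * (x - 0.5) ** 2 + 0.5 * x ** 2, axis=1)
    ess_by_dim[d] = ess(log_w)
print(ess_by_dim)  # ESS shrinks sharply as d grows
```

With a per-dimension log-weight variance of 0.25, the ESS decays roughly like n·exp(-0.25·d), which is why high-dimensional settings force resampling.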
arXiv Detail & Related papers (2026-02-13T13:41:35Z) - Learnable Chernoff Baselines for Inference-Time Alignment [64.81256817158851]
We introduce Learnable Chernoff Baselines as a method for efficiently and approximately sampling from exponentially tilted kernels. We establish total-variation guarantees to the ideal aligned model, and demonstrate in both continuous and discrete diffusion settings that LCB sampling closely matches ideal rejection sampling.
arXiv Detail & Related papers (2026-02-08T00:09:40Z) - Amortized Sampling with Transferable Normalizing Flows [65.48838168417564]
Prose is a transferable normalizing flow trained on a corpus of peptide molecular dynamics trajectories up to 8 residues in length. We show that Prose serves as a proposal for a variety of sampling algorithms, and find that a simple importance-sampling-based finetuning procedure achieves superior performance. We open-source the Prose dataset to further stimulate research into amortized sampling methods and finetuning objectives.
arXiv Detail & Related papers (2025-08-25T16:28:18Z) - Non-equilibrium Annealed Adjoint Sampler [27.73022309947818]
We introduce the Non-equilibrium Annealed Adjoint Sampler (NAAS), a novel SOC-based diffusion sampler. NAAS employs a lean adjoint system inspired by adjoint matching, enabling efficient and scalable training.
arXiv Detail & Related papers (2025-06-22T20:41:31Z) - Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts [64.34482582690927]
We provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality.
arXiv Detail & Related papers (2025-03-04T17:46:51Z) - Generalized Bayesian deep reinforcement learning [2.469908534801392]
We propose to model the dynamics of the unknown environment through deep generative models, assuming Markov dependence. In the absence of likelihood functions for these models, we train them by learning a generalized predictive-sequential (or prequential) scoring rule (SR) posterior. For policy learning, we propose expected Thompson sampling (ETS) to learn the optimal policy by maximising the expected value function with respect to the posterior distribution.
arXiv Detail & Related papers (2024-12-16T13:02:17Z) - Adaptive teachers for amortized samplers [76.88721198565861]
We propose an adaptive training distribution (the teacher) to guide the training of the primary amortized sampler (the student). We validate the effectiveness of this approach in a synthetic environment designed to present an exploration challenge.
arXiv Detail & Related papers (2024-10-02T11:33:13Z) - Stochastic Localization via Iterative Posterior Sampling [2.1383136715042417]
We consider a general localization framework and introduce an explicit class of observation processes, associated with flexible denoising schedules.
We provide a complete methodology, Stochastic Localization via Iterative Posterior Sampling (SLIPS), to obtain approximate samples of this dynamics and, as a byproduct, samples from the target distribution.
We illustrate the benefits and applicability of SLIPS on several benchmarks of multi-modal distributions, including mixtures in increasing dimensions, logistic regression, and a high-dimensional field system from statistical mechanics.
arXiv Detail & Related papers (2024-02-16T15:28:41Z) - Improved off-policy training of diffusion samplers [93.66433483772055]
We study the problem of training diffusion models to sample from a distribution with an unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods. Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work.
arXiv Detail & Related papers (2024-02-07T18:51:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.