Fugu-MT paper translation (abstract): Reinforced sequential Monte Carlo for amortised sampling

Paper summary: Reinforced sequential Monte Carlo for amortised sampling

arxiv url: http://arxiv.org/abs/2510.11711v1
Date: Mon, 13 Oct 2025 17:59:11 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:30.503471
Title: Reinforced sequential Monte Carlo for amortised sampling
Title (reference translation): Reinforced sequential Monte Carlo for amortised sampling
Authors: Sanghyeok Choi, Sarthak Mittal, Víctor Elvira, Jinkyoo Park, Nikolay Malkin
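Abstract summary: We state a connection between sequential Monte Carlo (SMC) and neural sequential samplers trained by maximum-entropy reinforcement learning (MaxEnt RL). We describe techniques for stable joint training of proposals and twist functions, and an adaptive weight tempering scheme to reduce training signal variance.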
Reference score (independently computed attention score): 49.92678178064033
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper proposes a synergy of amortised and particle-based methods for sampling from distributions defined by unnormalised density functions. We state a connection between sequential Monte Carlo (SMC) and neural sequential samplers trained by maximum-entropy reinforcement learning (MaxEnt RL), wherein learnt sampling policies and value functions define proposal kernels and twist functions. Exploiting this connection, we introduce an off-policy RL training procedure for the sampler that uses samples from SMC -- using the learnt sampler as a proposal -- as a behaviour policy that better explores the target distribution. We describe techniques for stable joint training of proposals and twist functions and an adaptive weight tempering scheme to reduce training signal variance. Furthermore, building upon past attempts to use experience replay to guide the training of neural samplers, we derive a way to combine historical samples with annealed importance sampling weights within a replay buffer. On synthetic multi-modal targets (in both continuous and discrete spaces) and the Boltzmann distribution of alanine dipeptide conformations, we demonstrate improvements in approximating the true distribution as well as training stability compared to both amortised and Monte Carlo methods.
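Abstract (reference translation): This paper proposes a synergy of amortised and particle-based methods for sampling from distributions defined by unnormalised density functions. We state a connection between sequential Monte Carlo (SMC) and neural sequential samplers trained by maximum-entropy reinforcement learning (MaxEnt RL). Exploiting this connection, we introduce an off-policy RL training procedure for the sampler that uses samples from SMC (with the learnt sampler as the proposal) as a behaviour policy that better explores the target distribution. We describe techniques for stable joint training of proposals and twist functions, and an adaptive weight tempering scheme that reduces training signal variance. Furthermore, building on past attempts to guide the training of neural samplers with experience replay, we derive a way to combine historical samples with annealed importance sampling weights within a replay buffer. On synthetic multi-modal targets (in both continuous and discrete spaces) and the Boltzmann distribution of alanine dipeptide conformations, we demonstrate improved approximation of the true distribution and improved training stability compared with both amortised and Monte Carlo methods.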
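
For readers unfamiliar with the particle-based side of the abstract, the following is a minimal sketch of one importance-weighting and resampling step, the building block that SMC repeats across a sequence of intermediate distributions. It is not the paper's method: the bimodal target, the fixed Gaussian proposal (the paper would use a learnt sampler instead), the proposal scale, and all function names are assumptions chosen for illustration.

```python
import math
import random


def log_target(x):
    """Unnormalised log-density (assumed toy target):
    equal mixture of unit-variance Gaussians centred at -3 and +3."""
    a = -0.5 * (x - 3.0) ** 2
    b = -0.5 * (x + 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_proposal(x, scale):
    """Log-density of a zero-mean Gaussian proposal with standard deviation `scale`."""
    return -0.5 * (x / scale) ** 2 - math.log(scale * math.sqrt(2.0 * math.pi))


def normalise(log_w):
    """Convert log-weights to normalised weights using the log-sum-exp trick."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalised weights; lies between 1 and len(weights)."""
    return 1.0 / sum(wi * wi for wi in weights)


def importance_resample(n, scale, rng):
    """Draw n particles from the proposal, weight them against the target,
    then resample with replacement in proportion to the weights."""
    particles = [rng.gauss(0.0, scale) for _ in range(n)]
    log_w = [log_target(x) - log_proposal(x, scale) for x in particles]
    weights = normalise(log_w)
    ess = effective_sample_size(weights)
    resampled = rng.choices(particles, weights=weights, k=n)
    return resampled, weights, ess


rng = random.Random(0)
resampled, weights, ess = importance_resample(2000, 4.0, rng)
print(f"ESS: {ess:.1f} of {len(weights)} particles")
```

A low effective sample size indicates that the proposal covers the target poorly, which is the kind of mismatch a learnt proposal is meant to reduce.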

Related papers