Sequential Bayesian Optimal Experimental Design in Infinite Dimensions via Policy Gradient Reinforcement Learning
- URL: http://arxiv.org/abs/2601.05868v1
- Date: Fri, 09 Jan 2026 15:44:49 GMT
- Title: Sequential Bayesian Optimal Experimental Design in Infinite Dimensions via Policy Gradient Reinforcement Learning
- Authors: Kaichen Shen, Peng Chen
- Abstract summary: High-fidelity approaches require repeated forward and adjoint PDE solves inside nested Bayesian inversion and design loops. We formulate SBOED as a finite-horizon Markov decision process and learn an amortized design policy via policy-gradient reinforcement learning. Numerical experiments on sequential multi-sensor placement for contaminant source tracking demonstrate approximately $100\times$ speedup over high-fidelity finite element methods.
- Score: 3.2580743227673694
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential Bayesian optimal experimental design (SBOED) for PDE-governed inverse problems is computationally challenging, especially for infinite-dimensional random field parameters. High-fidelity approaches require repeated forward and adjoint PDE solves inside nested Bayesian inversion and design loops. We formulate SBOED as a finite-horizon Markov decision process and learn an amortized design policy via policy-gradient reinforcement learning (PGRL), enabling online design selection from the experiment history without repeatedly solving an SBOED optimization problem. To make policy training and reward evaluation scalable, we combine dual dimension reduction -- active subspace projection for the parameter and principal component analysis for the state -- with an adjusted derivative-informed latent attention neural operator (LANO) surrogate that predicts both the parameter-to-solution map and its Jacobian. We use a Laplace-based D-optimality reward while noting that, in general, other expected-information-gain utilities such as KL divergence can also be used within the same framework. We further introduce an eigenvalue-based evaluation strategy that uses prior samples as proxies for maximum a posteriori (MAP) points, avoiding repeated MAP solves while retaining accurate information-gain estimates. Numerical experiments on sequential multi-sensor placement for contaminant source tracking demonstrate approximately $100\times$ speedup over high-fidelity finite element methods, improved performance over random sensor placements, and physically interpretable policies that discover an ``upstream'' tracking strategy.
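The dual dimension reduction can be illustrated with standard linear algebra. Below is a minimal numpy sketch, with our own (hypothetical) function names and shapes rather than the paper's code: an active-subspace basis computed from gradient samples of the parameter-to-observable map, and a PCA (POD) basis computed from state snapshots.

```python
import numpy as np

def active_subspace(grads, r):
    """Leading r active-subspace directions for the parameter.
    grads : (n_samples, n_param) gradient samples of the
            parameter-to-observable (or utility) map."""
    C = grads.T @ grads / grads.shape[0]   # empirical gradient second moment
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    return eigvecs[:, order[:r]]           # (n_param, r) projection basis

def pca_basis(states, r):
    """Leading r PCA (POD) modes for the state.
    states : (n_snapshots, n_dof) solution snapshots."""
    X = states - states.mean(axis=0)       # center the snapshots
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:r].T                        # (n_dof, r) projection basis
```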
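The Laplace-based D-optimality reward is commonly taken, in large-scale Bayesian OED, as half the log-determinant of the identity plus the prior-preconditioned Gauss-Newton Hessian of the data misfit; in eigenvalue form it is $\frac{1}{2}\sum_i \log(1+\lambda_i)$, which is why evaluating the spectrum at prior samples used as MAP proxies is cheap. A hedged numpy sketch with hypothetical argument names:

```python
import numpy as np

def laplace_d_optimality(J, prior_cov, noise_var):
    """Laplace/D-optimality information-gain proxy for one design.
    J : (n_obs, n_param) Jacobian of the observation map at a nominal
        point (a MAP point, or a prior sample used as a MAP proxy).
    Returns 0.5 * logdet(I + L^T H L), where H = J^T J / noise_var is the
    Gauss-Newton misfit Hessian and prior_cov = L @ L.T."""
    H = J.T @ J / noise_var                 # Gauss-Newton misfit Hessian
    L = np.linalg.cholesky(prior_cov)       # prior covariance factor
    M = np.eye(H.shape[0]) + L.T @ H @ L    # I + prior-preconditioned Hessian
    sign, logdet = np.linalg.slogdet(M)     # logdet = sum_i log(1 + lambda_i)
    return 0.5 * logdet
```

Evaluated with Jacobians supplied by a surrogate rather than adjoint PDE solves, this is the kind of per-stage reward the abstract describes.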
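Policy training can then be a REINFORCE-style ascent over design trajectories. The sketch below (PyTorch, our own minimal interface, not the paper's API) uses reward-to-go returns and illustrates the generic estimator rather than the paper's exact training loop.

```python
import torch

def reinforce_update(episodes, optimizer):
    """One REINFORCE ascent step for an amortized design policy (sketch).
    episodes : list of (log_probs, rewards) pairs of 1-D tensors, holding
               the policy log-probability of each chosen design and the
               per-stage information-gain reward (e.g., the Laplace
               D-optimality increment above)."""
    loss = torch.zeros(())
    for log_probs, rewards in episodes:
        # reward-to-go returns G_k = sum_{t >= k} r_t
        returns = torch.flip(torch.cumsum(torch.flip(rewards, [0]), 0), [0])
        loss = loss - (log_probs * returns).sum()
    loss = loss / len(episodes)             # minimize -E[sum_k log pi * G_k]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```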
Related papers
- Gradient-Free Sequential Bayesian Experimental Design via Interacting Particle Systems [1.1549572298362782]
We introduce a gradient-free framework for Bayesian Optimal Experimental Design (BOED) in sequential settings. Our method combines Ensemble Kalman Inversion (EKI) for design optimization with the Affine-Invariant Langevin Dynamics (ALDI) sampler for efficient posterior sampling. Variational Gaussian and parametrized Laplace approximations provide tractable upper and lower bounds on the Expected Information Gain.
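For context, an EKI step is a Kalman-type ensemble update that needs only forward-map evaluations and no gradients. A generic numpy sketch (textbook stochastic EKI, not necessarily the cited paper's exact variant):

```python
import numpy as np

def eki_step(U, G, y, noise_cov, rng):
    """One stochastic Ensemble Kalman Inversion update.
    U : (J, d) ensemble; G : forward map u -> (m,); y : (m,) data."""
    Gu = np.stack([G(u) for u in U])                 # (J, m) forward evals
    du = U - U.mean(axis=0)                          # parameter anomalies
    dg = Gu - Gu.mean(axis=0)                        # observation anomalies
    Cug = du.T @ dg / (len(U) - 1)                   # cross-covariance (d, m)
    Cgg = dg.T @ dg / (len(U) - 1)                   # obs covariance   (m, m)
    # perturb data per ensemble member
    Y = y + rng.multivariate_normal(np.zeros(len(y)), noise_cov, size=len(U))
    K = Cug @ np.linalg.inv(Cgg + noise_cov)         # Kalman-type gain (d, m)
    return U + (Y - Gu) @ K.T                        # updated (J, d) ensemble
```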
arXiv Detail & Related papers (2025-04-17T20:16:15Z)
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
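The combined objective can be sketched as a preference-optimization term plus a weighted supervised (SFT) term. The snippet below uses a DPO-style preference loss for concreteness; the argument names and the weight $\lambda$ are our own illustration, not the paper's API.

```python
import torch
import torch.nn.functional as F

def pref_plus_sft_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                       sft_logp, beta=0.1, lam=0.5):
    """Preference loss + SFT regularizer (sketch).
    logp_w / logp_l : policy log-probs of chosen / rejected responses;
    ref_logp_*      : same under a frozen reference model;
    sft_logp        : policy log-probs of the chosen responses."""
    # DPO-style preference term
    margins = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    pref_loss = -F.logsigmoid(margins).mean()
    # SFT term: likelihood of preferred responses, acting as an
    # implicit regularizer against reward overoptimization
    sft_loss = -sft_logp.mean()
    return pref_loss + lam * sft_loss
```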
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- An Adaptive Dimension Reduction Estimation Method for High-dimensional Bayesian Optimization [6.79843988450982]
We propose a two-step optimization framework to extend BO to high-dimensional settings.
Our algorithm offers the flexibility to operate these steps either concurrently or in sequence.
Numerical experiments validate the efficacy of our method in challenging scenarios.
arXiv Detail & Related papers (2024-03-08T16:21:08Z)
- PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addresses these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower optimal variable.
Our empirical results substantiate that the proposed PARL can address the alignment concerns in RL by showing significant improvements.
arXiv Detail & Related papers (2023-08-03T18:03:44Z)
- Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant Fuel Optimization [0.0]
This work presents a first-of-a-kind approach that uses deep RL to solve the fuel loading pattern problem and could be leveraged for any engineering design optimization.
arXiv Detail & Related papers (2023-05-09T23:51:24Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Bayesian Sequential Optimal Experimental Design for Nonlinear Models Using Policy Gradient Reinforcement Learning [0.0]
We formulate this sequential optimal experimental design (sOED) problem as a finite-horizon partially observable Markov decision process (POMDP).
It is built to accommodate continuous random variables, general non-Gaussian posteriors, and expensive nonlinear forward models.
We solve for the sOED policy numerically via policy gradient (PG) methods from reinforcement learning, and derive and prove the PG expression for sOED.
The overall PG-sOED method is validated on a linear-Gaussian benchmark, and its advantages over batch and greedy designs are demonstrated through a contaminant source inversion problem.
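The PG expression this entry refers to instantiates the generic finite-horizon score-function identity; schematically (our notation, not the paper's exact statement):

$$
\nabla_\theta J(\theta) \;=\; \mathbb{E}_{\tau\sim\pi_\theta}\!\left[\sum_{k=0}^{N-1} \nabla_\theta \log \pi_\theta(d_k \mid I_k)\,\Big(\sum_{t=k}^{N-1} r_t + g_N\Big)\right],
$$

where $I_k$ is the information state built from the experiment history, $d_k$ the design chosen at stage $k$, $r_t$ the stage rewards, and $g_N$ a terminal reward.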
arXiv Detail & Related papers (2021-10-28T17:47:31Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
- Optimal Bayesian experimental design for subsurface flow problems [77.34726150561087]
We propose a novel approach for the development of a polynomial chaos expansion (PCE) surrogate model for the design utility function.
This novel technique enables the derivation of a reasonable-quality response surface for the targeted objective function with a computational budget comparable to several single-point evaluations.
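As an illustration of the PCE surrogate idea, the sketch below fits a one-dimensional polynomial chaos expansion of a scalar utility by least squares; the paper's construction is multivariate, so treat this as a minimal sketch under that simplifying assumption.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander

def fit_pce(samples, utilities, degree=3):
    """Least-squares PCE of a scalar utility in a 1-D standard-normal input.
    samples   : (n,) standard-normal input samples
    utilities : (n,) utility evaluations at those samples"""
    Psi = hermevander(samples, degree)          # probabilists' Hermite basis
    coef, *_ = np.linalg.lstsq(Psi, utilities, rcond=None)
    return coef                                  # PCE coefficients

def eval_pce(coef, x):
    """Evaluate the fitted PCE response surface at new inputs x."""
    return hermevander(x, len(coef) - 1) @ coef
```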
arXiv Detail & Related papers (2020-08-10T09:42:59Z)