Policy-Based Bayesian Experimental Design for Non-Differentiable
Implicit Models
- URL: http://arxiv.org/abs/2203.04272v1
- Date: Tue, 8 Mar 2022 18:47:01 GMT
- Title: Policy-Based Bayesian Experimental Design for Non-Differentiable
Implicit Models
- Authors: Vincent Lim, Ellen Novoseller, Jeffrey Ichnowski, Huang Huang, Ken
Goldberg
- Abstract summary: Reinforcement Learning for Deep Adaptive Design (RL-DAD) is a method for simulation-based optimal experimental design for non-differentiable implicit models.
RL-DAD maps prior histories to experiment designs offline and can be quickly deployed during online execution.
- Score: 25.00242490764664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For applications in healthcare, physics, energy, robotics, and many other
fields, designing maximally informative experiments is valuable, particularly
when experiments are expensive, time-consuming, or pose safety hazards. While
existing approaches can sequentially design experiments based on prior
observation history, many of these methods do not extend to implicit models,
where simulation is possible but computing the likelihood is intractable.
Furthermore, they often require either significant online computation during
deployment or a differentiable simulation system. We introduce Reinforcement
Learning for Deep Adaptive Design (RL-DAD), a method for simulation-based
optimal experimental design for non-differentiable implicit models. RL-DAD
extends prior work in policy-based Bayesian Optimal Experimental Design (BOED)
by reformulating it as a Markov Decision Process with a reward function based
on likelihood-free information lower bounds, which is used to learn a policy
via deep reinforcement learning. The learned design policy maps prior histories
to experiment designs offline and can be quickly deployed during online
execution. We evaluate RL-DAD and find that it performs competitively with
baselines on three benchmarks.
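For intuition, below is a minimal, self-contained sketch (not the authors' released code) of how policy-based BOED can be cast as an MDP: a recurrent design policy maps the experiment history to the next design, episodes are rolled out against a toy implicit simulator, and the terminal reward is an InfoNCE-style contrastive surrogate standing in for the likelihood-free information lower bound, optimized with plain REINFORCE as a stand-in for the deep RL algorithm. The toy simulator, all names, and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of policy-based BOED as an MDP (illustrative, hedged assumptions only).
import torch
import torch.nn as nn

T, BATCH, N_CONTRAST = 5, 64, 15      # designs per episode, batch size, contrastive prior draws

def simulate(theta, design):
    """Toy implicit model: simulation is cheap, but the likelihood is never evaluated."""
    return -((design - theta) ** 2) + 0.1 * torch.randn_like(theta)

class DesignPolicy(nn.Module):
    """Maps the history of (design, outcome) pairs to a distribution over the next design."""
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)           # Gaussian mean and log-std

    def forward(self, history):                    # history: (batch, t, 2)
        _, h = self.rnn(history)
        mu, log_std = self.head(h[-1]).chunk(2, dim=-1)
        return torch.distributions.Normal(mu.squeeze(-1), log_std.exp().squeeze(-1))

def contrastive_reward(theta, designs, outcomes):
    """Terminal reward: an InfoNCE-style log-ratio contrasting the observed outcomes
    against outcomes re-simulated under contrastive draws from the prior."""
    scores = []
    for theta_c in [theta] + [torch.randn_like(theta) for _ in range(N_CONTRAST)]:
        resim = torch.stack([simulate(theta_c, d) for d in designs], dim=0)
        scores.append(-((resim - outcomes) ** 2).sum(dim=0))
    scores = torch.stack(scores, dim=0)            # (1 + N_CONTRAST, batch)
    return scores[0] - torch.logsumexp(scores, dim=0)

policy = DesignPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for step in range(200):                            # offline amortized training
    theta = torch.randn(BATCH)                     # latent parameters drawn from the prior
    history = torch.zeros(BATCH, 1, 2)
    designs, outcomes, log_probs = [], [], []
    for t in range(T):                             # roll out one adaptive experiment
        dist = policy(history)
        d = dist.sample()
        y = simulate(theta, d)
        designs.append(d); outcomes.append(y); log_probs.append(dist.log_prob(d))
        history = torch.cat([history, torch.stack([d, y], dim=-1).unsqueeze(1)], dim=1)
    reward = contrastive_reward(theta, designs, torch.stack(outcomes, dim=0))
    loss = -(torch.stack(log_probs, dim=0).sum(dim=0) * reward).mean()   # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()
# At deployment, only cheap forward passes of `policy` are needed to choose designs online.
```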
Related papers
- Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization [55.14484317645865]
We develop a conditional diffusion model to produce high-quality prompts for offline reinforcement learning tasks.
We show that the Prompt Diffuser is a robust and effective tool for prompt tuning, demonstrating strong performance on meta-RL tasks.
arXiv Detail & Related papers (2024-11-02T07:38:02Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretically grounded reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Online simulator-based experimental design for cognitive model selection [74.76661199843284]
We propose BOSMOS: an approach to experimental design that can select between computational models without tractable likelihoods.
In simulated experiments, we demonstrate that the proposed BOSMOS technique can accurately select models in up to 2 orders of magnitude less time than existing LFI alternatives.
arXiv Detail & Related papers (2023-03-03T21:41:01Z)
- Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
- Towards Data-Driven Offline Simulations for Online Reinforcement Learning [30.654163861164864]
We formalize offline learner simulation (OLS) for reinforcement learning (RL).
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
arXiv Detail & Related papers (2022-11-14T18:36:13Z)
- Optimizing Sequential Experimental Design with Deep Reinforcement Learning [7.589363597086081]
We show that the problem of optimizing design policies can be reduced to solving a Markov decision process (MDP).
Our approach is also computationally efficient at deployment time and exhibits state-of-the-art performance on both continuous and discrete design spaces.
arXiv Detail & Related papers (2022-02-02T00:23:05Z)
- Reinforcement Learning based Sequential Batch-sampling for Bayesian Optimal Experimental Design [1.6249267147413522]
Sequential design of experiments (SDOE) is a popular suite of methods that has yielded promising results in recent years.
In this work, we extend the SDOE strategy to query the experiment or computer code at a batch of inputs.
A unique capability of the proposed methodology is that, once trained, it can be applied to multiple tasks, for example optimizing a function.
arXiv Detail & Related papers (2021-12-21T02:25:23Z)
- Implicit Deep Adaptive Design: Policy-Based Experimental Design without Likelihoods [24.50829695870901]
Implicit Deep Adaptive Design (iDAD) is a new method for performing adaptive experiments in real time with implicit models.
iDAD amortizes the cost of Bayesian optimal experimental design (BOED) by learning a design policy network upfront.
arXiv Detail & Related papers (2021-11-03T16:24:05Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)