Minimax-Bayes Reinforcement Learning
- URL: http://arxiv.org/abs/2302.10831v1
- Date: Tue, 21 Feb 2023 17:10:21 GMT
- Title: Minimax-Bayes Reinforcement Learning
- Authors: Thomas Kleine Buening, Christos Dimitrakakis, Hannes Eriksson, Divya
Grover, Emilio Jorge
- Abstract summary: This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems.
We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.
- Score: 2.7456483236562437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the Bayesian decision-theoretic framework offers an elegant solution to
the problem of decision making under uncertainty, one question is how to
appropriately select the prior distribution. One idea is to employ a worst-case
prior. However, this is not as easy to specify in sequential decision making as
in simple statistical estimation problems. This paper studies (sometimes
approximate) minimax-Bayes solutions for various reinforcement learning
problems to gain insights into the properties of the corresponding priors and
policies. We find that while the worst-case prior depends on the setting, the
corresponding minimax policies are more robust than those that assume a
standard (i.e. uniform) prior.
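A compact way to state the objective studied here is the following generic minimax-Bayes formulation; the notation (utility U, prior beta over MDPs mu, policy pi) is illustrative shorthand rather than the paper's own.

```latex
% Generic minimax-Bayes formulation (illustrative notation, not taken from the paper).
% \Pi: policy set, B: set of priors over MDPs \mu, U(\mu,\pi): expected utility.
% Bayes-optimal policy for a fixed prior \beta:
\[
  \pi^{*}_{\beta} \in \arg\max_{\pi \in \Pi} \; \mathbb{E}_{\mu \sim \beta}\,[\,U(\mu, \pi)\,]
\]
% Worst-case prior and minimax policy:
\[
  \beta^{*} \in \arg\min_{\beta \in B} \, \max_{\pi \in \Pi} \; \mathbb{E}_{\mu \sim \beta}\,[\,U(\mu, \pi)\,],
  \qquad
  \pi^{*} \in \arg\max_{\pi \in \Pi} \, \min_{\beta \in B} \; \mathbb{E}_{\mu \sim \beta}\,[\,U(\mu, \pi)\,].
\]
% The worst-case prior minimises the Bayes-optimal utility; the minimax policy is the
% one that maximises utility against that adversarially chosen prior, which is what
% gives it the robustness observed in the experiments.
```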
Related papers
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on the equivalence of existing baseline-correction and control-variate methods in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
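For intuition, a minimal sketch of the kind of estimator this line of work builds on: an inverse-propensity-scoring (IPS) estimate of a target policy's reward with a scalar baseline acting as a control variate. The function name and the weighted-mean baseline are illustrative; the paper derives the variance-optimal choice, which this heuristic only stands in for.

```python
import numpy as np

def baseline_corrected_ips(rewards, logging_probs, target_probs, baseline=None):
    """IPS off-policy estimate with a scalar baseline correction (control variate).

    rewards:       observed rewards r_i under the logging policy
    logging_probs: pi_0(a_i | x_i), propensities of the logged actions
    target_probs:  pi(a_i | x_i), target-policy probabilities of those actions
    baseline:      scalar b; since the importance weights have expectation 1,
                   subtracting b*w and adding b back leaves a fixed-b estimator unbiased.
    """
    w = target_probs / logging_probs          # importance weights
    if baseline is None:
        # Heuristic data-dependent baseline (weighted mean reward); a stand-in
        # for the variance-optimal baseline, at the cost of a small bias.
        baseline = np.sum(w * rewards) / np.sum(w)
    return np.mean(w * (rewards - baseline)) + baseline

# Tiny usage example with synthetic logged bandit feedback.
rng = np.random.default_rng(0)
n = 1000
logging_probs = rng.uniform(0.2, 0.8, size=n)
target_probs = rng.uniform(0.2, 0.8, size=n)
rewards = rng.binomial(1, 0.5, size=n).astype(float)
print(baseline_corrected_ips(rewards, logging_probs, target_probs))
```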
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement [25.68354404229254]
We show that even in a data-starved setting it may still be possible to find a policy competitive with the optimal one.
This paves the way to reliable decision-making in settings where critical decisions must be made by relying only on a handful of samples.
arXiv Detail & Related papers (2024-02-24T03:41:09Z)
- Learning Deterministic Surrogates for Robust Convex QCQPs [0.0]
We propose a double implicit layer model for training prediction models with respect to robust decision loss.
The first layer solves a deterministic version of the problem, while the second evaluates the worst-case realisation over an uncertainty set.
This enables us to learn model parameterisations that lead to robust decisions while only solving a simpler deterministic problem at test time.
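A toy two-stage evaluation in that spirit, drastically simplified from a QCQP to an unconstrained quadratic with box uncertainty in the linear term (all names and numbers below are hypothetical): the first stage solves the nominal problem for predicted parameters, the second evaluates the decision under the worst-case parameter realisation, and that robust loss can then serve as a training signal for the upstream prediction model.

```python
import numpy as np

def nominal_solution(Q, c_hat):
    """Stage 1: solve the deterministic problem min_x 0.5 x'Qx - c'x for the
    predicted linear term c_hat (closed form because the toy problem is unconstrained)."""
    return np.linalg.solve(Q, c_hat)

def worst_case_loss(x, Q, c_hat, delta):
    """Stage 2: evaluate decision x under the worst-case c in the box
    [c_hat - delta, c_hat + delta]; for this objective the maximum over the box
    has the closed form nominal_loss + delta'|x|."""
    nominal = 0.5 * x @ Q @ x - c_hat @ x
    return nominal + delta @ np.abs(x)

Q = np.array([[2.0, 0.3], [0.3, 1.0]])
c_hat = np.array([1.0, -0.5])   # hypothetical prediction-model output
delta = np.array([0.2, 0.2])    # half-widths of the uncertainty box
x_star = nominal_solution(Q, c_hat)
print(worst_case_loss(x_star, Q, c_hat, delta))
```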
arXiv Detail & Related papers (2023-12-19T16:56:13Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied to either risk-seeking or risk-averse policy optimization.
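Earlier uncertainty Bellman equations have roughly the following shape; the notation is illustrative, and the paper's contribution lies in the choice of the local term that makes the fixed point equal the posterior variance of values rather than an upper bound on it.

```latex
% Generic uncertainty Bellman recursion (illustrative notation, not the paper's):
% u_\pi(s,a) is the value uncertainty at (s,a), w(s,a) a local uncertainty term, and
% the expectation is over s' \sim P(\cdot \mid s,a) and a' \sim \pi(\cdot \mid s').
\[
  u_\pi(s,a) \;=\; w(s,a) \;+\; \gamma^{2}\, \mathbb{E}_{s',a'}\!\left[\, u_\pi(s',a') \,\right]
\]
% Like the ordinary Bellman equation this is a linear fixed-point equation, solvable by
% dynamic programming; the local term w determines whether the solution upper-bounds
% or exactly matches the posterior variance over values.
```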
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- On the safe use of prior densities for Bayesian model selection [0.0]
We discuss the issue of prior sensitivity of the marginal likelihood and its role in model selection.
We also comment on the use of uninformative priors, which are very common choices in practice.
These issues are illustrated with practical examples, one of them involving a real-world application to exoplanet detection.
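As a toy illustration of the prior sensitivity at stake (not an example from the paper), the marginal likelihood of a conjugate Gaussian model is available in closed form and can swing dramatically with the prior scale:

```python
import numpy as np

# Model: y_i ~ N(theta, sigma^2) with prior theta ~ N(0, tau^2).
# Marginally, y ~ N(0, sigma^2 I + tau^2 11'), so the evidence is a Gaussian density.
def log_marginal_likelihood(y, sigma=1.0, tau=1.0):
    n = len(y)
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

y = np.array([0.3, -0.1, 0.4, 0.2, 0.0])
for tau in [0.1, 1.0, 10.0, 100.0]:
    print(f"tau = {tau:6.1f}   log evidence = {log_marginal_likelihood(y, tau=tau):.3f}")
# An increasingly diffuse ("uninformative") prior drives the evidence down without
# bound, which is exactly the kind of sensitivity that makes naive prior choices
# hazardous for Bayesian model selection.
```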
arXiv Detail & Related papers (2022-06-10T16:17:48Z)
- Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z)
- A Mutual Information Maximization Approach for the Spurious Solution Problem in Weakly Supervised Question Answering [60.768146126094955]
Weakly supervised question answering usually has only the final answers as supervision signals.
There may exist many spurious solutions that coincidentally derive the correct answer, but training on such solutions can hurt model performance.
We propose to explicitly exploit such semantic correlations by maximizing the mutual information between question-answer pairs and predicted solutions.
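A rough sketch of the kind of objective involved, written as a generic InfoNCE-style lower bound on mutual information rather than the paper's exact estimator; the critic matrix and its interpretation are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def infonce_mi_lower_bound(scores):
    """InfoNCE lower bound on mutual information from an n x n critic matrix.

    scores[i, j] is a compatibility score between question-answer pair i and
    candidate solution j; diagonal entries pair each example with its own predicted
    solution, off-diagonal entries act as in-batch negatives. Maximising the bound
    makes predicted solutions informative about their question-answer pairs, which
    is the mechanism used to discourage spurious solutions.
    """
    n = scores.shape[0]
    log_probs = np.diag(scores) - logsumexp(scores, axis=1)
    return np.mean(log_probs) + np.log(n)

# With uninformative (random) scores the bound sits near zero.
rng = np.random.default_rng(0)
print(infonce_mi_lower_bound(rng.normal(size=(8, 8))))
```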
arXiv Detail & Related papers (2021-06-14T05:47:41Z)
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- A Scalable Two Stage Approach to Computing Optimal Decision Sets [29.946141040012545]
Rule-based models, such as decision trees, decision lists, and decision sets, are conventionally deemed to be the most interpretable.
Recent work uses propositional satisfiability (SAT) solving to generate minimum-size decision sets.
This paper proposes a novel approach to learn minimum-size decision sets by enumerating individual rules of the target decision set independently of each other, and then solving a set cover problem to select a subset of rules.
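The rule-selection stage can be pictured with a standard greedy set-cover routine over the examples each enumerated rule covers; this is only an approximation of the minimum-size selection the paper computes, and the rule names and coverage sets below are hypothetical.

```python
def greedy_rule_selection(rules_cover, universe):
    """Pick a small subset of rules whose covered examples jointly cover `universe`.

    rules_cover: dict mapping rule id -> set of example indices covered by that rule.
    universe:    set of example indices that must be covered.
    Greedy set cover: repeatedly take the rule covering the most still-uncovered examples.
    """
    uncovered = set(universe)
    selected = []
    while uncovered:
        best = max(rules_cover, key=lambda r: len(rules_cover[r] & uncovered))
        gain = rules_cover[best] & uncovered
        if not gain:          # remaining examples cannot be covered by any rule
            break
        selected.append(best)
        uncovered -= gain
    return selected

rules = {"r1": {0, 1, 2}, "r2": {2, 3}, "r3": {3, 4, 5}, "r4": {1, 4}}
print(greedy_rule_selection(rules, universe=set(range(6))))   # ['r1', 'r3']
```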
arXiv Detail & Related papers (2021-02-03T06:51:49Z)
- The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization (IRM) is an objective based on the idea of learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective--as well as recently proposed alternatives--under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution--this is precisely the issue that it was intended to solve.
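For reference, the bilevel IRM objective of Arjovsky et al. (2019), stated here in generic notation independent of this paper's analysis:

```latex
% IRM: learn a representation \Phi such that a single classifier w is simultaneously
% optimal for the risk R^e of every training environment e.
\[
  \min_{\Phi,\, w} \; \sum_{e \in \mathcal{E}_{\mathrm{tr}}} R^{e}(w \circ \Phi)
  \quad \text{subject to} \quad
  w \in \arg\min_{\bar{w}} R^{e}(\bar{w} \circ \Phi) \;\; \text{for all } e \in \mathcal{E}_{\mathrm{tr}}.
\]
% The practical IRMv1 relaxation replaces the constraint with a penalty
% \lambda \,\big\| \nabla_{w \mid w=1.0}\, R^{e}(w \cdot \Phi) \big\|^{2} added to each
% environment's risk, treating w as a fixed scalar "dummy" classifier.
```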
arXiv Detail & Related papers (2020-10-12T14:54:32Z)
- Accelerated Sparse Bayesian Learning via Screening Test and Its Applications [0.9916217495995309]
For a linear system, directly finding the sparsest solution given an over-complete dictionary of features is typically NP-hard.
Sparse Bayesian learning addresses this by using a parameterized prior to encourage sparsity in the solution.
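As a plain illustration of sparse Bayesian learning itself (without the screening-test acceleration that is this paper's contribution), scikit-learn's ARD regression implements the parameterized zero-mean Gaussian prior with per-coefficient precisions that prunes most weights:

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# Synthetic over-complete dictionary: 100 candidate features, only 3 truly active.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))
true_w = np.zeros(100)
true_w[[3, 17, 42]] = [1.5, -2.0, 0.7]
y = X @ true_w + 0.01 * rng.normal(size=50)

# Each coefficient w_i gets a N(0, 1/alpha_i) prior; maximising the evidence drives
# most precisions alpha_i to large values, switching the corresponding features off.
model = ARDRegression()
model.fit(X, y)
print("recovered support:", np.flatnonzero(np.abs(model.coef_) > 1e-3))
```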
arXiv Detail & Related papers (2020-07-08T10:21:56Z)