Minimax-Bayes Reinforcement Learning
- URL: http://arxiv.org/abs/2302.10831v1
- Date: Tue, 21 Feb 2023 17:10:21 GMT
- Title: Minimax-Bayes Reinforcement Learning
- Authors: Thomas Kleine Buening, Christos Dimitrakakis, Hannes Eriksson, Divya
Grover, Emilio Jorge
- Abstract summary: This paper studies (sometimes approximate) minimax-Bayes solutions for various reinforcement learning problems.
We find that while the worst-case prior depends on the setting, the corresponding minimax policies are more robust than those that assume a standard (i.e. uniform) prior.
- Score: 2.7456483236562437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the Bayesian decision-theoretic framework offers an elegant solution to
the problem of decision making under uncertainty, one question is how to
appropriately select the prior distribution. One idea is to employ a worst-case
prior. However, this is not as easy to specify in sequential decision making as
in simple statistical estimation problems. This paper studies (sometimes
approximate) minimax-Bayes solutions for various reinforcement learning
problems to gain insights into the properties of the corresponding priors and
policies. We find that while the worst-case prior depends on the setting, the
corresponding minimax policies are more robust than those that assume a
standard (i.e. uniform) prior.
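A compact way to state the objective studied here is the following generic minimax-Bayes formulation; the notation (utility U, prior beta over MDPs mu, policy pi) is illustrative shorthand rather than the paper's own.

```latex
% Generic minimax-Bayes formulation (illustrative notation, not taken from the paper).
% \Pi: policy set, B: set of priors over MDPs \mu, U(\mu,\pi): expected utility.
% Bayes-optimal policy for a fixed prior \beta:
\[
  \pi^{*}_{\beta} \in \arg\max_{\pi \in \Pi} \; \mathbb{E}_{\mu \sim \beta}\,[\,U(\mu, \pi)\,]
\]
% Worst-case prior and minimax policy:
\[
  \beta^{*} \in \arg\min_{\beta \in B} \, \max_{\pi \in \Pi} \; \mathbb{E}_{\mu \sim \beta}\,[\,U(\mu, \pi)\,],
  \qquad
  \pi^{*} \in \arg\max_{\pi \in \Pi} \, \min_{\beta \in B} \; \mathbb{E}_{\mu \sim \beta}\,[\,U(\mu, \pi)\,].
\]
% The worst-case prior minimises the Bayes-optimal utility; the minimax policy is the
% one that maximises utility against that adversarially chosen prior, which is what
% gives it the robustness observed in the experiments.
```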
Related papers
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on the equivalence of existing baseline-correction and control-variate methods in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
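For intuition, a minimal sketch of the kind of estimator this line of work builds on: an inverse-propensity-scoring (IPS) estimate of a target policy's reward with a scalar baseline acting as a control variate. The function name and the weighted-mean baseline are illustrative; the paper derives the variance-optimal choice, which this heuristic only stands in for.

```python
import numpy as np

def baseline_corrected_ips(rewards, logging_probs, target_probs, baseline=None):
    """IPS off-policy estimate with a scalar baseline correction (control variate).

    rewards:       observed rewards r_i under the logging policy
    logging_probs: pi_0(a_i | x_i), propensities of the logged actions
    target_probs:  pi(a_i | x_i), target-policy probabilities of those actions
    baseline:      scalar b; since the importance weights have expectation 1,
                   subtracting b*w and adding b back leaves a fixed-b estimator unbiased.
    """
    w = target_probs / logging_probs          # importance weights
    if baseline is None:
        # Heuristic data-dependent baseline (weighted mean reward); a stand-in
        # for the variance-optimal baseline, at the cost of a small bias.
        baseline = np.sum(w * rewards) / np.sum(w)
    return np.mean(w * (rewards - baseline)) + baseline

# Tiny usage example with synthetic logged bandit feedback.
rng = np.random.default_rng(0)
n = 1000
logging_probs = rng.uniform(0.2, 0.8, size=n)
target_probs = rng.uniform(0.2, 0.8, size=n)
rewards = rng.binomial(1, 0.5, size=n).astype(float)
print(baseline_corrected_ips(rewards, logging_probs, target_probs))
```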
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement [25.68354404229254]
We show that even in a data-starved setting it may still be possible to find a policy competitive with the optimal one.
This paves the way to reliable decision-making in settings where critical decisions must be made by relying only on a handful of samples.
arXiv Detail & Related papers (2024-02-24T03:41:09Z)
- Learning Deterministic Surrogates for Robust Convex QCQPs [0.0]
We propose a double implicit layer model for training prediction models with respect to robust decision loss.
The first layer solves a deterministic version of the problem, while the second evaluates the worst-case realisation over an uncertainty set.
This enables us to learn model parameterisations that lead to robust decisions while only solving a simpler deterministic problem at test time.
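A toy two-stage evaluation in that spirit, drastically simplified from a QCQP to an unconstrained quadratic with box uncertainty in the linear term (all names and numbers below are hypothetical): the first stage solves the nominal problem for predicted parameters, the second evaluates the decision under the worst-case parameter realisation, and that robust loss can then serve as a training signal for the upstream prediction model.

```python
import numpy as np

def nominal_solution(Q, c_hat):
    """Stage 1: solve the deterministic problem min_x 0.5 x'Qx - c'x for the
    predicted linear term c_hat (closed form because the toy problem is unconstrained)."""
    return np.linalg.solve(Q, c_hat)

def worst_case_loss(x, Q, c_hat, delta):
    """Stage 2: evaluate decision x under the worst-case c in the box
    [c_hat - delta, c_hat + delta]; for this objective the maximum over the box
    has the closed form nominal_loss + delta'|x|."""
    nominal = 0.5 * x @ Q @ x - c_hat @ x
    return nominal + delta @ np.abs(x)

Q = np.array([[2.0, 0.3], [0.3, 1.0]])
c_hat = np.array([1.0, -0.5])   # hypothetical prediction-model output
delta = np.array([0.2, 0.2])    # half-widths of the uncertainty box
x_star = nominal_solution(Q, c_hat)
print(worst_case_loss(x_star, Q, c_hat, delta))
```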
arXiv Detail & Related papers (2023-12-19T16:56:13Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied to either risk-seeking or risk-averse policy optimization.
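Earlier uncertainty Bellman equations have roughly the following shape; the notation is illustrative, and the paper's contribution lies in the choice of the local term that makes the fixed point equal the posterior variance of values rather than an upper bound on it.

```latex
% Generic uncertainty Bellman recursion (illustrative notation, not the paper's):
% u_\pi(s,a) is the value uncertainty at (s,a), w(s,a) a local uncertainty term, and
% the expectation is over s' \sim P(\cdot \mid s,a) and a' \sim \pi(\cdot \mid s').
\[
  u_\pi(s,a) \;=\; w(s,a) \;+\; \gamma^{2}\, \mathbb{E}_{s',a'}\!\left[\, u_\pi(s',a') \,\right]
\]
% Like the ordinary Bellman equation this is a linear fixed-point equation, solvable by
% dynamic programming; the local term w determines whether the solution upper-bounds
% or exactly matches the posterior variance over values.
```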
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- On the safe use of prior densities for Bayesian model selection [0.0]
We discuss the issue of prior sensitivity of the marginal likelihood and its role in model selection.
We also comment on the use of uninformative priors, which are very common choices in practice.
These issues are illustrated with practical examples, one of them involving a real-world application to exoplanet detection.
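As a toy illustration of the prior sensitivity at stake (not an example from the paper), the marginal likelihood of a conjugate Gaussian model is available in closed form and can swing dramatically with the prior scale:

```python
import numpy as np

# Model: y_i ~ N(theta, sigma^2) with prior theta ~ N(0, tau^2).
# Marginally, y ~ N(0, sigma^2 I + tau^2 11'), so the evidence is a Gaussian density.
def log_marginal_likelihood(y, sigma=1.0, tau=1.0):
    n = len(y)
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

y = np.array([0.3, -0.1, 0.4, 0.2, 0.0])
for tau in [0.1, 1.0, 10.0, 100.0]:
    print(f"tau = {tau:6.1f}   log evidence = {log_marginal_likelihood(y, tau=tau):.3f}")
# An increasingly diffuse ("uninformative") prior drives the evidence down without
# bound, which is exactly the kind of sensitivity that makes naive prior choices
# hazardous for Bayesian model selection.
```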
arXiv Detail & Related papers (2022-06-10T16:17:48Z)
- Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z)
- A Mutual Information Maximization Approach for the Spurious Solution Problem in Weakly Supervised Question Answering [60.768146126094955]
Weakly supervised question answering usually has only the final answers as supervision signals.
There may exist many spurious solutions that coincidentally derive the correct answer, but training on such solutions can hurt model performance.
We propose to explicitly exploit such semantic correlations by maximizing the mutual information between question-answer pairs and predicted solutions.
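A rough sketch of the kind of objective involved, written as a generic InfoNCE-style lower bound on mutual information rather than the paper's exact estimator; the critic matrix and its interpretation are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def infonce_mi_lower_bound(scores):
    """InfoNCE lower bound on mutual information from an n x n critic matrix.

    scores[i, j] is a compatibility score between question-answer pair i and
    candidate solution j; diagonal entries pair each example with its own predicted
    solution, off-diagonal entries act as in-batch negatives. Maximising the bound
    makes predicted solutions informative about their question-answer pairs, which
    is the mechanism used to discourage spurious solutions.
    """
    n = scores.shape[0]
    log_probs = np.diag(scores) - logsumexp(scores, axis=1)
    return np.mean(log_probs) + np.log(n)

# With uninformative (random) scores the bound sits near zero.
rng = np.random.default_rng(0)
print(infonce_mi_lower_bound(rng.normal(size=(8, 8))))
```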
arXiv Detail & Related papers (2021-06-14T05:47:41Z)
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- A Scalable Two Stage Approach to Computing Optimal Decision Sets [29.946141040012545]
Rule-based models, such as decision trees, decision lists, and decision sets, are conventionally deemed to be the most interpretable.
Recent work uses propositional satisfiability (SAT) solving to generate minimum-size decision sets.
This paper proposes a novel approach to learn minimum-size decision sets by enumerating individual rules of the target decision set independently of each other, and then solving a set cover problem to select a subset of rules.
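The rule-selection stage can be pictured with a standard greedy set-cover routine over the examples each enumerated rule covers; this is only an approximation of the minimum-size selection the paper computes, and the rule names and coverage sets below are hypothetical.

```python
def greedy_rule_selection(rules_cover, universe):
    """Pick a small subset of rules whose covered examples jointly cover `universe`.

    rules_cover: dict mapping rule id -> set of example indices covered by that rule.
    universe:    set of example indices that must be covered.
    Greedy set cover: repeatedly take the rule covering the most still-uncovered examples.
    """
    uncovered = set(universe)
    selected = []
    while uncovered:
        best = max(rules_cover, key=lambda r: len(rules_cover[r] & uncovered))
        gain = rules_cover[best] & uncovered
        if not gain:          # remaining examples cannot be covered by any rule
            break
        selected.append(best)
        uncovered -= gain
    return selected

rules = {"r1": {0, 1, 2}, "r2": {2, 3}, "r3": {3, 4, 5}, "r4": {1, 4}}
print(greedy_rule_selection(rules, universe=set(range(6))))   # ['r1', 'r3']
```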
arXiv Detail & Related papers (2021-02-03T06:51:49Z)
- The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization (IRM) is an objective based on the idea of learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective--as well as recently proposed alternatives--under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution--this is precisely the issue that it was intended to solve.
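For reference, the bilevel IRM objective of Arjovsky et al. (2019), stated here in generic notation independent of this paper's analysis:

```latex
% IRM: learn a representation \Phi such that a single classifier w is simultaneously
% optimal for the risk R^e of every training environment e.
\[
  \min_{\Phi,\, w} \; \sum_{e \in \mathcal{E}_{\mathrm{tr}}} R^{e}(w \circ \Phi)
  \quad \text{subject to} \quad
  w \in \arg\min_{\bar{w}} R^{e}(\bar{w} \circ \Phi) \;\; \text{for all } e \in \mathcal{E}_{\mathrm{tr}}.
\]
% The practical IRMv1 relaxation replaces the constraint with a penalty
% \lambda \,\big\| \nabla_{w \mid w=1.0}\, R^{e}(w \cdot \Phi) \big\|^{2} added to each
% environment's risk, treating w as a fixed scalar "dummy" classifier.
```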
arXiv Detail & Related papers (2020-10-12T14:54:32Z)
- Accelerated Sparse Bayesian Learning via Screening Test and Its Applications [0.9916217495995309]
For a linear system, directly finding the sparsest solution given an over-complete dictionary of features is typically NP-hard.
Sparse Bayesian learning addresses this by using a parameterized prior to encourage sparsity in the solution.
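As a plain illustration of sparse Bayesian learning itself (without the screening-test acceleration that is this paper's contribution), scikit-learn's ARD regression implements the parameterized zero-mean Gaussian prior with per-coefficient precisions that prunes most weights:

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# Synthetic over-complete dictionary: 100 candidate features, only 3 truly active.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))
true_w = np.zeros(100)
true_w[[3, 17, 42]] = [1.5, -2.0, 0.7]
y = X @ true_w + 0.01 * rng.normal(size=50)

# Each coefficient w_i gets a N(0, 1/alpha_i) prior; maximising the evidence drives
# most precisions alpha_i to large values, switching the corresponding features off.
model = ARDRegression()
model.fit(X, y)
print("recovered support:", np.flatnonzero(np.abs(model.coef_) > 1e-3))
```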
arXiv Detail & Related papers (2020-07-08T10:21:56Z)