Necessary and Sufficient Conditions for Inverse Reinforcement Learning
of Bayesian Stopping Time Problems
- URL: http://arxiv.org/abs/2007.03481v4
- Date: Sun, 20 Feb 2022 14:54:13 GMT
- Title: Necessary and Sufficient Conditions for Inverse Reinforcement Learning
of Bayesian Stopping Time Problems
- Authors: Kunal Pattanayak and Vikram Krishnamurthy
- Abstract summary: This paper presents an inverse reinforcement learning (IRL) framework for Bayesian stopping time problems.
By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition to identify if these actions are consistent with optimizing a cost function.
Our IRL algorithm identifies optimality and then constructs set-valued estimates of the cost function.
- Score: 22.498689292081156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents an inverse reinforcement learning (IRL) framework for
Bayesian stopping time problems. By observing the actions of a Bayesian
decision maker, we provide a necessary and sufficient condition to identify if
these actions are consistent with optimizing a cost function. In a Bayesian
(partially observed) setting, the inverse learner can at best identify
optimality with respect to the observed actions. Our IRL algorithm identifies
optimality and then constructs set-valued estimates of the cost function. To achieve this
IRL objective, we use novel ideas from Bayesian revealed preferences stemming
from microeconomics. We illustrate the proposed IRL scheme using two important
examples of stopping time problems, namely, sequential hypothesis testing and
Bayesian search. Finally, for finite datasets, we propose an IRL detection
algorithm and give finite sample bounds on its error probabilities.
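The optimality test and the set-valued cost estimates can be read as a linear feasibility program, in the spirit of Bayesian revealed preferences. The sketch below is an assumption-laden illustration of that structure only, not the paper's exact condition: it checks a NIAS-style inequality (each observed action should minimize expected cost under the posterior it induces) with SciPy's linprog, and every variable name is hypothetical.

```python
# Illustrative sketch (not the paper's exact test): a Bayesian revealed-preference
# style feasibility check. Given an empirical joint distribution p(state, action)
# of the decision maker's behavior, ask whether some cost c(state, action)
# rationalizes the observed actions, i.e. each chosen action minimizes expected
# cost under the posterior it induces (a NIAS-type inequality). This reduces to a
# linear feasibility program; the paper's necessary and sufficient condition for
# stopping time problems involves further inequalities, omitted here.
import numpy as np
from scipy.optimize import linprog

def rationalizable(p_xa, tol=1e-9):
    """p_xa[x, a] = empirical joint probability of state x and chosen action a."""
    n_states, n_actions = p_xa.shape
    n_vars = n_states * n_actions              # one cost variable per (state, action)
    idx = lambda x, a: x * n_actions + a

    A_ub, b_ub = [], []
    for a in range(n_actions):
        p_a = p_xa[:, a].sum()
        if p_a <= tol:
            continue                           # action never chosen: no constraint
        posterior = p_xa[:, a] / p_a           # belief over states given action a
        for b in range(n_actions):
            if b == a:
                continue
            # NIAS-type inequality: sum_x posterior[x] * (c[x, a] - c[x, b]) <= 0
            row = np.zeros(n_vars)
            for x in range(n_states):
                row[idx(x, a)] += posterior[x]
                row[idx(x, b)] -= posterior[x]
            A_ub.append(row)
            b_ub.append(0.0)

    # Normalization to fix the scale of the cost (illustrative choice).
    A_eq, b_eq = [np.ones(n_vars)], [1.0]
    res = linprog(c=np.zeros(n_vars), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, None)] * n_vars, method="highs")
    feasible_cost = res.x.reshape(n_states, n_actions) if res.success else None
    return res.success, feasible_cost
```

Re-solving with the objective set to plus or minus an individual cost entry, rather than the zero objective, would trace out interval bounds on that entry, which is the flavor of set-valued estimate the abstract describes.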
Related papers
- Stopping Bayesian Optimization with Probabilistic Regret Bounds [1.4141453107129403]
We investigate replacing the de facto stopping rule with an $(\epsilon, \delta)$-criterion.
We show how to verify this condition in practice using a limited number of draws from the posterior.
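Read from the blurb above, a minimal Monte Carlo sketch of such an $(\epsilon, \delta)$ check, assuming joint posterior draws of the objective over a finite candidate set are available; that paper's actual estimator and guarantees may differ, and all names below are illustrative.

```python
# Illustrative (epsilon, delta) stopping check for Bayesian optimization:
# stop once the posterior probability that the incumbent's simple regret is
# at most epsilon reaches 1 - delta, estimated from posterior draws.
import numpy as np

def should_stop(posterior_draws, incumbent_idx, epsilon, delta):
    """posterior_draws: (n_draws, n_candidates) sampled objective values (maximized);
    incumbent_idx: column index of the current best candidate."""
    regret = posterior_draws.max(axis=1) - posterior_draws[:, incumbent_idx]
    prob_small_regret = np.mean(regret <= epsilon)   # Monte Carlo estimate
    return prob_small_regret >= 1.0 - delta
```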
arXiv Detail & Related papers (2024-02-26T18:34:58Z)
- Adaptive importance sampling for heavy-tailed distributions via $\alpha$-divergence minimization [2.879807093604632]
We propose an AIS algorithm that approximates the target by Student-t proposal distributions.
We adapt location and scale parameters by matching the escort moments of the target and the proposal.
These updates minimize the $\alpha$-divergence between the target and the proposal, thereby connecting with variational inference.
arXiv Detail & Related papers (2023-10-25T14:07:08Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z)
- Robust Bayesian Recourse [13.526999070658231]
Algorithmic recourse aims to recommend informative feedback to overturn an unfavorable machine learning decision.
We introduce in this paper the Bayesian recourse, a model-agnostic recourse that minimizes the posterior probability odds ratio.
We present our min-max robust counterpart with the goal of hedging against future changes in the machine learning model parameters.
arXiv Detail & Related papers (2022-06-22T04:17:17Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- Bayesian Optimization of Risk Measures [7.799648230758491]
We consider Bayesian optimization of objective functions of the form $\rho[F(x, W)]$, where $F$ is a black-box expensive-to-evaluate function.
We propose a family of novel Bayesian optimization algorithms that exploit the structure of the objective function to substantially improve sampling efficiency.
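To make that objective form concrete, here is a small sketch that evaluates one such risk measure, CVaR, by Monte Carlo over draws of $W$; it illustrates only the objective, not that paper's sampling-efficient algorithms, and the toy $F$ below is hypothetical.

```python
# Illustrative Monte Carlo evaluation of rho[F(x, W)] with rho = CVaR_alpha.
import numpy as np

def cvar_objective(F, x, w_samples, alpha=0.9):
    """Estimate CVaR_alpha of the losses F(x, W) from samples of W."""
    losses = np.array([F(x, w) for w in w_samples])
    var = np.quantile(losses, alpha)      # value-at-risk threshold
    return losses[losses >= var].mean()   # mean of the worst (1 - alpha) tail

# Toy usage with a stand-in for the expensive black-box F and Gaussian W.
rng = np.random.default_rng(0)
F = lambda x, w: (x - 1.0) ** 2 + 0.5 * w
print(cvar_objective(F, x=0.3, w_samples=rng.normal(size=2000)))
```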
arXiv Detail & Related papers (2020-07-10T18:20:46Z)
- Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures.
In a variety of optimization experiments, our algorithm uses the budget of failures more efficiently and generally achieves lower regret than state-of-the-art methods.
arXiv Detail & Related papers (2020-05-15T09:54:09Z)
- On Bayesian Search for the Feasible Space Under Computationally Expensive Constraints [0.0]
We propose a novel acquisition function that combines the probability that a solution lies at the boundary between feasible and infeasible spaces.
Experiments confirmed the efficacy of the proposed function.
arXiv Detail & Related papers (2020-04-23T10:22:32Z)
- Time-varying Gaussian Process Bandit Optimization with Non-constant Evaluation Time [93.6788993843846]
We propose a novel time-varying Bayesian optimization algorithm that can effectively handle the non-constant evaluation time.
Our bound elucidates that a pattern of the evaluation time sequence can hugely affect the difficulty of the problem.
arXiv Detail & Related papers (2020-03-10T13:28:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.