Bayesian decision-making under misspecified priors with applications to
meta-learning
- URL: http://arxiv.org/abs/2107.01509v1
- Date: Sat, 3 Jul 2021 23:17:26 GMT
- Title: Bayesian decision-making under misspecified priors with applications to
meta-learning
- Authors: Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu,
Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire
- Abstract summary: Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
- Score: 64.38020203019013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thompson sampling and other Bayesian sequential decision-making algorithms
are among the most popular approaches to tackle explore/exploit trade-offs in
(contextual) bandits. The choice of prior in these algorithms offers
flexibility to encode domain knowledge but can also lead to poor performance
when misspecified. In this paper, we demonstrate that performance degrades
gracefully with misspecification. We prove that the expected reward accrued by
Thompson sampling (TS) with a misspecified prior differs by at most
$\tilde{\mathcal{O}}(H^2 \epsilon)$ from TS with a well specified prior, where
$\epsilon$ is the total-variation distance between priors and $H$ is the
learning horizon. Our bound does not require the prior to have any parametric
form. For priors with bounded support, our bound is independent of the
cardinality or structure of the action space, and we show that it is tight up
to universal constants in the worst case.
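As a concrete illustration of the quantity this bound controls, the following is a minimal Python sketch (not the authors' code): Thompson sampling on Bernoulli bandit tasks drawn from the true prior is run twice, once with that prior and once with a perturbed, misspecified prior, and the average reward gap between the two runs is reported. The Beta priors, arm count, horizon, and trial count below are illustrative assumptions, not values from the paper.
```python
# Minimal sketch (not the paper's code) of the reward gap the bound controls:
# TS run with the true prior vs. a misspecified prior on Bernoulli bandit tasks.
import numpy as np

rng = np.random.default_rng(0)

def thompson_sampling(mu, prior_alpha, prior_beta, horizon):
    """Bernoulli Thompson sampling with independent Beta priors per arm."""
    alpha, beta = prior_alpha.astype(float), prior_beta.astype(float)
    total_reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)            # one posterior sample per arm
        arm = int(np.argmax(theta))              # act greedily on the sample
        reward = float(rng.random() < mu[arm])   # Bernoulli reward
        alpha[arm] += reward                     # conjugate posterior update
        beta[arm] += 1.0 - reward
        total_reward += reward
    return total_reward

K, H, trials = 5, 500, 200
true_a, true_b = np.full(K, 2.0), np.full(K, 2.0)     # true prior: Beta(2, 2) per arm
wrong_a, wrong_b = np.full(K, 2.5), np.full(K, 1.5)   # misspecified prior

gap = 0.0
for _ in range(trials):
    mu = rng.beta(true_a, true_b)                      # task drawn from the *true* prior
    gap += (thompson_sampling(mu, true_a, true_b, H)
            - thompson_sampling(mu, wrong_a, wrong_b, H))
print("average reward gap (well-specified minus misspecified prior):", gap / trials)
```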
Building on our sensitivity analysis, we establish generic PAC guarantees for
algorithms in the recently studied Bayesian meta-learning setting and derive
corollaries for various families of priors. Our results generalize along two
axes: (1) they apply to a broader family of Bayesian decision-making
algorithms, including a Monte-Carlo implementation of the knowledge gradient
algorithm (KG), and (2) they apply to Bayesian POMDPs, the most general
Bayesian decision-making setting, encompassing contextual bandits as a special
case. Through numerical simulations, we illustrate how prior misspecification
and the deployment of one-step look-ahead (as in KG) can impact the convergence
of meta-learning in multi-armed and contextual bandits with structured and
correlated priors.
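To make the meta-learning setting concrete, here is a small, hedged sketch (again assuming a Bernoulli/Beta model, which the paper's results do not require): a Beta prior is estimated from earlier tasks by moment matching and could then be handed to Thompson sampling on a fresh task, as in the sketch above. The estimator and all constants are hypothetical, not the paper's procedure; the point is only that the estimated prior's error is what the paper's PAC guarantees convert into lost reward.
```python
# Hedged illustration of the Bayesian meta-learning setting: the true prior is
# unknown, so it is estimated from earlier bandit tasks and the estimate is then
# reused by Thompson sampling on new tasks. Moment matching is a crude,
# hypothetical estimator used here only for illustration.
import numpy as np

rng = np.random.default_rng(1)
K, n_meta_tasks, pulls_per_arm = 5, 50, 20
true_alpha, true_beta = 2.0, 2.0                 # unknown to the learner

# Crude per-arm mean estimates gathered across earlier tasks.
means = []
for _ in range(n_meta_tasks):
    mu = rng.beta(true_alpha, true_beta, size=K)  # task means drawn from the true prior
    means.extend(rng.binomial(pulls_per_arm, mu) / pulls_per_arm)
means = np.asarray(means)

# Moment-matched Beta(alpha, beta) fit to the empirical mean and variance.
m, v = means.mean(), means.var()
c = m * (1.0 - m) / v - 1.0
est_alpha, est_beta = m * c, (1.0 - m) * c
print(f"estimated prior Beta({est_alpha:.2f}, {est_beta:.2f}) vs. true Beta(2.00, 2.00)")
# The pair (est_alpha, est_beta) can now replace the true prior in a Thompson
# sampling routine such as the one sketched after the abstract's first paragraph.
```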
Related papers
- Time-Varying Gaussian Process Bandits with Unknown Prior [18.93478528448966]
PE-GP-UCB is capable of solving time-varying Bayesian optimisation problems.
It relies on the fact that the observed function values are consistent with at least one of the candidate priors.
arXiv Detail & Related papers (2024-02-02T18:52:16Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Approximate inference of marginals using the IBIA framework [0.0]
Exact inference of marginals in probabilistic graphical models (PGM) is known to be intractable.
We propose a new algorithm for marginal inference that is based on the incremental build-infer-approximate (IBIA) paradigm.
Our method gives accuracy that is better than or comparable to existing variational and sampling-based methods, with smaller runtimes.
arXiv Detail & Related papers (2023-06-01T04:24:21Z) - Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z) - Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
We show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z) - Metalearning Linear Bandits by Prior Update [7.519872646378836]
Fully Bayesian approaches assume that problem parameters are generated from a known prior, while in practice, such information is often lacking.
This problem is exacerbated in decision-making setups with partial information, where using a misspecified prior may lead to poor exploration and inferior performance.
In this work we prove, in the context of linear bandits and Gaussian priors, that as long as the prior estimate is sufficiently close to the true prior, the performance of an algorithm that uses the misspecified prior is close to that of an algorithm that uses the true prior.
arXiv Detail & Related papers (2021-07-12T11:17:01Z) - Bayesian imaging using Plug & Play priors: when Langevin meets Tweedie [13.476505672245603]
This paper develops theory, methods, and provably convergent algorithms for performing Bayesian inference with Plug & Play priors.
We introduce two algorithms: 1) PnP-ULA (Plug & Play Unadjusted Langevin Algorithm) for Monte Carlo sampling and MMSE inference; and 2) PnP-SGD (Plug & Play Stochastic Gradient Descent) for MAP inference.
The algorithms are demonstrated on several problems such as image deblurring, inpainting, and denoising, where they are used for point estimation as well as for uncertainty visualisation and quantification.
arXiv Detail & Related papers (2021-03-08T12:46:53Z) - Statistical Efficiency of Thompson Sampling for Combinatorial
Semi-Bandits [56.31950477139053]
We investigate the combinatorial multi-armed bandit with semi-bandit feedback (CMAB).
We analyze variants of the Combinatorial Thompson Sampling policy (CTS).
This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB).
arXiv Detail & Related papers (2020-06-11T17:12:11Z) - A General Theory of the Stochastic Linear Bandit and Its Applications [8.071506311915398]
We introduce a general analysis framework and a family of algorithms for the linear bandit problem.
Our new notion of optimism in expectation gives rise to a new algorithm, called sieved greedy (SG) that reduces the overexploration problem in OFUL.
In addition to proving that SG is theoretically rate optimal, our empirical simulations show that SG outperforms existing benchmarks such as greedy, OFUL, and TS.
arXiv Detail & Related papers (2020-02-12T18:54:41Z)