Bayesian decision-making under misspecified priors with applications to
meta-learning
- URL: http://arxiv.org/abs/2107.01509v1
- Date: Sat, 3 Jul 2021 23:17:26 GMT
- Title: Bayesian decision-making under misspecified priors with applications to
meta-learning
- Authors: Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu,
Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire
- Abstract summary: Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
- Score: 64.38020203019013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thompson sampling and other Bayesian sequential decision-making algorithms
are among the most popular approaches to tackle explore/exploit trade-offs in
(contextual) bandits. The choice of prior in these algorithms offers
flexibility to encode domain knowledge but can also lead to poor performance
when misspecified. In this paper, we demonstrate that performance degrades
gracefully with misspecification. We prove that the expected reward accrued by
Thompson sampling (TS) with a misspecified prior differs by at most
$\tilde{\mathcal{O}}(H^2 \epsilon)$ from TS with a well specified prior, where
$\epsilon$ is the total-variation distance between priors and $H$ is the
learning horizon. Our bound does not require the prior to have any parametric
form. For priors with bounded support, our bound is independent of the
cardinality or structure of the action space, and we show that it is tight up
to universal constants in the worst case.
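As a concrete illustration of the quantity this bound controls, the following is a minimal Python sketch (not the authors' code): Thompson sampling on Bernoulli bandit tasks drawn from the true prior is run twice, once with that prior and once with a perturbed, misspecified prior, and the average reward gap between the two runs is reported. The Beta priors, arm count, horizon, and trial count below are illustrative assumptions, not values from the paper.
```python
# Minimal sketch (not the paper's code) of the reward gap the bound controls:
# TS run with the true prior vs. a misspecified prior on Bernoulli bandit tasks.
import numpy as np

rng = np.random.default_rng(0)

def thompson_sampling(mu, prior_alpha, prior_beta, horizon):
    """Bernoulli Thompson sampling with independent Beta priors per arm."""
    alpha, beta = prior_alpha.astype(float), prior_beta.astype(float)
    total_reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)            # one posterior sample per arm
        arm = int(np.argmax(theta))              # act greedily on the sample
        reward = float(rng.random() < mu[arm])   # Bernoulli reward
        alpha[arm] += reward                     # conjugate posterior update
        beta[arm] += 1.0 - reward
        total_reward += reward
    return total_reward

K, H, trials = 5, 500, 200
true_a, true_b = np.full(K, 2.0), np.full(K, 2.0)     # true prior: Beta(2, 2) per arm
wrong_a, wrong_b = np.full(K, 2.5), np.full(K, 1.5)   # misspecified prior

gap = 0.0
for _ in range(trials):
    mu = rng.beta(true_a, true_b)                      # task drawn from the *true* prior
    gap += (thompson_sampling(mu, true_a, true_b, H)
            - thompson_sampling(mu, wrong_a, wrong_b, H))
print("average reward gap (well-specified minus misspecified prior):", gap / trials)
```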
Building on our sensitivity analysis, we establish generic PAC guarantees for
algorithms in the recently studied Bayesian meta-learning setting and derive
corollaries for various families of priors. Our results generalize along two
axes: (1) they apply to a broader family of Bayesian decision-making
algorithms, including a Monte-Carlo implementation of the knowledge gradient
algorithm (KG), and (2) they apply to Bayesian POMDPs, the most general
Bayesian decision-making setting, encompassing contextual bandits as a special
case. Through numerical simulations, we illustrate how prior misspecification
and the deployment of one-step look-ahead (as in KG) can impact the convergence
of meta-learning in multi-armed and contextual bandits with structured and
correlated priors.
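To make the meta-learning setting concrete, here is a small, hedged sketch (again assuming a Bernoulli/Beta model, which the paper's results do not require): a Beta prior is estimated from earlier tasks by moment matching and could then be handed to Thompson sampling on a fresh task, as in the sketch above. The estimator and all constants are hypothetical, not the paper's procedure; the point is only that the estimated prior's error is what the paper's PAC guarantees convert into lost reward.
```python
# Hedged illustration of the Bayesian meta-learning setting: the true prior is
# unknown, so it is estimated from earlier bandit tasks and the estimate is then
# reused by Thompson sampling on new tasks. Moment matching is a crude,
# hypothetical estimator used here only for illustration.
import numpy as np

rng = np.random.default_rng(1)
K, n_meta_tasks, pulls_per_arm = 5, 50, 20
true_alpha, true_beta = 2.0, 2.0                 # unknown to the learner

# Crude per-arm mean estimates gathered across earlier tasks.
means = []
for _ in range(n_meta_tasks):
    mu = rng.beta(true_alpha, true_beta, size=K)  # task means drawn from the true prior
    means.extend(rng.binomial(pulls_per_arm, mu) / pulls_per_arm)
means = np.asarray(means)

# Moment-matched Beta(alpha, beta) fit to the empirical mean and variance.
m, v = means.mean(), means.var()
c = m * (1.0 - m) / v - 1.0
est_alpha, est_beta = m * c, (1.0 - m) * c
print(f"estimated prior Beta({est_alpha:.2f}, {est_beta:.2f}) vs. true Beta(2.00, 2.00)")
# The pair (est_alpha, est_beta) can now replace the true prior in a Thompson
# sampling routine such as the one sketched after the abstract's first paragraph.
```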
Related papers
- Time-Varying Gaussian Process Bandits with Unknown Prior [18.93478528448966]
PE-GP-UCB is capable of solving time-varying Bayesian optimisation problems.
It relies on the fact that the observed function values are consistent with at least one of the candidate priors.
arXiv Detail & Related papers (2024-02-02T18:52:16Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Approximate inference of marginals using the IBIA framework [0.0]
Exact inference of marginals in probabilistic graphical models (PGM) is known to be intractable.
We propose a new algorithm for marginal inference that is based on the incremental build-infer-approximate (IBIA) paradigm.
Our method gives accuracy that is better than or comparable to existing variational and sampling-based methods, with smaller runtimes.
arXiv Detail & Related papers (2023-06-01T04:24:21Z) - Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z) - Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
We show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z) - Metalearning Linear Bandits by Prior Update [7.519872646378836]
Fully Bayesian approaches assume that problem parameters are generated from a known prior, while in practice, such information is often lacking.
This problem is exacerbated in decision-making setups with partial information, where using a misspecified prior may lead to poor exploration and inferior performance.
In this work we prove, in the context of linear bandits and Gaussian priors, that as long as the prior estimate is sufficiently close to the true prior, the performance of an algorithm that uses the misspecified prior is close to that of an algorithm that uses the true prior.
arXiv Detail & Related papers (2021-07-12T11:17:01Z) - Bayesian imaging using Plug & Play priors: when Langevin meets Tweedie [13.476505672245603]
This paper develops theory, methods, and provably convergent algorithms for performing Bayesian inference with Plug & Play priors.
We introduce two algorithms: 1) PnP-ULA (Plug & Play Unadjusted Langevin Algorithm) for Monte Carlo sampling and MMSE inference; and 2) PnP-SGD (Plug & Play Stochastic Gradient Descent) for MAP inference.
The algorithms are demonstrated on several problems such as image deblurring, inpainting, and denoising, where they are used for point estimation as well as for uncertainty visualisation and quantification.
arXiv Detail & Related papers (2021-03-08T12:46:53Z) - Statistical Efficiency of Thompson Sampling for Combinatorial
Semi-Bandits [56.31950477139053]
We investigate the combinatorial multi-armed bandit with semi-bandit feedback (CMAB).
We analyze variants of the Combinatorial Thompson Sampling policy (CTS).
This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB).
arXiv Detail & Related papers (2020-06-11T17:12:11Z) - A General Theory of the Stochastic Linear Bandit and Its Applications [8.071506311915398]
We introduce a general analysis framework and a family of algorithms for the linear bandit problem.
Our new notion of optimism in expectation gives rise to a new algorithm, called sieved greedy (SG) that reduces the overexploration problem in OFUL.
In addition to proving that SG is theoretically rate optimal, our empirical simulations show that SG outperforms existing benchmarks such as greedy, OFUL, and TS.
arXiv Detail & Related papers (2020-02-12T18:54:41Z)