Optimality of Thompson Sampling with Noninformative Priors for Pareto
Bandits
- URL: http://arxiv.org/abs/2302.01544v1
- Date: Fri, 3 Feb 2023 04:47:14 GMT
- Title: Optimality of Thompson Sampling with Noninformative Priors for Pareto
Bandits
- Authors: Jongyeong Lee, Junya Honda, Chao-Kai Chiang, Masashi Sugiyama
- Abstract summary: Thompson sampling has been shown to achieve problem-dependent lower bounds in several reward models.
We consider the optimality of TS for the Pareto model that has a heavy tail and is parameterized by two unknown parameters.
We find that TS with the Jeffreys and reference priors can achieve the lower bound if one uses a truncation procedure.
- Score: 81.45853204922795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the stochastic multi-armed bandit problem, a randomized probability
matching policy called Thompson sampling (TS) has shown excellent performance
in various reward models. In addition to the empirical performance, TS has been
shown to achieve asymptotic problem-dependent lower bounds in several models.
However, its optimality has been mainly addressed under light-tailed or
one-parameter models that belong to exponential families. In this paper, we
consider the optimality of TS for the Pareto model that has a heavy tail and is
parameterized by two unknown parameters. Specifically, we discuss the
optimality of TS with probability matching priors that include the Jeffreys
prior and the reference priors. We first prove that TS with certain probability
matching priors can achieve the optimal regret bound. Then, we show the
suboptimality of TS with other priors, including the Jeffreys and the reference
priors. Nevertheless, we find that TS with the Jeffreys and reference priors
can achieve the asymptotic lower bound if one uses a truncation procedure.
These results suggest carefully choosing noninformative priors to avoid
suboptimality and show the effectiveness of truncation procedures in TS-based
policies.
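As a rough illustration of the abstract's truncation idea, the sketch below runs Thompson sampling on Pareto arms with known scale 1 and unknown shape, using a conjugate Gamma prior as a simple stand-in for the probability matching priors analyzed in the paper. The arm parameters, the prior, and the cap value are all hypothetical choices for illustration, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K Pareto(1, alpha_k) arms with unknown shape alpha_k.
# The mean reward is alpha / (alpha - 1) for alpha > 1, which blows up as
# alpha approaches 1 -- the heavy-tail issue the truncation guards against.
true_alpha = np.array([3.0, 2.0, 1.5])    # means 1.5, 2.0, 3.0; last arm is best
K, horizon, cap = len(true_alpha), 5000, 50.0

n = np.zeros(K)        # pull counts
log_sum = np.zeros(K)  # sum of log-rewards per arm (sufficient statistic)

for t in range(horizon):
    # Under a Gamma(1, 1) prior, the posterior of alpha_k is
    # Gamma(1 + n_k, rate 1 + sum of log rewards).
    alpha_tilde = rng.gamma(shape=1.0 + n, scale=1.0 / (1.0 + log_sum))
    # Index = posterior-sampled mean, truncated at `cap` (a crude analogue
    # of the paper's truncation procedure): samples with alpha_tilde <= 1
    # would give an infinite or invalid mean, so they are set to the cap.
    index = np.full(K, cap)
    ok = alpha_tilde > 1.0
    index[ok] = np.minimum(alpha_tilde[ok] / (alpha_tilde[ok] - 1.0), cap)
    k = int(np.argmax(index))
    x = (1.0 - rng.random()) ** (-1.0 / true_alpha[k])  # Pareto(1, alpha_k) draw
    n[k] += 1
    log_sum[k] += np.log(x)

print(n)  # pulls should concentrate on the largest-mean arm
```

Without the cap, a posterior draw near the boundary alpha = 1 would dominate the argmax forever; the truncation keeps such draws comparable to the other arms' indices.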
Related papers
- Rényi Neural Processes [14.11793373584558]
We propose Rényi Neural Processes (RNP) to ameliorate the impacts of prior misspecification.
We scale the density ratio $\frac{p}{q}$ by the power of $1-\alpha$ in the divergence gradients with respect to the posterior.
Our experiments show consistent log-likelihood improvements over state-of-the-art NP family models.
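For context on the $(1-\alpha)$ scaling mentioned above (not stated in the summary itself): in Rényi divergence variational inference (Li and Turner, 2016), the ELBO is replaced by the variational Rényi bound, whose Monte Carlo gradient weights each sample by the density ratio raised to the power $1-\alpha$, which appears to be the scaling the summary refers to:

```latex
\mathcal{L}_{\alpha}(q) \;=\; \frac{1}{1-\alpha}\,
  \log \mathbb{E}_{q(z)}\!\left[\left(\frac{p(x, z)}{q(z)}\right)^{1-\alpha}\right]
```

Setting $\alpha \to 1$ recovers the standard ELBO, so $\alpha$ interpolates between mass-covering and mode-seeking behavior.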
arXiv Detail & Related papers (2024-05-25T00:14:55Z) - Should We Learn Most Likely Functions or Parameters? [51.133793272222874]
We investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data.
We find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
arXiv Detail & Related papers (2023-11-27T16:39:55Z) - The Choice of Noninformative Priors for Thompson Sampling in
Multiparameter Bandit Models [56.31310344616837]
Thompson sampling (TS) has been known for its outstanding empirical performance supported by theoretical guarantees across various reward models.
This study explores the impact of selecting noninformative priors, offering insights into the performance of TS when dealing with new models that lack theoretical understanding.
arXiv Detail & Related papers (2023-02-28T08:42:42Z) - Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits [17.11922027966447]
This work provides theoretical guarantees of Thompson sampling in high dimensional and sparse contextual bandits.
For faster computation, we use a spike-and-slab prior to model the unknown parameter and variational inference instead of MCMC.
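As a toy illustration of such a prior (the dimension, inclusion probability, and variances below are arbitrary choices for the sketch, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy continuous spike-and-slab prior: each coordinate of theta is drawn
# from a narrow "spike" Gaussian (effectively zero) with probability
# 1 - pi_incl, and from a wide "slab" Gaussian with probability pi_incl,
# which encodes the belief that only a few coordinates matter.
d, pi_incl = 10, 0.2
spike_sd, slab_sd = 0.01, 1.0

z = rng.random(d) < pi_incl  # inclusion indicators
theta = np.where(z, rng.normal(0.0, slab_sd, d), rng.normal(0.0, spike_sd, d))

print(theta.round(3))  # only the z-flagged coordinates are far from zero
```

In the bandit setting, posterior sampling under this prior concentrates the sampled parameter on a sparse support, which is what makes Thompson sampling tractable in high dimensions.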
arXiv Detail & Related papers (2022-11-11T02:23:39Z) - Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train an NN and then choose the query by maximizing the trained NN.
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
arXiv Detail & Related papers (2022-10-13T09:01:58Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Bayesian Joint Chance Constrained Optimization: Approximations and
Statistical Consistency [10.20554144865699]
We focus on the question of statistical consistency of the optimal value, computed using an approximate posterior distribution.
We also prove the feasibility of the approximate optimization problem.
We also demonstrate the utility of our approach on an optimal staffing problem for an M/M/c queueing model.
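For reference, the probability that an arriving job must wait in the M/M/c model mentioned above is given by the standard Erlang-C formula; a minimal implementation (independent of the paper's Bayesian formulation) might look like:

```python
from math import factorial

def erlang_c(c: int, offered_load: float) -> float:
    """Probability an arriving job waits in an M/M/c queue.

    offered_load is a = lambda / mu and must satisfy a < c for stability.
    """
    a, rho = offered_load, offered_load / c
    top = a**c / factorial(c)
    bottom = (1.0 - rho) * sum(a**k / factorial(k) for k in range(c)) + top
    return top / bottom

print(erlang_c(1, 0.5))  # M/M/1 sanity check: P(wait) = rho = 0.5
```

Staffing problems of the kind the paper studies amount to choosing the smallest c that keeps this waiting probability (or a cost built from it) below a target, with lambda and mu uncertain.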
arXiv Detail & Related papers (2021-06-23T07:11:39Z) - Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds
and Comparison with KL-divergence based Optimal Posteriors and
Cross-Validation Procedure [0.0]
We investigate optimal posteriors for chi-squared divergence based PACBayesian bounds in terms of their distribution, scalability of computations, and test set performance.
Chi-squared divergence based posteriors have weaker bounds and worse test errors, hinting at an underlying regularization by KL-divergence based posteriors.
arXiv Detail & Related papers (2020-08-14T03:15:23Z) - Statistical Efficiency of Thompson Sampling for Combinatorial
Semi-Bandits [56.31950477139053]
We investigate the combinatorial multi-armed bandit problem with semi-bandit feedback (CMAB).
We analyze variants of the Combinatorial Thompson Sampling policy (CTS).
This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB).
arXiv Detail & Related papers (2020-06-11T17:12:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.