Optimality of Thompson Sampling with Noninformative Priors for Pareto
Bandits
- URL: http://arxiv.org/abs/2302.01544v1
- Date: Fri, 3 Feb 2023 04:47:14 GMT
- Title: Optimality of Thompson Sampling with Noninformative Priors for Pareto
Bandits
- Authors: Jongyeong Lee, Junya Honda, Chao-Kai Chiang, Masashi Sugiyama
- Abstract summary: Thompson sampling has been shown to achieve problem-dependent lower bounds in several reward models.
We consider the optimality of TS for the Pareto model that has a heavy tail and is parameterized by two unknown parameters.
We find that TS with the Jeffreys and reference priors can achieve the lower bound if one uses a truncation procedure.
- Score: 81.45853204922795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the stochastic multi-armed bandit problem, a randomized probability
matching policy called Thompson sampling (TS) has shown excellent performance
in various reward models. In addition to the empirical performance, TS has been
shown to achieve asymptotic problem-dependent lower bounds in several models.
However, its optimality has been mainly addressed under light-tailed or
one-parameter models that belong to exponential families. In this paper, we
consider the optimality of TS for the Pareto model that has a heavy tail and is
parameterized by two unknown parameters. Specifically, we discuss the
optimality of TS with probability matching priors that include the Jeffreys
prior and the reference priors. We first prove that TS with certain probability
matching priors can achieve the optimal regret bound. Then, we show the
suboptimality of TS with other priors, including the Jeffreys and the reference
priors. Nevertheless, we find that TS with the Jeffreys and reference priors
can achieve the asymptotic lower bound if one uses a truncation procedure.
These results suggest carefully choosing noninformative priors to avoid
suboptimality and show the effectiveness of truncation procedures in TS-based
policies.
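As a rough illustration of the abstract's truncation idea, the sketch below runs Thompson sampling on Pareto arms with known scale 1 and unknown shape, using a conjugate Gamma prior as a simple stand-in for the probability matching priors analyzed in the paper. The arm parameters, the prior, and the cap value are all hypothetical choices for illustration, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K Pareto(1, alpha_k) arms with unknown shape alpha_k.
# The mean reward is alpha / (alpha - 1) for alpha > 1, which blows up as
# alpha approaches 1 -- the heavy-tail issue the truncation guards against.
true_alpha = np.array([3.0, 2.0, 1.5])    # means 1.5, 2.0, 3.0; last arm is best
K, horizon, cap = len(true_alpha), 5000, 50.0

n = np.zeros(K)        # pull counts
log_sum = np.zeros(K)  # sum of log-rewards per arm (sufficient statistic)

for t in range(horizon):
    # Under a Gamma(1, 1) prior, the posterior of alpha_k is
    # Gamma(1 + n_k, rate 1 + sum of log rewards).
    alpha_tilde = rng.gamma(shape=1.0 + n, scale=1.0 / (1.0 + log_sum))
    # Index = posterior-sampled mean, truncated at `cap` (a crude analogue
    # of the paper's truncation procedure): samples with alpha_tilde <= 1
    # would give an infinite or invalid mean, so they are set to the cap.
    index = np.full(K, cap)
    ok = alpha_tilde > 1.0
    index[ok] = np.minimum(alpha_tilde[ok] / (alpha_tilde[ok] - 1.0), cap)
    k = int(np.argmax(index))
    x = (1.0 - rng.random()) ** (-1.0 / true_alpha[k])  # Pareto(1, alpha_k) draw
    n[k] += 1
    log_sum[k] += np.log(x)

print(n)  # pulls should concentrate on the largest-mean arm
```

Without the cap, a posterior draw near the boundary alpha = 1 would dominate the argmax forever; the truncation keeps such draws comparable to the other arms' indices.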
Related papers
- Rényi Neural Processes [14.11793373584558]
We propose Rényi Neural Processes (RNP) to ameliorate the impacts of prior misspecification.
We scale the density ratio $\frac{p}{q}$ by the power of $1-\alpha$ in the divergence gradients with respect to the posterior.
Our experiments show consistent log-likelihood improvements over state-of-the-art NP family models.
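For context on the $(1-\alpha)$ scaling mentioned above (not stated in the summary itself): in Rényi divergence variational inference (Li and Turner, 2016), the ELBO is replaced by the variational Rényi bound, whose Monte Carlo gradient weights each sample by the density ratio raised to the power $1-\alpha$, which appears to be the scaling the summary refers to:

```latex
\mathcal{L}_{\alpha}(q) \;=\; \frac{1}{1-\alpha}\,
  \log \mathbb{E}_{q(z)}\!\left[\left(\frac{p(x, z)}{q(z)}\right)^{1-\alpha}\right]
```

Setting $\alpha \to 1$ recovers the standard ELBO, so $\alpha$ interpolates between mass-covering and mode-seeking behavior.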
arXiv Detail & Related papers (2024-05-25T00:14:55Z) - Should We Learn Most Likely Functions or Parameters? [51.133793272222874]
We investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data.
We find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
arXiv Detail & Related papers (2023-11-27T16:39:55Z) - The Choice of Noninformative Priors for Thompson Sampling in
Multiparameter Bandit Models [56.31310344616837]
Thompson sampling (TS) has been known for its outstanding empirical performance supported by theoretical guarantees across various reward models.
This study explores the impact of selecting noninformative priors, offering insights into the performance of TS when dealing with new models that lack theoretical understanding.
arXiv Detail & Related papers (2023-02-28T08:42:42Z) - Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits [17.11922027966447]
This work provides theoretical guarantees of Thompson sampling in high dimensional and sparse contextual bandits.
For faster computation, we use a spike-and-slab prior to model the unknown parameter and variational inference instead of MCMC.
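As a toy illustration of such a prior (the dimension, inclusion probability, and variances below are arbitrary choices for the sketch, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy continuous spike-and-slab prior: each coordinate of theta is drawn
# from a narrow "spike" Gaussian (effectively zero) with probability
# 1 - pi_incl, and from a wide "slab" Gaussian with probability pi_incl,
# which encodes the belief that only a few coordinates matter.
d, pi_incl = 10, 0.2
spike_sd, slab_sd = 0.01, 1.0

z = rng.random(d) < pi_incl  # inclusion indicators
theta = np.where(z, rng.normal(0.0, slab_sd, d), rng.normal(0.0, spike_sd, d))

print(theta.round(3))  # only the z-flagged coordinates are far from zero
```

In the bandit setting, posterior sampling under this prior concentrates the sampled parameter on a sparse support, which is what makes Thompson sampling tractable in high dimensions.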
arXiv Detail & Related papers (2022-11-11T02:23:39Z) - Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train an NN and then choose the query by maximizing the trained NN.
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
arXiv Detail & Related papers (2022-10-13T09:01:58Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Bayesian Joint Chance Constrained Optimization: Approximations and
Statistical Consistency [10.20554144865699]
We focus on the question of statistical consistency of the optimal value, computed using an approximate posterior distribution.
We also prove the feasibility of the approximate optimization problem.
We also demonstrate the utility of our approach on an optimal staffing problem for an M/M/c queueing model.
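For reference, the probability that an arriving job must wait in the M/M/c model mentioned above is given by the standard Erlang-C formula; a minimal implementation (independent of the paper's Bayesian formulation) might look like:

```python
from math import factorial

def erlang_c(c: int, offered_load: float) -> float:
    """Probability an arriving job waits in an M/M/c queue.

    offered_load is a = lambda / mu and must satisfy a < c for stability.
    """
    a, rho = offered_load, offered_load / c
    top = a**c / factorial(c)
    bottom = (1.0 - rho) * sum(a**k / factorial(k) for k in range(c)) + top
    return top / bottom

print(erlang_c(1, 0.5))  # M/M/1 sanity check: P(wait) = rho = 0.5
```

Staffing problems of the kind the paper studies amount to choosing the smallest c that keeps this waiting probability (or a cost built from it) below a target, with lambda and mu uncertain.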
arXiv Detail & Related papers (2021-06-23T07:11:39Z) - Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds
and Comparison with KL-divergence based Optimal Posteriors and
Cross-Validation Procedure [0.0]
We investigate optimal posteriors for chi-squared divergence based PACBayesian bounds in terms of their distribution, scalability of computations, and test set performance.
Chi-squared divergence based posteriors have weaker bounds and worse test errors, hinting at an underlying regularization by KL-divergence based posteriors.
arXiv Detail & Related papers (2020-08-14T03:15:23Z) - Statistical Efficiency of Thompson Sampling for Combinatorial
Semi-Bandits [56.31950477139053]
We investigate the combinatorial multi-armed bandit problem with semi-bandit feedback (CMAB).
We analyze variants of the Combinatorial Thompson Sampling policy (CTS).
This last result gives us an alternative to the Efficient Sampling for Combinatorial Bandit policy (ESCB).
arXiv Detail & Related papers (2020-06-11T17:12:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.