The Choice of Noninformative Priors for Thompson Sampling in
Multiparameter Bandit Models
- URL: http://arxiv.org/abs/2302.14407v2
- Date: Wed, 13 Dec 2023 04:31:56 GMT
- Title: The Choice of Noninformative Priors for Thompson Sampling in
Multiparameter Bandit Models
- Authors: Jongyeong Lee, Chao-Kai Chiang, Masashi Sugiyama
- Abstract summary: Thompson sampling (TS) has been known for its outstanding empirical performance supported by theoretical guarantees across various reward models.
This study explores the impact of selecting noninformative priors, offering insights into the performance of TS when dealing with new models that lack theoretical understanding.
- Score: 56.31310344616837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thompson sampling (TS) has been known for its outstanding empirical
performance supported by theoretical guarantees across various reward models in
the classical stochastic multi-armed bandit problems. Nonetheless, its
optimality is often restricted to specific priors due to the common observation
that TS is fairly insensitive to the choice of the prior when it comes to
asymptotic regret bounds. However, when the model contains multiple parameters,
the optimality of TS highly depends on the choice of priors, which casts doubt
on the generalizability of previous findings to other models. To address this
gap, this study explores the impact of selecting noninformative priors,
offering insights into the performance of TS when dealing with new models that
lack theoretical understanding. We first extend the regret analysis of TS to
the model of uniform distributions with unknown supports, which would be the
simplest non-regular model. Our findings reveal that changing noninformative
priors can significantly affect the expected regret, aligning with previously
known results in other multiparameter bandit models. Although the uniform prior
is shown to be optimal, we highlight an inherent limitation of this optimality:
it holds only under specific parameterizations, which underscores the
significance of the invariance property of priors. In light of this limitation,
we propose a
slightly modified TS-based policy, called TS with Truncation (TS-T), which can
achieve asymptotic optimality for the Gaussian and uniform models by using the
reference prior and the Jeffreys prior, which are invariant under one-to-one
reparameterizations. This policy provides an alternative
approach to achieving optimality by employing fine-tuned truncation, which
would be much easier than hunting for optimal priors in practice.
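As a concrete illustration of the mechanics above, here is a minimal Python
sketch of Thompson sampling for the simpler one-parameter case of $K$ arms with
$U(0, \theta_k)$ rewards under the scale prior $\pi(\theta) \propto 1/\theta$,
for which the posterior after $n$ pulls with observed maximum $M$ is
Pareto$(M, n)$ in closed form. The one-parameter model, the prior, and the
clipping constant C are expository assumptions; this is not the paper's
two-parameter uniform model or its exact TS-T rule.

import numpy as np

# Thompson sampling for K arms with U(0, theta_k) rewards.
# Under pi(theta) ∝ 1/theta, the posterior is Pareto(scale=M, shape=n),
# sampled below by inverse CDF: theta = M * u**(-1/n), u ~ Uniform(0, 1).
rng = np.random.default_rng(0)
K, T, C = 3, 10_000, 10.0                  # C: illustrative truncation constant
true_theta = rng.uniform(0.5, 2.0, size=K)
n = np.ones(K)                             # one forced initial pull per arm
M = np.array([rng.uniform(0.0, th) for th in true_theta])

for _ in range(T):
    u = rng.uniform(size=K)
    theta = M * u ** (-1.0 / n)            # posterior draw per arm
    theta = np.minimum(theta, C * M)       # truncation in the spirit of TS-T
    k = int(np.argmax(theta))              # mean reward theta/2 is monotone
    r = rng.uniform(0.0, true_theta[k])    # observe a reward from arm k
    n[k] += 1
    M[k] = max(M[k], r)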
Related papers
- Rényi Neural Processes [14.11793373584558]
We propose Rényi Neural Processes (RNP) to ameliorate the impacts of prior misspecification.
We scale the density ratio $\frac{p}{q}$ by the power of $(1-\alpha)$ in the divergence gradients with respect to the posterior.
Our experiments show consistent log-likelihood improvements over state-of-the-art NP family models.
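For reference, the standard Rényi divergence behind this construction is
$$D_\alpha(p \,\|\, q) = \frac{1}{\alpha - 1} \log \mathbb{E}_q\!\left[\Big(\frac{p}{q}\Big)^{\alpha}\right], \qquad \alpha > 0,\ \alpha \neq 1,$$
which recovers the KL divergence as $\alpha \to 1$; powers of the density ratio
$\frac{p}{q}$ enter the posterior gradients by differentiating under this
expectation (stated as textbook background, not the paper's exact estimator).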
arXiv Detail & Related papers (2024-05-25T00:14:55Z)
- Should We Learn Most Likely Functions or Parameters? [51.133793272222874]
We investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data.
We find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
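As background for this contrast (a textbook change-of-variables illustration,
not the paper's formulation, which must handle the non-invertible
parameter-to-function map of a neural network): for an invertible
reparameterization $f = g(\theta)$, posterior densities pick up a Jacobian
factor, so the two MAP estimates generally differ:
$$\hat{\theta} = \arg\max_\theta\, p(\theta \mid \mathcal{D}), \qquad \hat{f} = \arg\max_f\, p\big(g^{-1}(f) \mid \mathcal{D}\big)\,\big|\det J_{g^{-1}}(f)\big|.$$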
arXiv Detail & Related papers (2023-11-27T16:39:55Z)
- Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
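Concretely, writing $x_*$ for a random target input, EPIG scores a candidate
$x$ by the expected mutual information between its label $y$ and the prediction
$y_*$ (notation adapted),
$$\mathrm{EPIG}(x) = \mathbb{E}_{p_*(x_*)}\big[\mathrm{I}(y;\, y_* \mid x, x_*)\big],$$
whereas BALD scores $x$ by the parameter-space gain $\mathrm{I}(y;\, \theta \mid x)$.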
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
- Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits [81.45853204922795]
Thompson sampling has been shown to achieve problem-dependent lower bounds in several reward models.
We consider the optimality of TS for the Pareto model that has a heavy tail and is parameterized by two unknown parameters.
We find that TS with the Jeffreys and reference priors can achieve the lower bound if one uses a truncation procedure.
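For context, the two-parameter Pareto model has density
$$f(x; \sigma, \alpha) = \frac{\alpha\, \sigma^{\alpha}}{x^{\alpha + 1}}, \qquad x \ge \sigma > 0,\ \alpha > 0,$$
with scale $\sigma$ and shape $\alpha$; the support depends on $\sigma$, making
the model non-regular, and the mean is finite only for $\alpha > 1$, reflecting
the heavy tail.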
arXiv Detail & Related papers (2023-02-03T04:47:14Z)
- On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks.
We show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them.
Despite the effectiveness of sparsity grounded by our theory, how to choose the tunable parameters remains an open problem.
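One illustrative way to state the unifying view (our notation, not necessarily
the paper's): each such method updates only a masked subset of the pretrained
weights,
$$\theta_{\mathrm{ft}} = \theta_{\mathrm{pre}} + m \odot \delta, \qquad m \in \{0, 1\}^{d}, \quad \|m\|_0 \ll d,$$
where the binary mask $m$ selects the tunable parameters and $\delta$ is the
learned update; choosing $m$ well is precisely the open problem above.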
arXiv Detail & Related papers (2022-11-28T17:41:48Z)
- Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits [17.11922027966447]
This work provides theoretical guarantees of Thompson sampling in high dimensional and sparse contextual bandits.
For faster computation, we use a spike-and-slab prior to model the unknown parameter and variational inference instead of MCMC.
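For reference, a standard spike-and-slab prior (the paper's exact hyperpriors
may differ) places on each coefficient $\beta_j$ a mixture of a point mass at
zero and a Gaussian slab,
$$\beta_j \mid z_j \sim (1 - z_j)\, \delta_0 + z_j\, \mathcal{N}(0, \tau^2), \qquad z_j \sim \mathrm{Bernoulli}(\pi_0),$$
and variational inference fits an approximate posterior over $(\beta, z)$ in
place of MCMC sampling.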
arXiv Detail & Related papers (2022-11-11T02:23:39Z)
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [69.7693300927423]
We show that averaging the weights of multiple models fine-tuned with different hyper parameter configurations improves accuracy and robustness.
We show that the model soup approach extends to multiple image classification and natural language processing tasks.
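A minimal Python sketch of the uniform-soup idea (PyTorch-style state dicts;
the helper name and the uniform weighting are illustrative assumptions, and the
paper also studies a greedy variant that keeps a model only if held-out
accuracy improves):

import torch

def uniform_soup(state_dicts):
    # Average fine-tuned checkpoints key by key; all models must share one
    # architecture (identical state-dict keys and tensor shapes).
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# Usage: model.load_state_dict(uniform_soup([sd_a, sd_b, sd_c]))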
arXiv Detail & Related papers (2022-03-10T17:03:49Z)
- AdaTerm: Adaptive T-Distribution Estimated Robust Moments for Noise-Robust Stochastic Gradient Optimization [14.531550983885772]
We propose AdaTerm, a novel approach that incorporates the Student's t-distribution to derive not only the first-order moment but also all associated statistics.
This provides a unified treatment of the optimization process, offering a comprehensive framework under the statistical model of the t-distribution for the first time.
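A standard fact behind this kind of robustness (background on the mechanism,
not AdaTerm's exact update rule): when moments are estimated under a Student's
t-model with $\nu$ degrees of freedom, each gradient sample $g$ is weighted by
$$w = \frac{\nu + 1}{\nu + \delta^{2}}, \qquad \delta = \frac{g - \mu}{s},$$
so samples far from the current mean estimate $\mu$ (relative to the scale $s$)
contribute little, and $\nu$ interpolates between heavy-tailed robustness
(small $\nu$) and Gaussian-like behavior ($\nu \to \infty$).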
arXiv Detail & Related papers (2022-01-18T03:13:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.