The Choice of Noninformative Priors for Thompson Sampling in
Multiparameter Bandit Models
- URL: http://arxiv.org/abs/2302.14407v2
- Date: Wed, 13 Dec 2023 04:31:56 GMT
- Title: The Choice of Noninformative Priors for Thompson Sampling in
Multiparameter Bandit Models
- Authors: Jongyeong Lee, Chao-Kai Chiang, Masashi Sugiyama
- Abstract summary: Thompson sampling (TS) has been known for its outstanding empirical performance supported by theoretical guarantees across various reward models.
This study explores the impact of selecting noninformative priors, offering insights into the performance of TS when dealing with new models that lack theoretical understanding.
- Score: 56.31310344616837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thompson sampling (TS) has been known for its outstanding empirical
performance supported by theoretical guarantees across various reward models in
the classical stochastic multi-armed bandit problems. Nonetheless, its
optimality is often restricted to specific priors due to the common observation
that TS is fairly insensitive to the choice of the prior when it comes to
asymptotic regret bounds. However, when the model contains multiple parameters,
the optimality of TS highly depends on the choice of priors, which casts doubt
on the generalizability of previous findings to other models. To address this
gap, this study explores the impact of selecting noninformative priors,
offering insights into the performance of TS when dealing with new models that
lack theoretical understanding. We first extend the regret analysis of TS to
the model of uniform distributions with unknown supports, which would be the
simplest non-regular model. Our findings reveal that changing noninformative
priors can significantly affect the expected regret, aligning with previously
known results in other multiparameter bandit models. Although the uniform prior
is shown to be optimal, we highlight that its optimality holds only under
specific parameterizations, which emphasizes the significance of the invariance
property of priors. In light of this limitation, we propose a
slightly modified TS-based policy, called TS with Truncation (TS-T), which can
achieve the asymptotic optimality for the Gaussian models and the uniform
models by using the reference prior and the Jeffreys prior that are invariant
under one-to-one reparameterizations. This policy provides an alternative
approach to achieving optimality by employing fine-tuned truncation, which
would be much easier than hunting for optimal priors in practice.
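For a concrete picture, the following is a minimal sketch (not the paper's code) of Thompson sampling for Gaussian arms with unknown mean and variance under the reference prior $\pi(\mu, \sigma^2) \propto 1/\sigma^2$, under which the marginal posterior of each arm's mean is a shifted and scaled Student-t. The optional clipping step only illustrates the idea of truncating posterior samples; the actual TS-T truncation rule is specified in the paper.

```python
# Sketch: Thompson sampling for Gaussian bandits with unknown mean and variance
# under the reference prior pi(mu, sigma^2) ∝ 1/sigma^2. Under this prior the
# marginal posterior of an arm's mean is mu | data ~ xbar + (s / sqrt(n)) * t_{n-1}.
# The clipping below is only a schematic stand-in for the paper's TS-T truncation.
import numpy as np

rng = np.random.default_rng(0)

def ts_gaussian(true_means, true_stds, horizon=5000, clip_width=None):
    K = len(true_means)
    rewards = [[] for _ in range(K)]
    # Pull each arm twice so the t_{n-1} marginal posterior is well defined.
    for k in range(K):
        for _ in range(2):
            rewards[k].append(rng.normal(true_means[k], true_stds[k]))
    regret, best = 0.0, max(true_means)
    for _ in range(horizon):
        samples = np.empty(K)
        for k in range(K):
            x = np.asarray(rewards[k])
            n, xbar, s = len(x), x.mean(), x.std(ddof=1)
            # Posterior sample of the arm mean under the reference prior.
            mu = xbar + (s / np.sqrt(n)) * rng.standard_t(df=n - 1)
            if clip_width is not None:
                # Illustrative truncation (NOT the paper's exact TS-T rule):
                # keep the sample inside a window around the empirical mean.
                half = clip_width * s / np.sqrt(n)
                mu = np.clip(mu, xbar - half, xbar + half)
            samples[k] = mu
        k = int(np.argmax(samples))
        rewards[k].append(rng.normal(true_means[k], true_stds[k]))
        regret += best - true_means[k]
    return regret

print(ts_gaussian([0.0, 0.5, 1.0], [1.0, 1.0, 1.0]))  # cumulative expected regret
```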
Related papers
- Continuous Bayesian Model Selection for Multivariate Causal Discovery [22.945274948173182]
Current causal discovery approaches require restrictive model assumptions or assume access to interventional data to ensure structure identifiability.
Recent work has shown that Bayesian model selection can greatly improve accuracy by exchanging restrictive modelling assumptions for more flexible ones.
We demonstrate the competitiveness of our approach on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-11-15T12:55:05Z)
- Rényi Neural Processes [14.11793373584558]
We propose Rényi Neural Processes (RNP) to ameliorate the impacts of prior misspecification.
We scale the density ratio $\frac{p}{q}$ by the power of $(1-\alpha)$ in the divergence gradients with respect to the posterior.
Our experiments show consistent log-likelihood improvements over state-of-the-art NP family models.
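For orientation, the following generic identity (not necessarily the exact RNP objective) shows the Rényi divergence between a variational posterior $q_\phi$ and a reference density $p$, and the gradient of the underlying expectation, in which the density ratio $p/q_\phi$ appears raised to the power $(1-\alpha)$:

```latex
% Renyi divergence between q_phi and p, and the gradient of the underlying
% expectation, where the ratio p/q_phi enters to the power (1 - alpha).
D_{\alpha}(q_{\phi}\,\|\,p)
  = \frac{1}{\alpha-1}\log \mathbb{E}_{q_{\phi}}\!\left[\left(\frac{p(z)}{q_{\phi}(z)}\right)^{1-\alpha}\right],
\qquad
\nabla_{\phi}\,\mathbb{E}_{q_{\phi}}\!\left[\left(\frac{p(z)}{q_{\phi}(z)}\right)^{1-\alpha}\right]
  = \alpha\,\mathbb{E}_{q_{\phi}}\!\left[\left(\frac{p(z)}{q_{\phi}(z)}\right)^{1-\alpha}\nabla_{\phi}\log q_{\phi}(z)\right].
```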
arXiv Detail & Related papers (2024-05-25T00:14:55Z)
- Should We Learn Most Likely Functions or Parameters? [51.133793272222874]
We investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data.
We find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
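The tension between the two estimates is the usual change-of-variables fact: posterior modes are not invariant under reparameterization because the transformed density picks up a Jacobian factor. A schematic statement for an invertible map $f = g(\theta)$ (illustrative, not the paper's specific construction):

```latex
% Parameter-space vs. function-space MAP under an invertible reparameterization f = g(theta).
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid \mathcal{D}),
\qquad
\hat{f}_{\mathrm{MAP}} = \arg\max_{f}\, p(f \mid \mathcal{D}),
\quad\text{where}\quad
p(f \mid \mathcal{D}) = p\!\left(g^{-1}(f) \mid \mathcal{D}\right)
  \left|\det \frac{\partial g^{-1}(f)}{\partial f}\right|,
% so in general g(\hat{\theta}_{MAP}) differs from \hat{f}_{MAP} because of the Jacobian term.
```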
arXiv Detail & Related papers (2023-11-27T16:39:55Z)
- Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
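For reference, EPIG is commonly written as the mutual information between the label $y$ at a candidate input $x$ and the label $y_*$ at a target input $x_*$, averaged over the target input distribution $p_*$ (notation may differ slightly from the paper):

```latex
% EPIG: expected information gain measured in prediction space rather than parameter space.
\mathrm{EPIG}(x) = \mathbb{E}_{p_*(x_*)}\!\left[\, \mathrm{I}\!\left(y;\, y_* \mid x, x_*\right) \right].
```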
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
- Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits [81.45853204922795]
Thompson sampling has been shown to achieve problem-dependent lower bounds in several reward models.
We consider the optimality of TS for the Pareto model that has a heavy tail and is parameterized by two unknown parameters.
We find that TS with the Jeffreys and reference priors can achieve the lower bound if one uses a truncation procedure.
arXiv Detail & Related papers (2023-02-03T04:47:14Z)
- On the Effectiveness of Parameter-Efficient Fine-Tuning [79.6302606855302]
Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks.
We show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them.
Despite the effectiveness of sparsity grounded by our theory, how to choose the tunable parameters remains an open problem.
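A schematic of what "sparse fine-tuned" means in this context (illustrative notation, not the paper's): only a masked subset of the pretrained weights $\theta_0$ is allowed to move.

```latex
% Sparse fine-tuning as a masked update of the pretrained weights theta_0.
\theta' = \theta_0 + M \odot \Delta\theta,
\qquad M \in \{0,1\}^{d}, \quad \|M\|_{0} \ll d .
```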
arXiv Detail & Related papers (2022-11-28T17:41:48Z)
- Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits [17.11922027966447]
This work provides theoretical guarantees of Thompson sampling in high dimensional and sparse contextual bandits.
For faster computation, we use a spike-and-slab prior to model the unknown parameter and variational inference instead of MCMC.
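For reference, the generic spike-and-slab prior places a point mass at zero on each coordinate of the unknown parameter and a continuous slab otherwise (a Gaussian slab is shown; the paper's exact slab and variational family may differ):

```latex
% Spike-and-slab prior: point mass at zero with probability 1 - pi, slab otherwise.
\beta_j \sim (1-\pi)\,\delta_0 + \pi\,\mathcal{N}(0,\sigma_{\beta}^{2}),
\qquad j = 1,\dots,d .
```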
arXiv Detail & Related papers (2022-11-11T02:23:39Z)
- Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [69.7693300927423]
We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness.
We show that the model soup approach extends to multiple image classification and natural language processing tasks.
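A minimal sketch of the uniform-soup idea described above, using plain dicts of NumPy arrays; real checkpoints (e.g. framework state dicts) average the same way, tensor by tensor.

```python
# Uniform soup: element-wise average of the weights of several fine-tuned
# models that share one architecture.
import numpy as np

def uniform_soup(checkpoints):
    """Average a list of {name: array} weight dicts with identical keys and shapes."""
    keys = checkpoints[0].keys()
    return {k: np.mean([ckpt[k] for ckpt in checkpoints], axis=0) for k in keys}

# Toy usage: three "fine-tuned" versions of the same two-parameter model.
ckpts = [{"w": np.array([1.0, 2.0]), "b": np.array([0.1])},
         {"w": np.array([1.2, 1.8]), "b": np.array([0.0])},
         {"w": np.array([0.8, 2.2]), "b": np.array([0.2])}]
print(uniform_soup(ckpts))  # approximately {'w': [1.0, 2.0], 'b': [0.1]}
```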
arXiv Detail & Related papers (2022-03-10T17:03:49Z)
- AdaTerm: Adaptive T-Distribution Estimated Robust Moments for Noise-Robust Stochastic Gradient Optimization [14.531550983885772]
We propose AdaTerm, a novel approach that incorporates the Student's t-distribution to derive not only the first-order moment but also all associated statistics.
This provides a unified treatment of the optimization process, offering a comprehensive framework under the statistical model of the t-distribution for the first time.
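A rough sketch of the general technique named above, not AdaTerm's actual update rules: use Student's-t responsibilities to down-weight outlier gradients when forming Adam-style moment estimates. The weight $(\nu+1)/(\nu+\delta)$ is the standard t-distribution EM weight; the class name, the effective-decay rule, and the hyperparameters below are illustrative assumptions.

```python
# Schematic t-distribution-weighted moment estimation for gradient descent
# (illustrative only; the exact AdaTerm updates are defined in the paper).
import numpy as np

class RobustTMomentSGD:
    def __init__(self, dim, lr=0.01, beta=0.9, nu=5.0, eps=1e-8):
        self.lr, self.beta, self.nu, self.eps = lr, beta, nu, eps
        self.m = np.zeros(dim)   # robust first moment (mean of gradients)
        self.v = np.ones(dim)    # robust second moment (scale of gradients)

    def step(self, params, grad):
        # Normalized squared distance of the new gradient from the current moments.
        delta = np.mean((grad - self.m) ** 2 / (self.v + self.eps))
        # Student's-t responsibility: small for outlier gradients.
        w = (self.nu + 1.0) / (self.nu + delta)
        # Outliers get a larger effective decay, i.e. a smaller moment update.
        beta_eff = self.beta / (self.beta + (1.0 - self.beta) * w)
        self.m = beta_eff * self.m + (1.0 - beta_eff) * grad
        self.v = beta_eff * self.v + (1.0 - beta_eff) * (grad - self.m) ** 2
        return params - self.lr * self.m / (np.sqrt(self.v) + self.eps)

# Toy usage: quadratic objective with heavy-tailed gradient noise.
rng = np.random.default_rng(0)
opt, x = RobustTMomentSGD(dim=2), np.array([3.0, -2.0])
for _ in range(3000):
    g = 2 * x + rng.standard_t(df=2, size=2)  # heavy-tailed noise spikes
    x = opt.step(x, g)
print(x)  # x has moved from [3., -2.] toward the optimum at [0, 0]
```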
arXiv Detail & Related papers (2022-01-18T03:13:19Z)