Almost Free: Self-concordance in Natural Exponential Families and an Application to Bandits
- URL: http://arxiv.org/abs/2410.01112v1
- Date: Tue, 1 Oct 2024 22:42:19 GMT
- Title: Almost Free: Self-concordance in Natural Exponential Families and an Application to Bandits
- Authors: Shuai Liu, Alex Ayoub, Flore Sentenac, Xiaoqi Tan, Csaba Szepesvári,
- Abstract summary: We show that optimistic algorithms for generalized linear bandits enjoy regret bounds that are both second-order and free of an exponential dependence on the bound of the problem parameter in the leading term.
Ours is the first regret bound for generalized linear bandits with subexponential tails, broadening the class of problems to include Poisson, exponential and gamma bandits.
- Score: 32.10916558109406
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We prove that single-parameter natural exponential families with subexponential tails are self-concordant with polynomial-sized parameters. For subgaussian natural exponential families we establish an exact characterization of the growth rate of the self-concordance parameter. Applying these findings to bandits allows us to fill gaps in the literature: We show that optimistic algorithms for generalized linear bandits enjoy regret bounds that are both second-order (scale with the variance of the optimal arm's reward distribution) and free of an exponential dependence on the bound of the problem parameter in the leading term. To the best of our knowledge, ours is the first regret bound for generalized linear bandits with subexponential tails, broadening the class of problems to include Poisson, exponential and gamma bandits.
Related papers
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Source Condition Double Robust Inference on Functionals of Inverse
Problems [71.42652863687117]
We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems.
We provide the first source condition double robust inference method.
arXiv Detail & Related papers (2023-07-25T19:54:46Z) - Clustering above Exponential Families with Tempered Exponential Measures [28.532545355403123]
Link with exponential families has allowed $k$-means clustering to be generalized to a wide variety of data generating distributions.
Getting the framework to work above exponential families is important to lift roadblocks like the lack of robustness of some population minimizers carved in their axiomatization.
arXiv Detail & Related papers (2022-11-04T21:58:40Z) - Reproducible Bandits [95.8830340560603]
A policy in the bandit environment is called reproducible if it pulls, with high probability, the exact same sequence of arms in two different executions.
We show that not only do reproducible policies exist, but also they achieve almost the same optimal (non-reproducible) regret bounds in terms of the time horizon.
Our results show that even though randomization is crucial for the exploration-exploitation trade-off, an optimal balance can still be achieved while pulling the exact same arms in two different rounds of executions.
arXiv Detail & Related papers (2022-10-04T20:36:45Z) - Finite-Time Regret of Thompson Sampling Algorithms for Exponential
Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling, termed Expulli, which uses a novel sampling distribution to avoid the under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z) - From Optimality to Robustness: Dirichlet Sampling Strategies in
Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
arXiv Detail & Related papers (2021-11-18T14:34:21Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS)
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z) - Adversarial Estimation of Riesz Representers [21.510036777607397]
We propose an adversarial framework to estimate the Riesz representer using general function spaces.
We prove a nonasymptotic mean square rate in terms of an abstract quantity called the critical radius, then specialize it for neural networks, random forests, and reproducing kernel Hilbert spaces as leading cases.
arXiv Detail & Related papers (2020-12-30T19:46:57Z) - Relative Deviation Margin Bounds [55.22251993239944]
We give two types of learning bounds, both distribution-dependent and valid for general families, in terms of the Rademacher complexity.
We derive distribution-dependent generalization bounds for unbounded loss functions under the assumption of a finite moment.
arXiv Detail & Related papers (2020-06-26T12:37:17Z) - Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence
Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian
Rewards [10.66048003460524]
We study an extension of the classic multi-armed bandit problem which involves multiple plays and Markovian rewards in the rested bandits setting.
In order to tackle this problem we consider an adaptive allocation rule which at each stage combines the information from the sample means of all the arms, with the Kullback-Leibler upper confidence bound of a single arm which is selected in round-robin way.
arXiv Detail & Related papers (2020-01-30T08:09:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.