Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed
Rewards
- URL: http://arxiv.org/abs/2310.18701v1
- Date: Sat, 28 Oct 2023 13:01:10 GMT
- Title: Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed
Rewards
- Authors: Bo Xue, Yimu Wang, Yuanyu Wan, Jinfeng Yi, Lijun Zhang
- Abstract summary: We propose two novel algorithms based on truncation and mean of medians.
Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches.
Our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $epsilon=1$.
- Score: 40.99322897009357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the problem of generalized linear bandits with
heavy-tailed rewards, whose $(1+\epsilon)$-th moment is bounded for some
$\epsilon\in (0,1]$. Although there exist methods for generalized linear
bandits, most of them focus on bounded or sub-Gaussian rewards and are not
well-suited for many real-world scenarios, such as financial markets and
web-advertising. To address this issue, we propose two novel algorithms based
on truncation and mean of medians. These algorithms achieve an almost optimal
regret bound of $\widetilde{O}(dT^{\frac{1}{1+\epsilon}})$, where $d$ is the
dimension of contextual information and $T$ is the time horizon. Our
truncation-based algorithm supports online learning, distinguishing it from
existing truncation-based approaches. Additionally, our mean-of-medians-based
algorithm requires only $O(\log T)$ rewards and one estimator per epoch, making
it more practical. Moreover, our algorithms improve the regret bounds by a
logarithmic factor compared to existing algorithms when $\epsilon=1$. Numerical
experimental results confirm the merits of our algorithms.
Related papers
- Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery [45.601316850669406]
We present efficient algorithms for policy evaluation, best policy identification and regret minimization.
For policy evaluation and best policy identification, we show that our algorithms are nearly minimax optimal.
All the proposed algorithms consist of two phases: they first leverage spectral methods to estimate the left and right singular subspaces of the low-rank reward matrix.
arXiv Detail & Related papers (2024-02-24T06:36:08Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $delta(x)$ of $x$ for each $xin P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL of Agarwal et al. achieves this goal with a regret overhead of order $widetildeO(sqrtd S T)$ where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only emphlogarithmic dependence on $M$ as long
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $widetildeO(TsqrtkN + sqrtdkNT)$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Quick Streaming Algorithms for Maximization of Monotone Submodular
Functions in Linear Time [16.346069386394703]
We consider the problem of monotone, submodular over a ground set of size $n$ subject to cardinality constraint $k$.
We introduce the first deterministic algorithms with linear time complexity; these algorithms are streaming algorithms.
Our single-pass algorithm obtains a constant ratio in $lceil n / c rceil + c$ oracle queries, for any $c ge 1$.
arXiv Detail & Related papers (2020-09-10T16:35:54Z) - Low-Rank Generalized Linear Bandit Problems [19.052901817304743]
In a low-rank linear bandit problem, the reward of an action is the inner product between the action and an unknown low-rank matrix $Theta*$.
We propose an algorithm based on a novel combination of online-to-confidence-set conversioncitepabbasi2012online and the exponentially weighted average forecaster constructed by a covering of low-rank matrices.
arXiv Detail & Related papers (2020-06-04T15:30:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.