Allocating Divisible Resources on Arms with Unknown and Random Rewards
- URL: http://arxiv.org/abs/2306.16578v2
- Date: Fri, 3 Nov 2023 01:57:15 GMT
- Title: Allocating Divisible Resources on Arms with Unknown and Random Rewards
- Authors: Ningyuan Chen, Wenhao Li
- Abstract summary: We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms.
The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource.
- Score: 25.93048671326331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a decision maker allocating one unit of renewable and divisible
resource in each period on a number of arms. The arms have unknown and random
rewards whose means are proportional to the allocated resource and whose
variances are proportional to an order $b$ of the allocated resource. In
particular, if the decision maker allocates resource $A_i$ to arm $i$ in a
period, then the reward $Y_i$ is $Y_i(A_i)=A_i \mu_i+A_i^b \xi_{i}$, where
$\mu_i$ is the unknown mean and the noise $\xi_{i}$ is independent and
sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly
bridges the standard stochastic multi-armed bandit and online learning with
full feedback. We design two algorithms that attain the optimal gap-dependent
and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase
transition at $b=1/2$. The theoretical results hinge on a novel concentration
inequality we have developed that bounds a linear combination of sub-Gaussian
random variables whose weights are fractional, adapted to the filtration, and
monotonic.
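To make the reward model in the abstract concrete, the following is a minimal simulation sketch, not the authors' algorithms. It draws one period of rewards $Y_i(A_i)=A_i \mu_i+A_i^b \xi_{i}$ and reports how noisy the per-unit estimate $Y_i/A_i$ of $\mu_i$ is, which scales as $A_i^{b-1}$. The function names, the `noise_std` parameter, and the choice of Gaussian noise (one particular sub-Gaussian distribution) are assumptions made for illustration.

```python
import numpy as np


def sample_rewards(allocation, mu, b, rng, noise_std=1.0):
    """Draw one period of rewards Y_i = A_i * mu_i + A_i**b * xi_i.

    Assumption: xi_i is Gaussian, which is one sub-Gaussian choice;
    the abstract only requires independent sub-Gaussian noise.
    """
    allocation = np.asarray(allocation, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # One unit of resource is split across the arms in each period.
    if np.any(allocation < 0) or not np.isclose(allocation.sum(), 1.0):
        raise ValueError("allocation must be nonnegative and sum to one")
    xi = rng.normal(0.0, noise_std, size=mu.shape)
    return allocation * mu + allocation ** b * xi


def per_unit_noise_scale(allocation, b):
    """Noise multiplier of Y_i / A_i as an estimate of mu_i: A_i**(b - 1).

    Intended for strictly positive allocations.
    """
    allocation = np.asarray(allocation, dtype=float)
    return allocation ** (b - 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    allocation = [0.5, 0.25, 0.25]   # hypothetical split of the unit resource
    mu = [1.0, 0.5, 0.2]             # hypothetical unknown means
    for b in (0.0, 0.5, 1.0):
        rewards = sample_rewards(allocation, mu, b, rng)
        print(b, rewards, per_unit_noise_scale(allocation, b))
```

At $b=1$ the per-unit noise scale is 1 for every arm with positive allocation, so spreading the resource yields equally informative observations of all arms, as in full-feedback online learning. At $b=0$ the scale is $1/A_i$, so splitting the resource inflates the estimation noise on each arm, which is closer to the standard bandit trade-off.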
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Generalized Regret Analysis of Thompson Sampling using Fractional
Posteriors [12.43000662545423]
Thompson sampling (TS) is one of the most popular and earliest algorithms to solve multi-armed bandit problems.
We consider a variant of TS, named $\alpha$-TS, where we use a fractional or $\alpha$-posterior instead of the standard posterior distribution.
arXiv Detail & Related papers (2023-09-12T16:15:33Z) - Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $\widetilde{\Theta}\left(\sqrt{dT}\right)$.
In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $\widetilde{\mathcal{O}}(1)$ regret.
We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z) - Learning the optimal regularizer for inverse problems [1.763934678295407]
We consider the linear inverse problem $y=Ax+\epsilon$, where $A\colon X\to Y$ is a known linear operator between the separable Hilbert spaces $X$ and $Y$.
This setting covers several inverse problems in imaging including denoising, deblurring, and X-ray tomography.
Within the classical framework of regularization, we focus on the case where the regularization functional is not given a priori but learned from data.
arXiv Detail & Related papers (2021-06-11T17:14:27Z) - Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{n T \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z) - Variance-Aware Confidence Set: Variance-Dependent Bound for Linear
Bandits and Horizon-Free Bound for Linear Mixture MDP [76.94328400919836]
We show how to construct variance-aware confidence sets for linear bandits and linear mixture Markov Decision Processes (MDPs).
For linear bandits, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{1 + \sum_{i=1}^{K}\sigma_i^2})$ regret bound, where $d$ is the feature dimension.
For linear mixture MDP, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{K})$ regret bound, where
arXiv Detail & Related papers (2021-01-29T18:57:52Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X,w^* \rangle + \epsilon$
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)