Allocating Divisible Resources on Arms with Unknown and Random Rewards
- URL: http://arxiv.org/abs/2306.16578v2
- Date: Fri, 3 Nov 2023 01:57:15 GMT
- Title: Allocating Divisible Resources on Arms with Unknown and Random Rewards
- Authors: Ningyuan Chen, Wenhao Li
- Abstract summary: We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms.
The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource.
- Score: 25.93048671326331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a decision maker allocating one unit of renewable and divisible
resource in each period on a number of arms. The arms have unknown and random
rewards whose means are proportional to the allocated resource and whose
variances are proportional to an order $b$ of the allocated resource. In
particular, if the decision maker allocates resource $A_i$ to arm $i$ in a
period, then the reward $Y_i$ is$Y_i(A_i)=A_i \mu_i+A_i^b \xi_{i}$, where
$\mu_i$ is the unknown mean and the noise $\xi_{i}$ is independent and
sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly
bridges the standard stochastic multi-armed bandit and online learning with
full feedback. We design two algorithms that attain the optimal gap-dependent
and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase
transition at $b=1/2$. The theoretical results hinge on a novel concentration
inequality we have developed that bounds a linear combination of sub-Gaussian
random variables whose weights are fractional, adapted to the filtration, and
monotonic.
Related papers
- Rényi divergence-based uniformity guarantees for $k$-universal hash functions [59.90381090395222]
Universal hash functions map the output of a source to random strings over a finite alphabet.
We show that it is possible to distill random bits that are nearly uniform, as measured by min-entropy.
arXiv Detail & Related papers (2024-10-21T19:37:35Z) - A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Generalized Regret Analysis of Thompson Sampling using Fractional
Posteriors [12.43000662545423]
Thompson sampling (TS) is one of the most popular and earliest algorithms to solve multi-armed bandit problems.
We consider a variant of TS, named $alpha$-TS, where we use a fractional or $alpha$-posterior instead of the standard posterior distribution.
arXiv Detail & Related papers (2023-09-12T16:15:33Z) - Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $widetildeThetaleft(sqrtdTright)$.
In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $widetildemathcal O(1)$ regret.
We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z) - Learning the optimal regularizer for inverse problems [1.763934678295407]
We consider the linear inverse problem $y=Ax+epsilon$, where $Acolon Xto Y$ is a known linear operator between the separable Hilbert spaces $X$ and $Y$.
This setting covers several inverse problems in imaging including denoising, deblurring, and X-ray tomography.
Within the classical framework of regularization, we focus on the case where the regularization functional is not given a priori but learned from data.
arXiv Detail & Related papers (2021-06-11T17:14:27Z) - Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $Oleft(frack2 n log Tepsilonright)$ gap-dependent regret bound as well as an $Oleft(k2sqrtn T log Tright)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z) - Variance-Aware Confidence Set: Variance-Dependent Bound for Linear
Bandits and Horizon-Free Bound for Linear Mixture MDP [76.94328400919836]
We show how to construct variance-aware confidence sets for linear bandits and linear mixture Decision Process (MDP)
For linear bandits, we obtain an $widetildeO(mathrmpoly(d)sqrt1 + sum_i=1Ksigma_i2) regret bound, where $d is the feature dimension.
For linear mixture MDP, we obtain an $widetildeO(mathrmpoly(d)sqrtK)$ regret bound, where
arXiv Detail & Related papers (2021-01-29T18:57:52Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.