Bandit Phase Retrieval
- URL: http://arxiv.org/abs/2106.01660v2
- Date: Fri, 4 Jun 2021 15:52:52 GMT
- Title: Bandit Phase Retrieval
- Authors: Tor Lattimore, Botao Hao
- Abstract summary: We prove that the minimax cumulative regret in this problem is $\smash{\tilde \Theta(d \sqrt{n})}$.
We also show that the minimax simple regret is $\smash{\tilde \Theta(d / \sqrt{n})}$ and that this is only achievable by an adaptive algorithm.
- Score: 45.12888201134656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a bandit version of phase retrieval where the learner chooses
actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected
reward is $\langle A_t, \theta_\star\rangle^2$ where $\theta_\star \in \mathbb
R^d$ is an unknown parameter vector. We prove that the minimax cumulative
regret in this problem is $\smash{\tilde \Theta(d \sqrt{n})}$, which improves
on the best known bounds by a factor of $\smash{\sqrt{d}}$. We also show that
the minimax simple regret is $\smash{\tilde \Theta(d / \sqrt{n})}$ and that
this is only achievable by an adaptive algorithm. Our analysis shows that an
apparently convincing heuristic for guessing lower bounds can be misleading and
that uniform bounds on the information ratio for information-directed sampling
are not sufficient for optimal regret.
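As a minimal illustration of the protocol described in this abstract, the sketch below simulates rewards of the form $\langle A_t, \theta_\star\rangle^2$ plus noise and recovers $\theta_\star$ up to sign via uniform exploration and a spectral estimate. The Gaussian noise model, the sampling scheme, and the estimator are assumptions for illustration; this is not the paper's (adaptive) algorithm.

```python
# Minimal simulation of the bandit phase retrieval protocol: actions A_t lie
# in the d-dimensional unit ball and the expected reward is <A_t, theta*>^2.
# The spectral estimator is an illustrative baseline built on a fourth-moment
# identity for the uniform distribution on the sphere, NOT the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(d):
    """Uniform sample from the unit sphere in R^d (boundary of the unit ball)."""
    a = rng.standard_normal(d)
    return a / np.linalg.norm(a)

def pull(a, theta_star, noise_sd=0.1):
    """One bandit round: noisy observation of <a, theta_star>^2.
    The Gaussian noise model is an assumption for this sketch."""
    return np.dot(a, theta_star) ** 2 + noise_sd * rng.standard_normal()

def spectral_estimate(theta_star, n, d):
    """Estimate theta_star (up to sign) from n uniform-exploration rounds.

    Uses E[r A A^T] = (||theta||^2 I + 2 theta theta^T) / (d (d + 2))
    for A uniform on the unit sphere, together with E[r] = ||theta||^2 / d.
    """
    M = np.zeros((d, d))
    r_bar = 0.0
    for _ in range(n):
        a = sample_sphere(d)
        r = pull(a, theta_star)
        M += r * np.outer(a, a) / n
        r_bar += r / n
    norm_sq_hat = d * r_bar                        # estimate of ||theta*||^2
    S = d * (d + 2) * M - norm_sq_hat * np.eye(d)  # approx 2 theta* theta*^T
    evals, evecs = np.linalg.eigh(S)               # eigenvalues ascending
    direction = evecs[:, -1]                       # top eigenvector
    return np.sqrt(max(norm_sq_hat, 0.0)) * direction

d = 5
theta_star = rng.standard_normal(d)
theta_hat = spectral_estimate(theta_star, n=50000, d=d)
# Compare up to sign, since only theta* theta*^T is identifiable.
err = min(np.linalg.norm(theta_hat - theta_star),
          np.linalg.norm(theta_hat + theta_star))
print(f"recovery error (up to sign): {err:.3f}")
```

Note that only $\theta_\star \theta_\star^\top$ is identifiable from rewards of the form $\langle A_t, \theta_\star\rangle^2$, which is exactly the "phase" ambiguity that gives phase retrieval its name.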
Related papers
- How Does Variance Shape the Regret in Contextual Bandits? [59.8472760881411]
We consider contextual bandits with general function approximation.
We show that the eluder dimension $d_{\text{elu}}$ plays a crucial role in variance-dependent bounds.
arXiv Detail & Related papers (2024-10-16T16:20:07Z)
- Low-rank Matrix Bandits with Heavy-tailed Rewards [55.03293214439741]
We study the problem of low-rank matrix bandits with heavy-tailed rewards (LowHTR).
By truncating observed payoffs and exploring dynamically, we propose a novel algorithm called LOTUS (a toy sketch of payoff truncation appears after this list).
arXiv Detail & Related papers (2024-04-26T21:54:31Z)
- Context-lumpable stochastic bandits [49.024050919419366]
We consider a contextual bandit problem with $S$ contexts and $K$ actions.
We give an algorithm that outputs an $\epsilon$-optimal policy after using at most $\widetilde{O}(r(S+K)/\epsilon^2)$ samples.
In the regret setting, we give an algorithm whose cumulative regret up to time $T$ is bounded by $\widetilde{O}(\sqrt{r^3(S+K)T})$.
arXiv Detail & Related papers (2023-06-22T17:20:30Z)
- Sparse Recovery with Shuffled Labels: Statistical Limits and Practical Estimators [23.313461266708877]
We reconstruct the permutation matrix $\Pi^{\mathrm{true}}$ and the sparse signal $\beta^{\mathrm{true}}$ from shuffled labels.
We show that our proposed estimator can obtain the ground truth $(\Pi^{\mathrm{true}}, \mathrm{supp}(\beta^{\mathrm{true}}))$ under mild conditions.
arXiv Detail & Related papers (2023-03-20T16:14:58Z)
- Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR [58.40575099910538]
We study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.
We show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes.
We show that our algorithm achieves the optimal regret of $\widetilde{O}(\tau^{-1}\sqrt{SAK})$ under a continuity assumption and in general attains a near-optimal regret.
arXiv Detail & Related papers (2023-02-07T02:22:31Z)
- Sparse Signal Detection in Heteroscedastic Gaussian Sequence Models: Sharp Minimax Rates [1.0309387309011746]
We study the signal detection problem against sparse alternatives, for known sparsity $s$.
We find upper and lower bounds on the minimax separation radius $\epsilon^*$ and prove that they always match.
Our results reveal new phase transitions in the behavior of $\epsilon^*$ with respect to the level of sparsity, the $L^t$ metric, and the heteroscedasticity profile of $\Sigma$.
arXiv Detail & Related papers (2022-11-15T23:53:39Z)
- Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners [17.224805430291177]
We study the problem of best arm identification in linearly parameterised multi-armed bandits.
We propose an explicitly implementable algorithm with provably order-optimal sample complexity for this problem.
arXiv Detail & Related papers (2020-06-13T05:00:01Z)
- Problem-Complexity Adaptive Model Selection for Stochastic Linear Bandits [20.207989166682832]
We consider the problem of model selection for two popular linear bandit settings.
In the first setting, we consider the $K$-armed mixture bandits, where the mean reward of arm $i \in [K]$ is $\mu_i + \langle \alpha_{i,t}, \theta^* \rangle$.
We show that ALB achieves regret scaling of $O(\|\theta^*\|\sqrt{T})$, where $\|\theta^*\|$ is a priori unknown.
arXiv Detail & Related papers (2020-06-04T02:19:00Z)
- Near-optimal Regret Bounds for Stochastic Shortest Path [63.029132134792555]
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state at minimum total expected cost.
We show that any learning algorithm must have at least $\Omega(B_\star \sqrt{|S| |A| K})$ regret in the worst case.
arXiv Detail & Related papers (2020-02-23T09:10:14Z)
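The LowHTR entry above names truncation of observed payoffs as the key tool for handling heavy-tailed rewards. The sketch below illustrates the general idea only; the Pareto reward model, the fixed threshold, and the `truncated_mean` helper are assumptions for illustration, not the LOTUS algorithm itself.

```python
# Toy illustration of payoff truncation for heavy-tailed rewards.
import numpy as np

rng = np.random.default_rng(1)

def truncated_mean(samples, b):
    """Zero out observations with magnitude above threshold b before
    averaging: a small bias is traded for much lighter estimator tails."""
    x = np.asarray(samples, dtype=float)
    return np.where(np.abs(x) <= b, x, 0.0).mean()

# Pareto rewards with tail index 1.5 have a finite mean (here 3.0) but
# infinite variance, so the plain empirical mean concentrates poorly.
rewards = rng.pareto(1.5, size=10000) + 1.0
print("empirical mean :", rewards.mean())
print("truncated mean :", truncated_mean(rewards, b=50.0))
```

Truncation trades a small, controllable bias for far better concentration, which is why it is a standard device when reward variance may be infinite.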