Bandit Phase Retrieval
- URL: http://arxiv.org/abs/2106.01660v2
- Date: Fri, 4 Jun 2021 15:52:52 GMT
- Title: Bandit Phase Retrieval
- Authors: Tor Lattimore, Botao Hao
- Abstract summary: We prove that the minimax cumulative regret in this problem is $\smash{\tilde \Theta(d \sqrt{n})}$.
We also show that the minimax simple regret is $\smash{\tilde \Theta(d / \sqrt{n})}$ and that this is only achievable by an adaptive algorithm.
- Score: 45.12888201134656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a bandit version of phase retrieval where the learner chooses
actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected
reward is $\langle A_t, \theta_\star\rangle^2$ where $\theta_\star \in \mathbb
R^d$ is an unknown parameter vector. We prove that the minimax cumulative
regret in this problem is $\smash{\tilde \Theta(d \sqrt{n})}$, which improves
on the best known bounds by a factor of $\smash{\sqrt{d}}$. We also show that
the minimax simple regret is $\smash{\tilde \Theta(d / \sqrt{n})}$ and that
this is only achievable by an adaptive algorithm. Our analysis shows that an
apparently convincing heuristic for guessing lower bounds can be misleading and
that uniform bounds on the information ratio for information-directed sampling
are not sufficient for optimal regret.
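As a minimal illustration of the protocol described in this abstract, the sketch below simulates rewards of the form $\langle A_t, \theta_\star\rangle^2$ plus noise and recovers $\theta_\star$ up to sign via uniform exploration and a spectral estimate. The Gaussian noise model, the sampling scheme, and the estimator are assumptions for illustration; this is not the paper's (adaptive) algorithm.

```python
# Minimal simulation of the bandit phase retrieval protocol: actions A_t lie
# in the d-dimensional unit ball and the expected reward is <A_t, theta*>^2.
# The spectral estimator is an illustrative baseline built on a fourth-moment
# identity for the uniform distribution on the sphere, NOT the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(d):
    """Uniform sample from the unit sphere in R^d (boundary of the unit ball)."""
    a = rng.standard_normal(d)
    return a / np.linalg.norm(a)

def pull(a, theta_star, noise_sd=0.1):
    """One bandit round: noisy observation of <a, theta_star>^2.
    The Gaussian noise model is an assumption for this sketch."""
    return np.dot(a, theta_star) ** 2 + noise_sd * rng.standard_normal()

def spectral_estimate(theta_star, n, d):
    """Estimate theta_star (up to sign) from n uniform-exploration rounds.

    Uses E[r A A^T] = (||theta||^2 I + 2 theta theta^T) / (d (d + 2))
    for A uniform on the unit sphere, together with E[r] = ||theta||^2 / d.
    """
    M = np.zeros((d, d))
    r_bar = 0.0
    for _ in range(n):
        a = sample_sphere(d)
        r = pull(a, theta_star)
        M += r * np.outer(a, a) / n
        r_bar += r / n
    norm_sq_hat = d * r_bar                        # estimate of ||theta*||^2
    S = d * (d + 2) * M - norm_sq_hat * np.eye(d)  # approx 2 theta* theta*^T
    evals, evecs = np.linalg.eigh(S)               # eigenvalues ascending
    direction = evecs[:, -1]                       # top eigenvector
    return np.sqrt(max(norm_sq_hat, 0.0)) * direction

d = 5
theta_star = rng.standard_normal(d)
theta_hat = spectral_estimate(theta_star, n=50000, d=d)
# Compare up to sign, since only theta* theta*^T is identifiable.
err = min(np.linalg.norm(theta_hat - theta_star),
          np.linalg.norm(theta_hat + theta_star))
print(f"recovery error (up to sign): {err:.3f}")
```

Note that only $\theta_\star \theta_\star^\top$ is identifiable from rewards of the form $\langle A_t, \theta_\star\rangle^2$, which is exactly the "phase" ambiguity that gives phase retrieval its name.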
Related papers
- How Does Variance Shape the Regret in Contextual Bandits? [59.8472760881411]
We consider contextual bandits with general function approximation.
We show that the eluder dimension $d_{\text{elu}}$ plays a crucial role in variance-dependent bounds.
arXiv Detail & Related papers (2024-10-16T16:20:07Z)
- Low-rank Matrix Bandits with Heavy-tailed Rewards [55.03293214439741]
We study the problem of low-rank matrix bandits with heavy-tailed rewards (LowHTR).
By truncating observed payoffs and exploring dynamically, we propose a novel algorithm called LOTUS (a toy sketch of payoff truncation appears after this list).
arXiv Detail & Related papers (2024-04-26T21:54:31Z)
- Context-lumpable stochastic bandits [49.024050919419366]
We consider a contextual bandit problem with $S$ contexts and $K$ actions.
We give an algorithm that outputs an $\epsilon$-optimal policy after using at most $\widetilde{O}(r(S+K)/\epsilon^2)$ samples.
In the regret setting, we give an algorithm whose cumulative regret up to time $T$ is bounded by $\widetilde{O}(\sqrt{r^3(S+K)T})$.
arXiv Detail & Related papers (2023-06-22T17:20:30Z)
- Sparse Recovery with Shuffled Labels: Statistical Limits and Practical Estimators [23.313461266708877]
We reconstruct the permutation matrix $\Pi^{\mathrm{true}}$ and the sparse signal $\beta^{\mathrm{true}}$ from shuffled labels.
We show that our proposed estimator can obtain the ground truth $(\Pi^{\mathrm{true}}, \mathrm{supp}(\beta^{\mathrm{true}}))$ under mild conditions.
arXiv Detail & Related papers (2023-03-20T16:14:58Z)
- Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR [58.40575099910538]
We study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.
We show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes.
We show that our algorithm achieves the optimal regret of $\widetilde{O}(\tau^{-1}\sqrt{SAK})$ under a continuity assumption and in general attains a near-optimal regret.
arXiv Detail & Related papers (2023-02-07T02:22:31Z)
- Sparse Signal Detection in Heteroscedastic Gaussian Sequence Models: Sharp Minimax Rates [1.0309387309011746]
We study the signal detection problem against sparse alternatives, for known sparsity $s$.
We find upper and lower bounds on the minimax separation radius $\epsilon^*$ and prove that they always match.
Our results reveal new phase transitions in the behavior of $\epsilon^*$ with respect to the level of sparsity, the $L^t$ metric, and the heteroscedasticity profile of $\Sigma$.
arXiv Detail & Related papers (2022-11-15T23:53:39Z)
- Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners [17.224805430291177]
We study the problem of best arm identification in linearly parameterised multi-armed bandits.
We propose an explicitly implementable algorithm with provably order-optimal sample complexity for this problem.
arXiv Detail & Related papers (2020-06-13T05:00:01Z)
- Problem-Complexity Adaptive Model Selection for Stochastic Linear Bandits [20.207989166682832]
We consider the problem of model selection for two popular linear bandit settings.
In the first setting, we consider the $K$-armed mixture bandits, where the mean reward of arm $i \in [K]$ is $\mu_i + \langle \alpha_{i,t}, \theta^* \rangle$.
We show that ALB achieves regret scaling of $O(\|\theta^*\|\sqrt{T})$, where $\|\theta^*\|$ is a priori unknown.
arXiv Detail & Related papers (2020-06-04T02:19:00Z)
- Near-optimal Regret Bounds for Stochastic Shortest Path [63.029132134792555]
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state at minimum total expected cost.
We show that any learning algorithm must have at least $\Omega(B_\star \sqrt{|S| |A| K})$ regret in the worst case.
arXiv Detail & Related papers (2020-02-23T09:10:14Z)
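The LowHTR entry above names truncation of observed payoffs as the key tool for handling heavy-tailed rewards. The sketch below illustrates the general idea only; the Pareto reward model, the fixed threshold, and the `truncated_mean` helper are assumptions for illustration, not the LOTUS algorithm itself.

```python
# Toy illustration of payoff truncation for heavy-tailed rewards.
import numpy as np

rng = np.random.default_rng(1)

def truncated_mean(samples, b):
    """Zero out observations with magnitude above threshold b before
    averaging: a small bias is traded for much lighter estimator tails."""
    x = np.asarray(samples, dtype=float)
    return np.where(np.abs(x) <= b, x, 0.0).mean()

# Pareto rewards with tail index 1.5 have a finite mean (here 3.0) but
# infinite variance, so the plain empirical mean concentrates poorly.
rewards = rng.pareto(1.5, size=10000) + 1.0
print("empirical mean :", rewards.mean())
print("truncated mean :", truncated_mean(rewards, b=50.0))
```

Truncation trades a small, controllable bias for far better concentration, which is why it is a standard device when reward variance may be infinite.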