High-Dimensional Experimental Design and Kernel Bandits
- URL: http://arxiv.org/abs/2105.05806v1
- Date: Wed, 12 May 2021 17:10:56 GMT
- Title: High-Dimensional Experimental Design and Kernel Bandits
- Authors: Romain Camilleri and Julian Katz-Samuels and Kevin Jamieson
- Abstract summary: Methods from optimal linear experimental design have been leveraged to obtain state-of-the-art results for linear bandits.
A design returned from an objective such as $G$-optimal design is actually a probability distribution over a pool of potential measurement vectors.
We propose a rounding procedure that frees $N$ of any dependence on the dimension $d$.
- Score: 9.401375475860561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, methods from optimal linear experimental design have been leveraged to obtain state-of-the-art results for linear bandits. A design returned from an objective such as $G$-optimal design is actually a probability distribution over a pool of potential measurement vectors. Consequently, one nuisance of the approach is the task of converting this continuous probability distribution into a discrete assignment of $N$ measurements. While sophisticated rounding techniques have been proposed, in $d$ dimensions they require $N$ to be at least $d$, $d \log(\log(d))$, or $d^2$, depending on the sub-optimality of the solution. In this paper we are interested in settings where $N$ may be much less than $d$, such as experimental design in an RKHS where $d$ may be effectively infinite. We propose a rounding procedure that frees $N$ of any dependence on the dimension $d$ while achieving nearly the same performance guarantees as existing rounding procedures. We evaluate the procedure against a baseline that projects the problem to a lower-dimensional space and performs rounding, which requires $N$ to be only at least a notion of the effective dimension. We also leverage our new approach in a new algorithm for kernelized bandits to obtain state-of-the-art results for regret minimization and pure exploration. An advantage of our approach over existing UCB-like approaches is that our kernel bandit algorithms are also robust to model misspecification.
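To make the rounding nuisance concrete, below is a minimal, hypothetical sketch (function names are illustrative and not taken from the paper): it approximates a $G$-optimal design over a finite pool of measurement vectors with Frank-Wolfe and then converts the continuous design into an allocation of $N$ measurements by naive multinomial rounding, the step whose dependence on $d$ the proposed procedure removes.

```python
import numpy as np

def g_optimal_design(X, n_iters=2000):
    """Approximate G-/D-optimal design weights over the rows of X via
    Frank-Wolfe (by Kiefer-Wolfowitz equivalence, maximizing log det of the
    information matrix also minimizes the worst-case predictive variance)."""
    n, d = X.shape
    lam = np.full(n, 1.0 / n)                           # start from the uniform design
    for t in range(n_iters):
        A_inv = np.linalg.pinv(X.T @ (lam[:, None] * X))   # A(lam)^{-1}
        var = np.einsum("ij,jk,ik->i", X, A_inv, X)     # x_i^T A(lam)^{-1} x_i for each arm
        j = int(np.argmax(var))                         # most uncertain direction
        gamma = 2.0 / (t + 2)                           # standard Frank-Wolfe step size
        lam *= 1.0 - gamma
        lam[j] += gamma
    return lam

def naive_rounding(lam, N, rng):
    """Turn the continuous design lam into an integer allocation of N
    measurements by multinomial sampling (the 'rounding' step)."""
    return rng.multinomial(N, lam)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))                       # pool of potential measurement vectors
lam = g_optimal_design(X)
counts = naive_rounding(lam, N=20, rng=rng)             # allocate N = 20 measurements
print(counts.sum(), np.count_nonzero(counts))
```

Naive sampling of this kind is only reliable when $N$ is large relative to $d$, which is exactly what motivates rounding procedures with guarantees for small $N$.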
Related papers
- Regret Minimization and Statistical Inference in Online Decision Making with High-dimensional Covariates [7.21848268647674]
We integrate the $\varepsilon$-greedy bandit algorithm for decision-making with a hard thresholding algorithm for estimating sparse bandit parameters.
Under a margin condition, our method achieves either $O(T^{1/2})$ regret or classical $O(T^{1/2})$-consistent inference.
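The mechanism summarized here ($\varepsilon$-greedy exploration combined with hard-thresholded estimation) can be pictured with a small hypothetical sketch; it is not the cited paper's algorithm, and the ridge estimator, sparsity level, and parameters below are placeholders.

```python
import numpy as np

def hard_threshold(theta, s):
    """Keep the s largest-magnitude coordinates of theta, zero out the rest."""
    out = np.zeros_like(theta)
    keep = np.argsort(np.abs(theta))[-s:]
    out[keep] = theta[keep]
    return out

def eps_greedy_sparse_bandit(arms, theta_star, T, s, eps=0.1, noise=0.1, seed=0):
    """Illustrative eps-greedy loop with a hard-thresholded ridge estimate."""
    rng = np.random.default_rng(seed)
    K, d = arms.shape
    X, y = [], []
    theta_hat = np.zeros(d)
    for _ in range(T):
        if rng.random() < eps:
            a = int(rng.integers(K))                  # explore uniformly at random
        else:
            a = int(np.argmax(arms @ theta_hat))      # exploit the current sparse estimate
        r = arms[a] @ theta_star + noise * rng.standard_normal()
        X.append(arms[a]); y.append(r)
        Xm = np.asarray(X)
        ridge = np.linalg.solve(Xm.T @ Xm + 1e-2 * np.eye(d), Xm.T @ np.asarray(y))
        theta_hat = hard_threshold(ridge, s)          # enforce sparsity after each update
    return theta_hat
```

Refitting the full ridge estimate every round is only for brevity; an incremental update would serve the same purpose.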
arXiv Detail & Related papers (2024-11-10T01:47:11Z)
- Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery [45.601316850669406]
We present efficient algorithms for policy evaluation, best policy identification and regret minimization.
For policy evaluation and best policy identification, we show that our algorithms are nearly minimax optimal.
All the proposed algorithms consist of two phases: they first leverage spectral methods to estimate the left and right singular subspaces of the low-rank reward matrix.
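A minimal sketch of the first (spectral) phase described here, assuming an entrywise estimate of the reward matrix has already been collected; the cited paper's two-to-infinity guarantees are not reproduced, only a plain truncated SVD.

```python
import numpy as np

def estimate_singular_subspaces(R_hat, rank):
    """Phase-one sketch: take the top-`rank` left/right singular subspaces
    of a noisy estimate of the low-rank reward matrix."""
    U, _, Vt = np.linalg.svd(R_hat, full_matrices=False)
    return U[:, :rank], Vt[:rank, :].T

rng = np.random.default_rng(1)
R = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))   # rank-2 reward matrix
R_hat = R + 0.05 * rng.standard_normal(R.shape)                   # entrywise noisy estimate
U_hat, V_hat = estimate_singular_subspaces(R_hat, rank=2)         # orthonormal 40x2 and 30x2 bases
```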
arXiv Detail & Related papers (2024-02-24T06:36:08Z)
- Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
- Stochastic Linear Bandits with Protected Subspace [51.43660657268171]
We study a variant of the linear bandit problem wherein we optimize a linear objective function but rewards are accrued only to an unknown subspace.
In particular, at each round, the learner must choose whether to query the objective or the protected subspace alongside choosing an action.
Our algorithm, derived from the OFUL principle, uses some of the queries to get an estimate of the protected space.
arXiv Detail & Related papers (2020-11-02T14:59:39Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications in areas such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
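The two ingredients named in this summary, a doubly growing epoch schedule and best subset selection, can be pictured with a small hypothetical sketch; it is not the cited paper's estimator, and the exhaustive search below is only feasible for very small supports.

```python
import numpy as np
from itertools import combinations

def best_subset_ls(X, y, s):
    """Exhaustive best-subset least squares: fit on every support of size s
    and keep the fit with the smallest residual sum of squares."""
    d = X.shape[1]
    best_beta, best_rss = None, np.inf
    for S in combinations(range(d), s):
        idx = list(S)
        beta_S, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
        resid = y - X[:, idx] @ beta_S
        rss = float(resid @ resid)
        if rss < best_rss:
            best_rss = rss
            best_beta = np.zeros(d)
            best_beta[idx] = beta_S
    return best_beta

# A doubly growing epoch schedule would re-run this estimator only at
# t = 2, 4, 8, ..., so the expensive best-subset step happens O(log T) times.
```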
arXiv Detail & Related papers (2020-09-04T04:10:39Z)