Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design
- URL: http://arxiv.org/abs/2007.01980v3
- Date: Fri, 23 Apr 2021 13:01:03 GMT
- Title: Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design
- Authors: Yufei Ruan, Jiaqi Yang, Yuan Zhou
- Abstract summary: We study the impact of adaptivity constraints on linear contextual bandits, a central problem in online active learning.
We show that, when the context vectors are adversarially chosen in $d$-dimensional linear contextual bandits, the learner needs $O(d \log d \log T)$ policy switches to achieve the minimax-optimal regret.
- Score: 12.465883735626605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by practical needs such as large-scale learning, we study the
impact of adaptivity constraints on linear contextual bandits, a central
problem in online active learning. We consider two popular limited-adaptivity
models in the literature: batch learning and rare policy switches. We show that,
when the context vectors are adversarially chosen in $d$-dimensional linear
contextual bandits, the learner needs $O(d \log d \log T)$ policy switches to
achieve the minimax-optimal regret, and this is optimal up to
$\mathrm{poly}(\log d, \log \log T)$ factors; for stochastic context vectors,
even in the more restricted batch learning model, only $O(\log \log T)$ batches
are needed to achieve the optimal regret. Together with the known results in
the literature, our results present a complete picture of the adaptivity
constraints in linear contextual bandits. Along the way, we propose the
distributional optimal design, a natural extension of the optimal experiment
design, and provide a statistically and computationally efficient learning
algorithm for the problem, which may be of independent interest.
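The distributional optimal design proposed in the paper extends the classical (G-)optimal experiment design. As background, the sketch below computes the classical object being generalized: G-optimal design weights via the Fedorov-Wynn / Frank-Wolfe iteration. This is a minimal illustration, not the paper's algorithm; it assumes numpy, that the rows of `X` span $\mathbb{R}^d$, and the function name is ours.

```python
# Minimal sketch (not the paper's algorithm): classical G-optimal
# experiment design, the object that distributional optimal design extends.
# Assumes the rows of X span R^d; all names here are illustrative.
import numpy as np

def g_optimal_design(X, max_iters=10_000, tol=1e-3):
    """Approximate G-optimal weights pi over the rows of X (shape n x d).

    By the Kiefer-Wolfowitz theorem, the optimal pi drives
    max_i x_i^T A(pi)^{-1} x_i down to d, where A(pi) = sum_i pi_i x_i x_i^T.
    """
    n, d = X.shape
    pi = np.full(n, 1.0 / n)                      # start from the uniform design
    for _ in range(max_iters):
        A = X.T @ (pi[:, None] * X)               # information matrix A(pi)
        lev = np.einsum("ij,jk,ik->i", X, np.linalg.inv(A), X)
        i_star = int(np.argmax(lev))              # most under-covered direction
        g = lev[i_star]
        if g <= d * (1.0 + tol):                  # within (1+tol) of the optimum d
            break
        step = (g - d) / (d * (g - 1.0))          # exact line-search step size
        pi = (1.0 - step) * pi
        pi[i_star] += step
    return pi
```

Sampling actions according to such weights keeps every direction's uncertainty $x^\top A(\pi)^{-1} x$ near the optimum $d$, which is what lets elimination-style algorithms make do with very few batches.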
Related papers
- Anytime Model Selection in Linear Bandits [61.97047189786905]
We develop ALEXP, whose regret has an exponentially improved dependence on the number of candidate models $M$.
Our approach utilizes a novel time-uniform analysis of the Lasso, establishing a new connection between online learning and high-dimensional statistics.
arXiv Detail & Related papers (2023-07-24T15:44:30Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - Oracle Inequalities for Model Selection in Offline Reinforcement Learning [105.74139523696284]
We study the problem of model selection in offline RL with value function approximation.
We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.
We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.
arXiv Detail & Related papers (2022-11-03T17:32:34Z) - Efficient and Near-Optimal Smoothed Online Learning for Generalized Linear Functions [28.30744223973527]
We give a computationally efficient algorithm that is the first to enjoy the statistically optimal $\log(T/\sigma)$ regret for realizable $K$-wise linear classification.
We develop a novel characterization of the geometry of the disagreement region induced by generalized linear classifiers.
arXiv Detail & Related papers (2022-05-25T21:31:36Z) - Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models [25.336599480692122]
We consider the regret minimization task in a dueling bandits problem with context information.
We propose a computationally efficient algorithm, $\texttt{CoLSTIM}$, which makes its choice based on imitating the feedback process.
Our experiments demonstrate its superiority over state-of-the-art algorithms for special cases of CoLST models.
arXiv Detail & Related papers (2022-02-09T17:44:19Z) - Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{dT})$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z) - Pareto Optimal Model Selection in Linear Bandits [15.85873315624132]
We study a model selection problem in the linear bandit setting, where the learner must adapt to the dimension of the optimal hypothesis class on the fly.
In this paper, we first establish a lower bound showing that, even with a fixed action set, adaptation to the unknown intrinsic dimension $d_star$ comes at a cost.
arXiv Detail & Related papers (2021-02-12T16:02:06Z) - Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimate the parameter via the best subset selection method (see the grid sketch after this list).
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
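Both the $O(\log \log T)$ batch bound in the main abstract and the "doubly growing epochs" in the last entry rest on grids whose endpoints grow doubly exponentially. A minimal sketch, assuming the standard grid $t_k = \lfloor T^{1-2^{-k}} \rfloor$ from the batched-bandit literature; the exact schedules in these papers may differ, and the helper name is ours.

```python
# Minimal sketch: a doubly exponentially growing batch grid. The exact
# schedules in the papers above may differ; this is the standard device
# that yields O(log log T) batches, and the function name is ours.
import math

def batch_grid(T: int) -> list[int]:
    """Endpoints t_k = floor(T^(1 - 2^(-k))) for k < M, then t_M = T.

    Each endpoint is roughly the geometric mean of its predecessor and T
    (t_{k+1} ~ sqrt(t_k * T)), so M = O(log log T) batches reach the horizon.
    """
    M = max(2, math.ceil(math.log2(max(2.0, math.log2(T)))) + 1)
    grid = [min(T, math.floor(T ** (1.0 - 2.0 ** (-k)))) for k in range(1, M)]
    grid.append(T)                # the last batch always ends at the horizon
    return grid

# For T = 10**6 this yields [1000, 31622, 177827, 421696, 649381, 1000000]:
# six batches, matching the O(log log T) scaling.
```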