Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits
- URL: http://arxiv.org/abs/2008.11918v4
- Date: Sat, 16 Jul 2022 21:51:55 GMT
- Title: Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits
- Authors: Zhimei Ren and Zhengyuan Zhou
- Abstract summary: We study the problem of dynamic batch learning in high-dimensional sparse linear contextual bandits.
Our work provides the first inroad into a theoretical understanding of dynamic batch learning in high-dimensional sparse linear contextual bandits.
- Score: 18.64677502651614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of dynamic batch learning in high-dimensional sparse
linear contextual bandits, where a decision maker, under a given
maximum-number-of-batch constraint and only able to observe rewards at the end
of each batch, can dynamically decide how many individuals to include in the
next batch (at the end of the current batch) and what personalized
action-selection scheme to adopt within each batch. Such batch constraints are
ubiquitous in a variety of practical contexts, including personalized product
offerings in marketing and medical treatment selection in clinical trials. We
characterize the fundamental learning limit in this problem via a regret lower
bound and provide a matching upper bound (up to log factors), thus prescribing
an optimal scheme for this problem. To the best of our knowledge, our work
provides the first inroad into a theoretical understanding of dynamic batch
learning in high-dimensional sparse linear contextual bandits. Notably, even a
special case of our result -- when no batch constraint is present -- yields
that the simple exploration-free algorithm using the LASSO estimator already
achieves the minimax optimal regret bound for standard online learning in
high-dimensional linear contextual bandits (for the no-margin case), a result
that appears unknown in the emerging literature of high-dimensional contextual
bandits.
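The exploration-free LASSO result highlighted in the abstract can be illustrated as a greedy loop that refits a LASSO estimate of the sparse reward parameter from all data observed so far and always pulls the arm with the highest predicted reward. The sketch below is a minimal synthetic illustration, not the paper's exact specification: the dimensions, sparsity level, regularization strength, and noise model are all assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Greedy (exploration-free) LASSO bandit sketch on synthetic data.
# All problem parameters below are illustrative assumptions.
rng = np.random.default_rng(0)
d, K, T, s = 50, 5, 300, 3           # dimension, arms, horizon, sparsity
theta = np.zeros(d)
theta[:s] = 1.0                       # sparse true reward parameter

X_hist, y_hist = [], []
est = np.zeros(d)                     # current LASSO estimate
regret = 0.0
for t in range(T):
    contexts = rng.normal(size=(K, d))        # one context vector per arm
    arm = int(np.argmax(contexts @ est))      # greedy choice, no exploration
    reward = contexts[arm] @ theta + 0.1 * rng.normal()
    regret += np.max(contexts @ theta) - contexts[arm] @ theta
    X_hist.append(contexts[arm])
    y_hist.append(reward)
    if t >= 5:                                # refit LASSO on all data so far
        lasso = Lasso(alpha=0.05).fit(np.array(X_hist), np.array(y_hist))
        est = lasso.coef_
print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

Because the true parameter is sparse, the LASSO estimate converges quickly and the cumulative regret grows slowly after the first few rounds, which is the intuition behind the no-exploration-needed result.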
Related papers
- Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints [8.784438985280094]
We study pure exploration problems in multi-armed bandits with unknown linear constraints.
First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints.
Second, we leverage the Lagrangian lower bound and properties of convexity to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX.
arXiv Detail & Related papers (2024-10-24T15:26:14Z) - Batched Nonparametric Contextual Bandits [21.031965676746776]
We study nonparametric contextual bandits under batch constraints.
We propose a novel batch learning algorithm that achieves the optimal regret.
Our theoretical results suggest that for nonparametric contextual bandits, a nearly constant number of policy updates can attain optimal regret.
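The batch constraint shared by these papers can be made concrete with a simple loop in which rewards collected inside a batch are stored but the policy is refit only at batch boundaries, so the number of policy updates equals the number of batches rather than the horizon. This is a hypothetical sketch using a plain least-squares policy, not any of the listed papers' algorithms; all names and parameters are illustrative.

```python
import numpy as np

# Batched linear bandit sketch: the policy is updated only M times
# over a horizon of T rounds. All parameters are illustrative.
rng = np.random.default_rng(1)
d, K, T, M = 10, 4, 400, 5
theta = rng.normal(size=d)
boundaries = np.unique(np.linspace(T // M, T, M).astype(int))  # batch ends

est = np.zeros(d)
X_hist, y_hist = [], []
updates = 0
for t in range(1, T + 1):
    contexts = rng.normal(size=(K, d))
    arm = int(np.argmax(contexts @ est))
    # rewards inside a batch are collected but not yet used
    X_hist.append(contexts[arm])
    y_hist.append(contexts[arm] @ theta + 0.1 * rng.normal())
    if t in boundaries:                       # batch ends: one policy update
        X, y = np.array(X_hist), np.array(y_hist)
        est = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares refit
        updates += 1
print(f"policy updated {updates} times over {T} rounds")
```

The grid of batch boundaries is a design choice; the papers above show that with a well-chosen grid, a near-constant number of updates suffices for optimal regret.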
arXiv Detail & Related papers (2024-02-27T18:06:20Z) - Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery [76.63807209414789]
We challenge the status quo in class-iNCD and propose a learning paradigm where class discovery occurs continuously and in a truly unsupervised manner.
We propose simple baselines, composed of a frozen PTM backbone and a learnable linear classifier, that are not only simple to implement but also resilient under longer learning scenarios.
arXiv Detail & Related papers (2023-03-28T13:47:16Z) - Oracle-Efficient Smoothed Online Learning for Piecewise Continuous Decision Making [73.48977854003697]
This work introduces a new notion of complexity, the generalized bracketing numbers, which marries constraints on the adversary to the size of the space.
We then instantiate our bounds in several problems of interest, including online prediction and planning of piecewise continuous functions.
arXiv Detail & Related papers (2023-02-10T18:45:52Z) - Adversarial Robustness with Semi-Infinite Constrained Learning [177.42714838799924]
The vulnerability of deep learning to input perturbations has raised serious questions about its use in safety-critical domains.
We propose a hybrid Langevin Monte Carlo training approach to mitigate this issue.
We show that our approach can mitigate the trade-off between state-of-the-art performance and robustness.
arXiv Detail & Related papers (2021-10-29T13:30:42Z) - Parallelizing Contextual Linear Bandits [82.65675585004448]
We present a family of (parallel) contextual linear bandit algorithms, whose regret is nearly identical to their perfectly sequential counterparts.
We also present an empirical evaluation of these parallel algorithms in several domains, including materials discovery and biological sequence design problems.
arXiv Detail & Related papers (2021-05-21T22:22:02Z) - Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
arXiv Detail & Related papers (2021-02-25T17:36:44Z) - Regret Bound Balancing and Elimination for Model Selection in Bandits and RL [34.15978290083909]
We propose a simple model selection approach for algorithms in bandit and reinforcement learning problems.
We prove that the total regret of this approach is bounded by the best valid candidate regret bound times a multiplicative factor.
Unlike recent efforts in model selection for linear bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment.
arXiv Detail & Related papers (2020-12-24T00:53:42Z) - Sequential Batch Learning in Finite-Action Linear Contextual Bandits [40.01661188919779]
We study the sequential batch learning problem in linear contextual bandits with finite action sets.
This problem provides a finer-grained formulation of many personalized sequential decision making problems in practical applications.
arXiv Detail & Related papers (2020-04-14T06:47:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.