Universal and data-adaptive algorithms for model selection in linear
contextual bandits
- URL: http://arxiv.org/abs/2111.04688v1
- Date: Mon, 8 Nov 2021 18:05:35 GMT
- Title: Universal and data-adaptive algorithms for model selection in linear
contextual bandits
- Authors: Vidya Muthukumar, Akshay Krishnamurthy
- Abstract summary: We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
- Score: 52.47796554359261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model selection in contextual bandits is an important complementary problem
to regret minimization with respect to a fixed model class. We consider the
simplest non-trivial instance of model-selection: distinguishing a simple
multi-armed bandit problem from a linear contextual bandit problem. Even in
this instance, current state-of-the-art methods explore in a suboptimal manner
and require strong "feature-diversity" conditions. In this paper, we introduce
new algorithms that a) explore in a data-adaptive manner, and b) provide model
selection guarantees of the form $\mathcal{O}(d^{\alpha} T^{1- \alpha})$ with
no feature diversity conditions whatsoever, where $d$ denotes the dimension of
the linear model and $T$ denotes the total number of rounds. The first
algorithm enjoys a "best-of-both-worlds" property, recovering two prior results
that hold under distinct distributional assumptions, simultaneously. The second
removes distributional assumptions altogether, expanding the scope for
tractable model selection. Our approach extends to model selection among nested
linear contextual bandits under some additional assumptions.
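To make the setting concrete, the following is a minimal Python sketch of the model-selection question the abstract describes; it is purely illustrative and is not the authors' data-adaptive algorithm. Rewards either do or do not depend on a $d$-dimensional context, exploration is uniform, and a crude explained-variance test (the constants, the threshold, and all variable names are hypothetical choices) decides between the multi-armed bandit and linear contextual bandit hypotheses.
```python
# Illustrative sketch (not the paper's algorithm): distinguish a plain
# multi-armed bandit from a linear contextual bandit by testing whether
# contexts explain reward variance beyond the per-arm means.
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 10, 5, 2000          # context dimension, arms, rounds (hypothetical)

# Ground truth: if theta == 0 the problem is effectively a multi-armed bandit.
theta = rng.normal(size=(K, d)) * 0.5   # set to np.zeros((K, d)) for the MAB case
bias = rng.normal(size=K)

def pull(arm, x):
    """Reward = arm bias + optional linear term + Gaussian noise."""
    return bias[arm] + theta[arm] @ x + rng.normal(scale=1.0)

# Uniform exploration phase (a data-adaptive scheme would be smarter here).
X, arms, rewards = [], [], []
for _ in range(T):
    x = rng.normal(size=d)
    a = rng.integers(K)
    X.append(x); arms.append(a); rewards.append(pull(a, x))
X, arms, rewards = np.array(X), np.array(arms), np.array(rewards)

# For each arm, compare residual variance of a constant model vs. a linear model.
gain = 0.0
for a in range(K):
    idx = arms == a
    y = rewards[idx] - rewards[idx].mean()        # residuals of the MAB model
    Z = X[idx]
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # least-squares linear fit
    gain += np.sum(y**2) - np.sum((y - Z @ beta)**2)

# Large explained variance suggests the linear contextual model is needed.
threshold = 2.0 * K * d                            # crude, illustrative cutoff
print("linear contextual bandit" if gain > threshold else "multi-armed bandit")
```
Under the null model (theta set to zero) the explained variance concentrates around roughly $Kd$ times the noise variance, which is why the illustrative cutoff scales with $Kd$; with a genuine linear component the gain grows linearly in $T$ and the test fires.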
Related papers
- Anytime Model Selection in Linear Bandits [61.97047189786905]
We develop ALEXP, whose regret has an exponentially improved dependence on the number of models $M$.
Our approach utilizes a novel time-uniform analysis of the Lasso, establishing a new connection between online learning and high-dimensional statistics.
arXiv Detail & Related papers (2023-07-24T15:44:30Z)
- Oracle Inequalities for Model Selection in Offline Reinforcement Learning [105.74139523696284]
We study the problem of model selection in offline RL with value function approximation.
We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal inequalities up to logarithmic factors.
We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.
arXiv Detail & Related papers (2022-11-03T17:32:34Z)
- Combinatorial Causal Bandits [25.012065471684025]
In causal bandits, the learning agent chooses at most $K$ variables to intervene on in each round, with the goal of minimizing expected regret on the target variable $Y$.
We study the problem in the context of binary generalized linear models (BGLMs), which give a succinct parametric representation of the causal models.
We present the algorithm BGLM-OFU for Markovian BGLMs based on the maximum likelihood estimation method, and show that it achieves $O(\sqrt{T}\log T)$ regret, where $T$ is the time horizon.
arXiv Detail & Related papers (2022-06-04T14:14:58Z)
- Fast Feature Selection with Fairness Constraints [49.142308856826396]
We study the fundamental problem of selecting optimal features for model construction.
This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants.
We extend the adaptive query model, recently proposed for greedy forward selection of submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions.
The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work.
arXiv Detail & Related papers (2022-02-28T12:26:47Z)
- Near Instance Optimal Model Selection for Pure Exploration Linear Bandits [20.67688737534517]
We study the model selection problem in the pure exploration linear bandit setting.
Our goal is to automatically adapt to the instance-dependent complexity measure of the smallest hypothesis class.
Our algorithms define a new optimization problem based on experimental design.
arXiv Detail & Related papers (2021-09-10T22:56:58Z)
- Model Selection for Generic Contextual Bandits [20.207989166682832]
We propose a refinement-based algorithm called Adaptive Contextual Bandit (ACB).
We prove that this algorithm is adaptive, i.e., the regret rate order-wise matches that of any provable contextual bandit algorithm.
We also show that a much simpler explore-then-commit (ETC) style algorithm obtains a similar regret bound, despite not knowing the true model class (see the illustrative sketch after this list).
arXiv Detail & Related papers (2021-07-07T19:35:31Z)
- Towards Costless Model Selection in Contextual Bandits: A Bias-Variance Perspective [7.318831153179727]
We study the feasibility of similar guarantees for cumulative regret minimization in the contextual bandit setting.
Our algorithm is based on a novel misspecification test, and our analysis demonstrates the benefits of using model selection for reward estimation.
arXiv Detail & Related papers (2021-06-11T16:08:03Z)
- Pareto Optimal Model Selection in Linear Bandits [15.85873315624132]
We study a model selection problem in the linear bandit setting, where the learner must adapt to the dimension of the optimal hypothesis class on the fly.
In this paper, we first establish a lower bound showing that, even with a fixed action set, adaptation to the unknown intrinsic dimension $d_star$ comes at a cost.
arXiv Detail & Related papers (2021-02-12T16:02:06Z)
- Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for a one-layer neural net bandit with a deterministic reward.
For both nonlinear bandits and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL).
arXiv Detail & Related papers (2021-02-08T12:41:56Z)
- Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime of $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
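The explore-then-commit idea mentioned in the "Model Selection for Generic Contextual Bandits" entry above can be illustrated with a small sketch. This is a generic ETC skeleton under simplifying assumptions; the function name `etc_model_selection`, the `explore_rounds` budget, and the AIC-style penalty are hypothetical illustrations, not the construction from that paper.
```python
# Generic explore-then-commit (ETC) skeleton for model selection between a
# multi-armed bandit model and a linear contextual bandit model.
# Illustrative only; constants and the selection rule are hypothetical.
import numpy as np

def etc_model_selection(env_pull, sample_context, d, K, T, explore_rounds=500):
    rng = np.random.default_rng(1)
    X, A, R = [], [], []

    # Phase 1: uniform exploration.
    for _ in range(explore_rounds):
        x = sample_context()
        a = rng.integers(K)
        X.append(x); A.append(a); R.append(env_pull(a, x))
    X, A, R = np.array(X), np.array(A), np.array(R)

    # Fit both candidate models per arm: constant mean vs. linear in the context.
    means = np.array([R[A == a].mean() for a in range(K)])
    thetas = np.zeros((K, d))
    sse_mab, sse_lin = 0.0, 0.0
    for a in range(K):
        idx = A == a
        y = R[idx] - means[a]
        beta, *_ = np.linalg.lstsq(X[idx], y, rcond=None)
        thetas[a] = beta
        sse_mab += np.sum(y ** 2)
        sse_lin += np.sum((y - X[idx] @ beta) ** 2)

    # Complexity-penalized model choice (crude AIC-style penalty, illustrative).
    use_linear = sse_lin + 2.0 * K * d < sse_mab

    # Phase 2: commit to the greedy policy under the selected model.
    total = 0.0
    for _ in range(T - explore_rounds):
        x = sample_context()
        scores = means + (thetas @ x if use_linear else 0.0)
        a = int(np.argmax(scores))
        total += env_pull(a, x)
    return use_linear, total
```
Plugging in the simulator from the sketch after the abstract (its `pull` function and a standard-normal context sampler) lets one check that the committed policy adapts to whichever ground truth generated the data.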