Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection
- URL: http://arxiv.org/abs/2009.02003v1
- Date: Fri, 4 Sep 2020 04:10:39 GMT
- Title: Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection
- Authors: Yining Wang, Yi Chen, Ethan X. Fang, Zhaoran Wang and Runze Li
- Abstract summary: We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
- Score: 71.9765117768556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the stochastic contextual bandit problem under the high
dimensional linear model. We focus on the case where the action space is finite
and random, with each action associated with a randomly generated contextual
covariate. This setting finds essential applications such as personalized
recommendation, online advertisement, and personalized medicine. However, it is
very challenging as we need to balance exploration and exploitation. We propose
doubly growing epochs and estimating the parameter using the best subset
selection method, which is easy to implement in practice. This approach
achieves $ \tilde{\mathcal{O}}(s\sqrt{T})$ regret with high probability, which
is nearly independent in the ``ambient'' regression model dimension $d$. We
further attain a sharper $\tilde{\mathcal{O}}(\sqrt{sT})$ regret by using the
\textsc{SupLinUCB} framework and match the minimax lower bound of
low-dimensional linear stochastic bandit problems. Finally, we conduct
extensive numerical experiments to demonstrate the applicability and robustness
of our algorithms empirically.
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - PopArt: Efficient Sparse Regression and Experimental Design for Optimal
Sparse Linear Bandits [29.097522376094624]
We propose a simple and computationally efficient sparse linear estimation method called PopArt.
We derive sparse linear bandit algorithms that enjoy improved regret upper bounds upon the state of the art.
arXiv Detail & Related papers (2022-10-25T19:13:20Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Exploration in Linear Bandits with Rich Action Sets and its Implications
for Inference [23.364534479492715]
We show that the minimum eigenvalue of the expected matrix grows as $Omega(sqrtn) whenever the expected cumulative regret of the algorithm is $sqrtn)$.
We apply our result to two practical scenarios -- emphmodel selection and emphclustering in linear bandits.
arXiv Detail & Related papers (2022-07-23T20:25:07Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Stochastic Linear Bandits with Protected Subspace [51.43660657268171]
We study a variant of the linear bandit problem wherein we optimize a linear objective function but rewards are accrued only to an unknown subspace.
In particular, at each round, the learner must choose whether to query the objective or the protected subspace alongside choosing an action.
Our algorithm, derived from the OFUL principle, uses some of the queries to get an estimate of the protected space.
arXiv Detail & Related papers (2020-11-02T14:59:39Z) - Online and Distribution-Free Robustness: Regression and Contextual
Bandits with Huber Contamination [29.85468294601847]
We revisit two classic high-dimensional online learning problems, namely linear regression and contextual bandits.
We show that our algorithms succeed where conventional methods fail.
arXiv Detail & Related papers (2020-10-08T17:59:05Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB)
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.