Dimension Reduction in Contextual Online Learning via Nonparametric
Variable Selection
- URL: http://arxiv.org/abs/2009.08265v2
- Date: Mon, 3 Oct 2022 03:38:16 GMT
- Title: Dimension Reduction in Contextual Online Learning via Nonparametric
Variable Selection
- Authors: Wenhao Li, Ningyuan Chen, L. Jeff Hong
- Abstract summary: The literature has shown that the optimal regret is $\tilde{O}(T^{(d_x+d_y+1)/(d_x+d_y+2)})$, where $d_x$ and $d_y$ are the dimensions of the covariate and the decision.
We propose a variable selection algorithm called \textit{BV-LASSO}, which incorporates novel ideas such as binning and voting to apply LASSO to nonparametric settings, and achieves regret $\tilde{O}(T^{(d_x^*+d_y+1)/(d_x^*+d_y+2)})$, where $d_x^*$ is the effective covariate dimension.
- Score: 8.250528027387196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a contextual online learning (multi-armed bandit) problem with
high-dimensional covariate $\mathbf{x}$ and decision $\mathbf{y}$. The reward
function to learn, $f(\mathbf{x},\mathbf{y})$, does not have a particular
parametric form. The literature has shown that the optimal regret is
$\tilde{O}(T^{(d_x+d_y+1)/(d_x+d_y+2)})$, where $d_x$ and $d_y$ are the
dimensions of $\mathbf x$ and $\mathbf y$, and thus it suffers from the curse
of dimensionality. In many applications, only a small subset of variables in
the covariate affect the value of $f$, which is referred to as
\textit{sparsity} in statistics. To take advantage of the sparsity structure of
the covariate, we propose a variable selection algorithm called
\textit{BV-LASSO}, which incorporates novel ideas such as binning and voting to
apply LASSO to nonparametric settings. Our algorithm achieves the regret
$\tilde{O}(T^{(d_x^*+d_y+1)/(d_x^*+d_y+2)})$, where $d_x^*$ is the effective
covariate dimension. The regret matches the optimal regret when the covariate
is $d_x^*$-dimensional and thus cannot be improved. Our algorithm may serve as
a general recipe to achieve dimension reduction via variable selection in
nonparametric settings.
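To see the dimension-reduction gain concretely, consider a hypothetical instance with $d_x = 10$ ambient covariates, of which only $d_x^* = 2$ affect the reward, and a one-dimensional decision ($d_y = 1$): the regret exponent drops from $(d_x+d_y+1)/(d_x+d_y+2) = 12/13 \approx 0.92$ to $(d_x^*+d_y+1)/(d_x^*+d_y+2) = 4/5 = 0.80$, i.e., the regret improves from $\tilde{O}(T^{12/13})$ to $\tilde{O}(T^{4/5})$.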
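The abstract describes BV-LASSO only at a high level, so the following is a minimal illustrative sketch, not the authors' algorithm: each covariate is one-hot encoded into bins so that a nonlinear effect becomes linear in the bin indicators, LASSO is run on random subsamples, and covariates are kept by majority vote. The synthetic reward, the constants (`n_bins`, `alpha`, the vote threshold), and the offline setup are all assumptions for illustration; the paper's method interleaves selection with bandit exploration, which this sketch omits.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Toy instance: d_x = 10 covariates, but the (nonparametric) reward
# depends only on the first d_x^* = 2 of them.
n, d_x, n_bins = 3000, 10, 8
X = rng.uniform(size=(n, d_x))
y = np.sin(3 * X[:, 0]) + (X[:, 1] - 0.5) ** 2 + 0.1 * rng.normal(size=n)

# Binning: one-hot encode each covariate into bins, so that a nonlinear
# dependence on x_j shows up as nonzero coefficients on x_j's indicators.
edges = np.linspace(0, 1, n_bins + 1)[1:-1]
Z = np.zeros((n, d_x * n_bins))
for j in range(d_x):
    b = np.digitize(X[:, j], edges)           # bin index in 0..n_bins-1
    Z[np.arange(n), j * n_bins + b] = 1.0

# Voting: run LASSO on random half-samples; covariate j earns a vote in a
# round when any of its bin indicators survives. Keep majority-vote winners.
n_rounds, votes = 20, np.zeros(d_x)
for _ in range(n_rounds):
    idx = rng.choice(n, size=n // 2, replace=False)
    coef = Lasso(alpha=0.01).fit(Z[idx], y[idx]).coef_.reshape(d_x, n_bins)
    votes += np.abs(coef).max(axis=1) > 1e-8

selected = np.flatnonzero(votes > n_rounds / 2)
print("selected covariates:", selected)       # typically [0 1]
```

The voting step is what makes the selection robust: any single LASSO fit on binned indicators can keep a spurious covariate by chance, but a noise variable is unlikely to survive in a majority of independent subsamples.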
Related papers
- Online Learning of Halfspaces with Massart Noise [47.71073318490341]
We study the task of online learning in the presence of Massart noise.
We present a computationally efficient algorithm that achieves a mistake bound of $\eta T + o(T)$.
We use our Massart online learner to design an efficient bandit algorithm whose expected reward is at least $(1-1/k)\Delta T - o(T)$ larger than that of choosing a random action at every round.
arXiv Detail & Related papers (2024-05-21T17:31:10Z)
- Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs [111.75736569611159]
We propose the first computationally efficient horizon-free algorithm for linear mixture MDPs.
Our algorithm adapts a weighted least-squares estimator for the unknown transition dynamics.
It also improves upon the best-known algorithms in this setting when the $\sigma_k^2$'s are known.
arXiv Detail & Related papers (2022-05-23T17:59:18Z)
- Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w^*$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z)
- Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where the cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm, sbcalg, that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z)
- Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z)
- Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}\,T^{1/3}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
- Combinatorial Semi-Bandit in the Non-Stationary Environment [27.394284055156795]
We investigate the non-stationary semi-bandit problem, both in the switching case and in the dynamic case.
By employing another technique, our algorithm no longer needs to know the parameters $\mathcal{S}$ or $\mathcal{V}$, but the regret bounds could become suboptimal.
arXiv Detail & Related papers (2020-02-10T07:10:56Z)