Taking a hint: How to leverage loss predictors in contextual bandits?
- URL: http://arxiv.org/abs/2003.01922v2
- Date: Thu, 15 Oct 2020 17:31:50 GMT
- Title: Taking a hint: How to leverage loss predictors in contextual bandits?
- Authors: Chen-Yu Wei, Haipeng Luo, Alekh Agarwal
- Abstract summary: We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}T^{1/4}\})$ when $\mathcal{E}$ is known.
- Score: 63.546913998407405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We initiate the study of learning in contextual bandits with the help of loss
predictors. The main question we address is whether one can improve over the
minimax regret $\mathcal{O}(\sqrt{T})$ for learning over $T$ rounds, when the
total error of the predictor $\mathcal{E} \leq T$ is relatively small. We
provide a complete answer to this question, including upper and lower bounds
for various settings: adversarial versus stochastic environments, known versus
unknown $\mathcal{E}$, and single versus multiple predictors. We show several
surprising results, such as 1) the optimal regret is
$\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}T^\frac{1}{4}\})$ when
$\mathcal{E}$ is known, a sharp contrast to the standard and better bound
$\mathcal{O}(\sqrt{\mathcal{E}})$ for non-contextual problems (such as
multi-armed bandits); 2) the same bound cannot be achieved if $\mathcal{E}$ is
unknown, but as a remedy, $\mathcal{O}(\sqrt{\mathcal{E}}T^\frac{1}{3})$ is
achievable; 3) with $M$ predictors, a linear dependence on $M$ is necessary,
even though logarithmic dependence is possible for non-contextual problems.
We also develop several novel algorithmic techniques to achieve matching
upper bounds, including 1) a key action remapping technique for optimal regret
with known $\mathcal{E}$, 2) implementing Catoni's robust mean estimator
efficiently via an ERM oracle leading to an efficient algorithm in the
stochastic setting with optimal regret, 3) constructing an underestimator for
$\mathcal{E}$ via estimating the histogram with bins of exponentially
increasing size for the stochastic setting with unknown $\mathcal{E}$, and 4) a
self-referential scheme for learning with multiple predictors, all of which
might be of independent interest.
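A quick sanity check on result 1) above, using elementary algebra rather than anything from the paper: the hint-dependent rate wins exactly when $\sqrt{\mathcal{E}}\,T^{1/4} \leq \sqrt{T}$, i.e., when $\mathcal{E} \leq \sqrt{T}$, so a loss predictor improves on the minimax rate precisely when its total error is below $\sqrt{T}$. For example, with $T = 10^6$ and $\mathcal{E} = 100$, the hint-dependent bound gives $\sqrt{100} \cdot (10^6)^{1/4} \approx 316$, versus $\sqrt{T} = 1000$ without hints.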
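Of the techniques above, Catoni's robust mean estimator is standard enough to sketch in code. The following is a minimal NumPy implementation of the plain estimator via bisection, in the spirit of Catoni (2012); it is not the paper's ERM-oracle construction, and the names (catoni_mean, psi) and the tuning value alpha=0.5 are illustrative choices, not from the paper.

    import numpy as np

    def psi(x):
        # Catoni's influence function: odd, nondecreasing, with logarithmic
        # growth, so extreme samples exert only bounded influence.
        return np.where(x >= 0,
                        np.log1p(x + 0.5 * x * x),
                        -np.log1p(-x + 0.5 * x * x))

    def catoni_mean(samples, alpha, tol=1e-9):
        # Solve sum_i psi(alpha * (x_i - mu)) = 0 for mu by bisection.
        # The sum is nonincreasing in mu and changes sign on the sample
        # range, so [min, max] brackets the unique root.
        x = np.asarray(samples, dtype=float)
        lo, hi = float(x.min()), float(x.max())
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if psi(alpha * (x - mid)).sum() > 0:
                lo = mid  # current guess is below the root
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # Demo: 1% gross outliers drag the empirical mean above 10, while the
    # Catoni estimate stays within a few tenths of the true mean 0.5.
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0.5, 1.0, 990), np.full(10, 1000.0)])
    print(np.mean(data), catoni_mean(data, alpha=0.5))

In the paper's stochastic setting the estimator is instead implemented through an ERM oracle to keep the overall algorithm efficient; the direct bisection above is only meant to show what the estimator computes.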
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
- Batched Stochastic Bandit for Nondegenerate Functions [8.015503209312786]
This paper studies batched bandit learning problems for nondegenerate functions.
We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally.
arXiv Detail & Related papers (2024-05-09T12:50:16Z)
- Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z)
- Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP [46.86114958340962]
We present regret algorithms for contextual MDPs under a minimum reachability assumption.
Our approach is the first optimistic approach applied to contextual MDPs with general function approximation.
arXiv Detail & Related papers (2022-07-22T15:00:15Z)
- On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z)
- Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal Predictions [37.92747047688873]
We show that the commonly-used $\tau=1$ can be insufficient to drive good decisions in many settings of interest.
We also show that, as $\tau$ grows, performing well according to $\mathbf{d}_{\mathrm{KL}}^{\tau}$ recovers universal guarantees for any possible decision.
arXiv Detail & Related papers (2021-07-20T01:55:01Z)
- Improved Regret Bounds for Online Submodular Maximization [10.089520556398575]
We consider an online optimization problem where at each step $t \in [T]$, the algorithm chooses an action $x_t$ from the fixed convex and compact domain set $\mathcal{K}$.
A utility function $f_t(\cdot)$ is then revealed and the algorithm receives the payoff $f_t(x_t)$.
arXiv Detail & Related papers (2021-06-15T02:05:35Z)
- Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm, sbcalg, that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z)
- Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.