Oracle-Efficient Online Learning for Beyond Worst-Case Adversaries
- URL: http://arxiv.org/abs/2202.08549v1
- Date: Thu, 17 Feb 2022 09:49:40 GMT
- Title: Oracle-Efficient Online Learning for Beyond Worst-Case Adversaries
- Authors: Nika Haghtalab, Yanjun Han, Abhishek Shetty, Kunhe Yang
- Abstract summary: We study oracle-efficient algorithms for beyond worst-case analysis of online learning.
For the smoothed analysis setting, our results give the first oracle-efficient algorithm for online learning with smoothed adversaries.
- Score: 29.598518028635716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study oracle-efficient algorithms for beyond worst-case
analysis of online learning. We focus on two settings. First, the smoothed
analysis setting of [RST11, HRS12] where an adversary is constrained to
generating samples from distributions whose density is upper bounded by
$1/\sigma$ times the uniform density. Second, the setting of $K$-hint
transductive learning, where the learner is given access to $K$ hints per time
step that are guaranteed to include the true instance. We give the first known
oracle-efficient algorithms for both settings that depend only on the VC
dimension of the class and parameters $\sigma$ and $K$ that capture the power
of the adversary. In particular, we achieve oracle-efficient regret bounds of
$O(\sqrt{T(d/\sigma)^{1/2}})$ and $O(\sqrt{TdK})$, respectively, for these
settings. For the smoothed analysis setting, our results give the first
oracle-efficient algorithm for online learning with smoothed adversaries
[HRS21]. This contrasts with the computational separation between online
learning with worst-case adversaries and offline learning established by
[HK16]. Our algorithms also imply improved bounds for the worst-case setting
with small domains. In particular, we give an oracle-efficient algorithm with
regret $O(\sqrt{T(d|\mathcal{X}|)^{1/2}})$, which refines the earlier
$O(\sqrt{T|\mathcal{X}|})$ bound of [DS16].
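The abstract's algorithmic template can be made concrete. Below is a minimal Python sketch of an oracle-efficient FTPL-style learner: the learner never enumerates the hypothesis class and touches it only through an offline ERM oracle, perturbing each round by hallucinating uniformly drawn, randomly labeled examples. This is an illustrative sketch under stated assumptions, not the paper's exact construction (which couples the perturbation to $\sigma$, $d$, and $K$); `erm_oracle`, `sample_uniform_x`, and `n_fake` are hypothetical names, and the toy threshold class stands in for a real oracle.

```python
import random

def ftpl_with_erm_oracle(stream, erm_oracle, sample_uniform_x, n_fake):
    """Follow-the-perturbed-leader driven by an offline ERM oracle.

    stream           -- list of (x, y) pairs chosen by the adversary; under a
                        sigma-smoothed adversary each x is drawn from a density
                        at most 1/sigma times the uniform density
    erm_oracle(data) -- returns a hypothesis h (a callable x -> label)
                        minimizing empirical 0/1 loss on data
    sample_uniform_x -- draws one instance uniformly from the domain
    n_fake           -- number of hallucinated examples per round
                        (the perturbation strength; a hypothetical knob)
    """
    history, mistakes = [], 0
    for x_t, y_t in stream:
        # Fresh uniform examples with random labels act as the FTPL
        # perturbation; one ERM call per round keeps the learner
        # oracle-efficient.
        fake = [(sample_uniform_x(), random.choice([0, 1]))
                for _ in range(n_fake)]
        h = erm_oracle(history + fake)
        mistakes += int(h(x_t) != y_t)
        history.append((x_t, y_t))
    return mistakes

# Toy instantiation: threshold classifiers on [0, 1] (VC dimension 1),
# with brute-force grid search standing in for the ERM oracle.
def threshold_erm(data):
    grid = [i / 100 for i in range(101)]
    make_h = lambda c: (lambda x: int(x >= c))
    return min((make_h(c) for c in grid),
               key=lambda h: sum(int(h(x) != y) for x, y in data))

random.seed(0)
stream = [(random.random(), int(random.random() < 0.3)) for _ in range(200)]
print(ftpl_with_erm_oracle(stream, threshold_erm, random.random, n_fake=10))
```

In the $K$-hint transductive setting, the analogous move would be to draw the perturbation from the hint sets rather than from the uniform distribution; that variant is omitted here.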
Related papers
- Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning [6.236688509180343]
We study off-dynamics Reinforcement Learning (RL), where the policy training and deployment environments are different.
We propose a novel algorithm We-DRIVE-U that enjoys an average suboptimality of $\widetilde{\mathcal{O}}\big(dH \cdot \min\{1/\rho, H/\sqrt{K}\}\big)$.
We also construct a novel hard instance and derive the first information-theoretic lower bound in this setting.
arXiv Detail & Related papers (2024-09-30T17:21:15Z) - Oracle-Efficient Hybrid Online Learning with Unknown Distribution [16.73555330791045]
We show that there exists a computationally efficient online predictor that achieves regret upper bounded by $\tilde{O}(T^{\frac{3}{4}})$ for a finite-VC class, and upper bounded by $\tilde{O}(T^{\frac{p+1}{p+2}})$ for a class with $\alpha$-fat-shattering dimension.
We then extend our result to the scenario of shifting distributions with $K$ changes, yielding a regret of order $\tilde{O}(T^{\frac{4}{5}}K^{\frac{1}{5}})$.
arXiv Detail & Related papers (2024-01-27T22:45:02Z) - Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits.
Our algorithms deliver near-optimal regret bounds in both the stochastic and adversarial regimes.
arXiv Detail & Related papers (2023-12-24T08:27:30Z) - Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
We show that our algorithm achieves an $\tilde{O}\big((d+\log(|\mathcal{S}|^2|\mathcal{A}|))\sqrt{K}\big)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, and $|\mathcal{S}|$ and $|\mathcal{A}|$ are the cardinalities of the state and action spaces.
arXiv Detail & Related papers (2023-05-15T05:37:32Z) - Projection-free Online Exp-concave Optimization [21.30065439295409]
We present an LOO-based ONS-style algorithm which, using $O(T)$ calls to a LOO overall, guarantees worst-case regret bounded by $\widetilde{O}(n^{2/3}T^{2/3})$.
Our algorithm is most interesting in an important and plausible low-dimensional data scenario.
arXiv Detail & Related papers (2023-02-09T18:58:05Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$.
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w^*$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where the cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ on the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Online Learning with Vector Costs and Bandits with Knapsacks [8.340947016086739]
We introduce online learning with vector costs (OLVC$_p$), where in each time step $t \in \{1, \ldots, T\}$ we need to play an action that incurs an unknown vector cost.
The goal of the online algorithm is to minimize the $\ell_p$ norm of the sum of its cost vectors; a small worked example of this objective appears after this list.
arXiv Detail & Related papers (2020-10-14T18:28:41Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
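As a concrete reading of the vector-cost objective from the OLVC entry above, the following sketch evaluates what an OLVC$_p$ algorithm is scored on: the $\ell_p$ norm of the coordinate-wise sum of its per-round cost vectors. The function names and example data are made up for illustration; this shows the objective only, not any algorithm from that paper.

```python
def lp_norm(v, p):
    """The l_p norm of a vector v."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def olvc_objective(cost_vectors, p):
    """OLVC_p objective: the l_p norm of the coordinate-wise sum of the
    vector costs incurred over all T rounds."""
    total = [sum(coord) for coord in zip(*cost_vectors)]
    return lp_norm(total, p)

# Three rounds of 2-dimensional costs, evaluated under the l_2 norm:
costs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(olvc_objective(costs, p=2))  # l_2 norm of [1.5, 1.5], about 2.121
```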