Oracle-Efficient Online Learning for Beyond Worst-Case Adversaries
- URL: http://arxiv.org/abs/2202.08549v1
- Date: Thu, 17 Feb 2022 09:49:40 GMT
- Title: Oracle-Efficient Online Learning for Beyond Worst-Case Adversaries
- Authors: Nika Haghtalab, Yanjun Han, Abhishek Shetty, Kunhe Yang
- Abstract summary: We study oracle-efficient algorithms for beyond worst-case analysis of online learning.
For the smoothed analysis setting, our results give the first oracle-efficient algorithm for online learning with smoothed adversaries.
- Score: 29.598518028635716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study oracle-efficient algorithms for beyond worst-case
analysis of online learning. We focus on two settings. First, the smoothed
analysis setting of [RST11, HRS12] where an adversary is constrained to
generating samples from distributions whose density is upper bounded by
$1/\sigma$ times the uniform density. Second, the setting of $K$-hint
transductive learning, where the learner is given access to $K$ hints per time
step that are guaranteed to include the true instance. We give the first known
oracle-efficient algorithms for both settings that depend only on the VC
dimension of the class and parameters $\sigma$ and $K$ that capture the power
of the adversary. In particular, we achieve oracle-efficient regret bounds of
$O(\sqrt{T(d/\sigma)^{1/2}})$ and $O(\sqrt{TdK})$, respectively, for these
settings. For the smoothed analysis setting, our results give the first
oracle-efficient algorithm for online learning with smoothed adversaries
[HRS21]. This contrasts with the computational separation between online
learning with worst-case adversaries and offline learning established by
[HK16]. Our algorithms also imply improved bounds for the worst-case setting
with small domains. In particular, we give an oracle-efficient algorithm with
regret $O(\sqrt{T(d|\mathcal{X}|)^{1/2}})$, which refines the earlier
$O(\sqrt{T|\mathcal{X}|})$ bound of [DS16].
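The abstract's algorithmic template can be made concrete. Below is a minimal Python sketch of an oracle-efficient FTPL-style learner: the learner never enumerates the hypothesis class and touches it only through an offline ERM oracle, perturbing each round by hallucinating uniformly drawn, randomly labeled examples. This is an illustrative sketch under stated assumptions, not the paper's exact construction (which couples the perturbation to $\sigma$, $d$, and $K$); `erm_oracle`, `sample_uniform_x`, and `n_fake` are hypothetical names, and the toy threshold class stands in for a real oracle.

```python
import random

def ftpl_with_erm_oracle(stream, erm_oracle, sample_uniform_x, n_fake):
    """Follow-the-perturbed-leader driven by an offline ERM oracle.

    stream           -- list of (x, y) pairs chosen by the adversary; under a
                        sigma-smoothed adversary each x is drawn from a density
                        at most 1/sigma times the uniform density
    erm_oracle(data) -- returns a hypothesis h (a callable x -> label)
                        minimizing empirical 0/1 loss on data
    sample_uniform_x -- draws one instance uniformly from the domain
    n_fake           -- number of hallucinated examples per round
                        (the perturbation strength; a hypothetical knob)
    """
    history, mistakes = [], 0
    for x_t, y_t in stream:
        # Fresh uniform examples with random labels act as the FTPL
        # perturbation; one ERM call per round keeps the learner
        # oracle-efficient.
        fake = [(sample_uniform_x(), random.choice([0, 1]))
                for _ in range(n_fake)]
        h = erm_oracle(history + fake)
        mistakes += int(h(x_t) != y_t)
        history.append((x_t, y_t))
    return mistakes

# Toy instantiation: threshold classifiers on [0, 1] (VC dimension 1),
# with brute-force grid search standing in for the ERM oracle.
def threshold_erm(data):
    grid = [i / 100 for i in range(101)]
    make_h = lambda c: (lambda x: int(x >= c))
    return min((make_h(c) for c in grid),
               key=lambda h: sum(int(h(x) != y) for x, y in data))

random.seed(0)
stream = [(random.random(), int(random.random() < 0.3)) for _ in range(200)]
print(ftpl_with_erm_oracle(stream, threshold_erm, random.random, n_fake=10))
```

In the $K$-hint transductive setting, the analogous move would be to draw the perturbation from the hint sets rather than from the uniform distribution; that variant is omitted here.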
Related papers
- Upper and Lower Bounds for Distributionally Robust Off-Dynamics Reinforcement Learning [6.236688509180343]
We study off-dynamics Reinforcement Learning (RL), where the policy training and deployment environments are different.
We propose a novel algorithm We-DRIVE-U that enjoys an average suboptimality of $\widetilde{\mathcal{O}}\big(dH \cdot \min\{1/\rho, H/\sqrt{K}\}\big)$.
We also construct a novel hard instance and derive the first information-theoretic lower bound in this setting.
arXiv Detail & Related papers (2024-09-30T17:21:15Z) - Oracle-Efficient Hybrid Online Learning with Unknown Distribution [16.73555330791045]
We show that there exists a computationally efficient online predictor that achieves regret upper bounded by $\tilde{O}(T^{\frac{3}{4}})$ for a finite-VC class, and upper bounded by $\tilde{O}(T^{\frac{p+1}{p+2}})$ for a class with $\alpha$-fat-shattering dimension.
We then extend our result to the scenario of shifting distributions with $K$ changes, yielding a regret of order $\tilde{O}(T^{\frac{4}{5}}K^{\frac{1}{5}})$.
arXiv Detail & Related papers (2024-01-27T22:45:02Z) - Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits.
Our algorithms deliver near-optimal regret bounds in both the stochastic and adversarial regimes.
arXiv Detail & Related papers (2023-12-24T08:27:30Z) - Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
We show that our algorithm achieves an $\tilde{O}\big((d+\log(|\mathcal{S}|^2|\mathcal{A}|))\sqrt{K}\big)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, and $|\mathcal{S}|$ and $|\mathcal{A}|$ are the cardinalities of the state and action spaces.
arXiv Detail & Related papers (2023-05-15T05:37:32Z) - Projection-free Online Exp-concave Optimization [21.30065439295409]
We present an LOO-based ONS-style algorithm which, using $O(T)$ calls to a LOO overall, guarantees worst-case regret bounded by $\widetilde{O}(n^{2/3}T^{2/3})$.
Our algorithm is most interesting in an important and plausible low-dimensional data scenario.
arXiv Detail & Related papers (2023-02-09T18:58:05Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$.
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w^*$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where the cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ on the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Online Learning with Vector Costs and Bandits with Knapsacks [8.340947016086739]
We introduce online learning with vector costs (OLVC$_p$), where in each time step $t \in \{1, \ldots, T\}$ we need to play an action that incurs an unknown vector cost.
The goal of the online algorithm is to minimize the $\ell_p$ norm of the sum of its cost vectors; a small worked example of this objective appears after this list.
arXiv Detail & Related papers (2020-10-14T18:28:41Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}T^{\frac{1}{3}}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
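As a concrete reading of the vector-cost objective from the OLVC entry above, the following sketch evaluates what an OLVC$_p$ algorithm is scored on: the $\ell_p$ norm of the coordinate-wise sum of its per-round cost vectors. The function names and example data are made up for illustration; this shows the objective only, not any algorithm from that paper.

```python
def lp_norm(v, p):
    """The l_p norm of a vector v."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def olvc_objective(cost_vectors, p):
    """OLVC_p objective: the l_p norm of the coordinate-wise sum of the
    vector costs incurred over all T rounds."""
    total = [sum(coord) for coord in zip(*cost_vectors)]
    return lp_norm(total, p)

# Three rounds of 2-dimensional costs, evaluated under the l_2 norm:
costs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(olvc_objective(costs, p=2))  # l_2 norm of [1.5, 1.5], about 2.121
```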