Tradeoffs between Mistakes and ERM Oracle Calls in Online and Transductive Online Learning
- URL: http://arxiv.org/abs/2506.00135v1
- Date: Fri, 30 May 2025 18:11:58 GMT
- Title: Tradeoffs between Mistakes and ERM Oracle Calls in Online and Transductive Online Learning
- Authors: Idan Attias, Steve Hanneke, Arvind Ramaswami
- Abstract summary: We study online and transductive online learning when the learner interacts with the concept class only via Empirical Risk Minimization (ERM) or weak consistency oracles.
- Score: 17.389446817000945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online and transductive online learning when the learner interacts with the concept class only via Empirical Risk Minimization (ERM) or weak consistency oracles on arbitrary instance subsets. This contrasts with standard online models, where the learner knows the entire class. The ERM oracle returns a hypothesis minimizing loss on a given subset, while the weak consistency oracle returns a binary signal indicating whether the subset is realizable by some concept. The learner is evaluated by the number of mistakes and oracle calls. In the standard online setting with ERM access, we prove tight lower bounds in both realizable and agnostic cases: $\Omega(2^{d_{VC}})$ mistakes and $\Omega(\sqrt{T 2^{d_{LD}}})$ regret, where $T$ is the number of timesteps, $d_{VC}$ is the VC dimension, and $d_{LD}$ is the Littlestone dimension. We further show that existing online learning results with ERM access carry over to the weak consistency setting, incurring an additional $O(T)$ in oracle calls. We then consider the transductive online model, where the instance sequence is known but labels are revealed sequentially. For general Littlestone classes, we show that optimal realizable and agnostic mistake bounds can be achieved using $O(T^{d_{VC}+1})$ weak consistency oracle calls. On the negative side, we show that $\Omega(T)$ weak consistency queries are necessary for transductive online learnability, and that $\Omega(T)$ ERM queries are necessary to avoid exponential dependence on the Littlestone dimension. Finally, for certain concept classes, we reduce oracle calls via randomized algorithms while maintaining similar mistake bounds. In particular, for Thresholds on an unknown ordering, $O(\log T)$ ERM queries suffice; for $k$-Intervals, $O(T^3 2^{2k})$ weak consistency queries suffice.
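For intuition about the access model, the sketch below implements the two oracles over a small finite class (thresholds on {0, ..., 9}). The class names `ErmOracle` and `WeakConsistencyOracle`, the call counters, and the threshold example are illustrative assumptions rather than constructions from the paper; they only show the interface the learner may use and the quantity (oracle calls) it is charged for.

```python
# Illustrative sketch (not from the paper): the two oracle access modes over a
# finite concept class. The paper treats arbitrary concept classes that the
# learner can only query through such oracles.
from typing import Callable, Hashable, List, Tuple

Concept = Callable[[Hashable], int]         # hypothesis: instance -> {0, 1}
LabeledSubset = List[Tuple[Hashable, int]]  # a subset S of labeled instances


class ErmOracle:
    """Given a labeled subset, return a concept minimizing empirical 0-1 loss."""

    def __init__(self, concept_class: List[Concept]) -> None:
        self.concept_class = concept_class
        self.calls = 0                      # the learner is charged per call

    def erm(self, subset: LabeledSubset) -> Concept:
        self.calls += 1
        return min(self.concept_class,
                   key=lambda h: sum(h(x) != y for x, y in subset))


class WeakConsistencyOracle:
    """Given a labeled subset, return one bit: is it realizable by the class?"""

    def __init__(self, concept_class: List[Concept]) -> None:
        self.concept_class = concept_class
        self.calls = 0

    def is_realizable(self, subset: LabeledSubset) -> bool:
        self.calls += 1
        return any(all(h(x) == y for x, y in subset)
                   for h in self.concept_class)


# Tiny example: thresholds on {0, ..., 9}; concept t labels x as 1 iff x >= t.
thresholds = [(lambda x, t=t: int(x >= t)) for t in range(11)]
erm_oracle = ErmOracle(thresholds)
wc_oracle = WeakConsistencyOracle(thresholds)

S = [(2, 0), (7, 1)]
print(wc_oracle.is_realizable(S))   # True: any threshold t in {3, ..., 7} fits
print(erm_oracle.erm(S)(5))         # prediction of some ERM hypothesis at x = 5
```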
Related papers
- Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning [28.184175745050474]
We study the impact of the choice of supervised learning oracle on the computational complexity of reinforcement learning algorithms.
First, we identify two-context regression as a minimal oracle in the standard episodic access model.
Second, we identify one-context regression as a near-minimal oracle in the stronger reset access model.
Third, we broaden our focus to Low-Rank MDPs, where we give cryptographic evidence that the analogous oracle from the Block MDP setting is insufficient.
arXiv Detail & Related papers (2025-02-12T18:47:13Z) - Computational Lower Bounds for Regret Minimization in Normal-Form Games [68.66209476382213]
We provide evidence that existing learning algorithms, such as multiplicative weights update, are close to optimal.
Our results are obtained in the algorithmic framework put forward by Kothari and Mehta.
arXiv Detail & Related papers (2024-11-04T00:39:52Z) - Understanding Aggregations of Proper Learners in Multiclass Classification [4.422219522591412]
Multiclass learnability is known to exhibit a properness barrier.
Recent advances in binary classification have demonstrated that such barriers can be circumvented using aggregations of proper learners.
We show that a single ERM requires $\Omega\left(\frac{d_G \ln(1/\delta)}{\epsilon}\right)$ samples.
arXiv Detail & Related papers (2024-10-30T07:12:02Z) - Oracle-Efficient Hybrid Online Learning with Unknown Distribution [16.73555330791045]
We show that there exists a computationally efficient online predictor that achieves regret upper bounded by $\tilde{O}(T^{3/4})$ for a finite-VC class, and upper bounded by $\tilde{O}(T^{\frac{p+1}{p+2}})$ for a class with $\alpha$-fat-shattering dimension of order $\alpha^{-p}$.
We then extend our result to the scenario of shifting distributions with $K$ changes, yielding a regret of order $\tilde{O}(T^{\frac{4}{5}} K^{\frac{1}{5}})$.
arXiv Detail & Related papers (2024-01-27T22:45:02Z) - When is Agnostic Reinforcement Learning Statistically Tractable? [76.1408672715773]
A new complexity measure, called the spanning capacity, depends solely on the set $\Pi$ and is independent of the MDP dynamics.
We show there exists a policy class $\Pi$ with a bounded spanning capacity that requires a superpolynomial number of samples to learn.
This reveals a surprising separation for learnability between generative access and online access models.
arXiv Detail & Related papers (2023-10-09T19:40:54Z) - Simple online learning with consistent oracle [55.43220407902113]
We consider online learning in the model where a learning algorithm can access the class only via the consistent oracle -- an oracle that, at any moment, can return a function from the class that agrees with all examples seen so far.
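For intuition, here is a minimal sketch of that access model, assuming a finite concept class and a realizable example stream; the function names are hypothetical, and the naive strategy below (re-query the oracle only after a mistake, giving at most N - 1 mistakes for a class of size N) is far weaker than the dimension-based guarantees studied in that paper.

```python
# Minimal sketch (assumptions: nonempty finite concept class, realizable
# stream) of online learning with a consistent oracle: predict with the
# oracle's current hypothesis and query the oracle again only after a mistake.
from typing import Callable, Hashable, List, Optional, Tuple

Concept = Callable[[Hashable], int]
Example = Tuple[Hashable, int]


def consistent_oracle(concept_class: List[Concept],
                      history: List[Example]) -> Optional[Concept]:
    """Return some concept agreeing with every example seen so far, if any."""
    for h in concept_class:
        if all(h(x) == y for x, y in history):
            return h
    return None


def run_online(concept_class: List[Concept], stream: List[Example]) -> int:
    """Predict-then-update loop; returns the number of mistakes made."""
    history: List[Example] = []
    current = consistent_oracle(concept_class, history)  # any concept initially
    mistakes = 0
    for x, y in stream:
        if current(x) != y:                 # predict with the current hypothesis
            mistakes += 1
            history.append((x, y))
            current = consistent_oracle(concept_class, history)  # re-query
        else:
            history.append((x, y))          # still consistent, keep hypothesis
    return mistakes


# Example: thresholds on {0, ..., 9}; the stream is labeled by threshold t = 4.
thresholds = [(lambda x, t=t: int(x >= t)) for t in range(11)]
stream = [(x, int(x >= 4)) for x in [9, 0, 5, 3, 4, 7, 1]]
print(run_online(thresholds, stream))       # number of mistakes (2 on this stream)
```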
arXiv Detail & Related papers (2023-08-15T21:50:40Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Oracle-Efficient Online Learning for Beyond Worst-Case Adversaries [29.598518028635716]
We study oracle-efficient algorithms for beyond worst-case analysis of online learning.
For the smoothed analysis setting, our results give the first oracle-efficient algorithm for online learning with smoothed adversaries.
arXiv Detail & Related papers (2022-02-17T09:49:40Z) - Online Selective Classification with Limited Feedback [82.68009460301585]
We study selective classification in the online learning model, wherein a predictor may abstain from classifying an instance.
A salient aspect of the setting we consider is that the data may be non-realisable, due to which abstention may be a valid long-term action.
We construct simple versioning-based schemes for any $\mu \in (0,1]$ that make at most $T^\mu$ mistakes while incurring $\tilde{O}(T^{1-\mu})$ excess abstention against adaptive adversaries.
arXiv Detail & Related papers (2021-10-27T08:00:53Z) - Online Continual Adaptation with Active Self-Training [69.5815645379945]
We propose an online setting where the learner aims to continually adapt to changing distributions using both unlabeled samples and active queries of limited labels.
Online Self-Adaptive Mirror Descent (OSAMD) adopts an online teacher-student structure to enable online self-training from unlabeled data.
We show that OSAMD achieves favorable regrets under changing environments with limited labels on both simulated and real-world data.
arXiv Detail & Related papers (2021-06-11T17:51:25Z)