Online Learning in the Random Order Model
- URL: http://arxiv.org/abs/2510.02820v1
- Date: Fri, 03 Oct 2025 08:53:35 GMT
- Title: Online Learning in the Random Order Model
- Authors: Martino Bernasconi, Andrea Celli, Riccardo Colini-Baldeschi, Federico Fusco, Stefano Leonardi, Matteo Russo,
- Abstract summary: Random-order input can exhibit significant em non-stationarity, which can hinder the performance of learning algorithms.<n>We propose a general template to adapt learning algorithms to the random-order model without substantially affecting their regret guarantees.<n>We also investigate online classification and prove that, in random order, learnability is characterized by the VC dimension rather than the Littlestone dimension.
- Score: 23.441342985145297
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the random-order model for online learning, the sequence of losses is chosen upfront by an adversary and presented to the learner after a random permutation. Any random-order input is \emph{asymptotically} equivalent to a stochastic i.i.d. one, but, for finite times, it may exhibit significant {\em non-stationarity}, which can hinder the performance of stochastic learning algorithms. While algorithms for adversarial inputs naturally maintain their regret guarantees in random order, simple no-regret algorithms exist for the stochastic model that fail against random-order instances. In this paper, we propose a general template to adapt stochastic learning algorithms to the random-order model without substantially affecting their regret guarantees. This allows us to recover improved regret bounds for prediction with delays, online learning with constraints, and bandits with switching costs. Finally, we investigate online classification and prove that, in random order, learnability is characterized by the VC dimension rather than the Littlestone dimension, thus providing a further separation from the general adversarial model.
Related papers
- Distribution-Free Sequential Prediction with Abstentions [7.110627876507878]
We study a sequential prediction problem in which an adversary is allowed to inject arbitrarily many adversarial instances in a stream of i.i.d. instances.<n>At each round, the learner may also emphabstain from making a prediction without incurring any penalty if the instance was indeed corrupted.<n>We propose an textscAbstainBoost based on a boosting procedure of weak learners, which guarantees sublinear error for general VC classes.
arXiv Detail & Related papers (2026-02-20T00:28:27Z) - Semi-Bandit Learning for Monotone Stochastic Optimization [16.921694787482213]
A generic online learning algorithm is developed for a class of ''monotone'' problems.<n>Our framework applies to several fundamental problems such as prophet inequality, Pandora's box, single-resource revenue management and posted pricing.
arXiv Detail & Related papers (2023-12-24T07:46:37Z) - Constrained Online Two-stage Stochastic Optimization: Near Optimal Algorithms via Adversarial Learning [1.994307489466967]
We consider an online two-stage optimization with long-term constraints over a finite horizon of $T$ periods.
We develop online algorithms for the online two-stage problem from adversarial learning algorithms.
arXiv Detail & Related papers (2023-02-02T10:33:09Z) - Arbitrarily Large Labelled Random Satisfiability Formulas for Machine
Learning Training [5.414308305392762]
We show how to generate correctly labeled random formulas of any desired size without having to solve the underlying decision problem.
We train existing state-of-the-art models for the task of predicting satisfiability on formulas with 10,000 variables.
We find that they do no better than random guessing 99% on the same datasets.
arXiv Detail & Related papers (2022-11-21T17:52:13Z) - A Non-monotonic Self-terminating Language Model [62.93465126911921]
In this paper, we focus on the problem of non-terminating sequences resulting from an incomplete decoding algorithm.
We first define an incomplete probable decoding algorithm which includes greedy search, top-$k$ sampling, and nucleus sampling.
We then propose a non-monotonic self-terminating language model, which relaxes the constraint of monotonically increasing termination probability.
arXiv Detail & Related papers (2022-10-03T00:28:44Z) - Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
parameter-free algorithms are online learning algorithms that do not require setting learning rates.
We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor.
Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z) - Optimal Online Generalized Linear Regression with Stochastic Noise and
Its Application to Heteroscedastic Bandits [88.6139446295537]
We study the problem of online generalized linear regression in the setting of a generalized linear model with possibly unbounded additive noise.
We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise.
We propose an algorithm based on FTRL to achieve the first variance-aware regret bound.
arXiv Detail & Related papers (2022-02-28T08:25:26Z) - Discovering Non-monotonic Autoregressive Orderings with Variational
Inference [67.27561153666211]
We develop an unsupervised parallelizable learner that discovers high-quality generation orders purely from training data.
We implement the encoder as a Transformer with non-causal attention that outputs permutations in one forward pass.
Empirical results in language modeling tasks demonstrate that our method is context-aware and discovers orderings that are competitive with or even better than fixed orders.
arXiv Detail & Related papers (2021-10-27T16:08:09Z) - Model Selection for Generic Contextual Bandits [20.207989166682832]
We propose a refinement based algorithm called Adaptive Contextual Bandit (ttfamily ACB)
We prove that this algorithm is adaptive, i.e., the regret rate order-wise matches that of any provable contextual bandit algorithm.
We also show that a much simpler explore-then-commit (ETC) style algorithm also obtains similar regret bound, despite not knowing the true model class.
arXiv Detail & Related papers (2021-07-07T19:35:31Z) - Efficient First-Order Contextual Bandits: Prediction, Allocation, and
Triangular Discrimination [82.52105963476703]
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees.
arXiv Detail & Related papers (2021-07-05T19:20:34Z) - Towards Costless Model Selection in Contextual Bandits: A Bias-Variance
Perspective [7.318831153179727]
We study the feasibility of similar guarantees for cumulative regret minimization in the contextual bandit setting.
Our algorithm is based on a novel misspecification test, and our analysis demonstrates the benefits of using model selection for reward estimation.
arXiv Detail & Related papers (2021-06-11T16:08:03Z) - Outlier-Robust Learning of Ising Models Under Dobrushin's Condition [57.89518300699042]
We study the problem of learning Ising models satisfying Dobrushin's condition in the outlier-robust setting where a constant fraction of the samples are adversarially corrupted.
Our main result is to provide the first computationally efficient robust learning algorithm for this problem with near-optimal error guarantees.
arXiv Detail & Related papers (2021-02-03T18:00:57Z) - Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(sqrtT)$ regret, in general it is impossible to get better than $Omega(sqrtT)$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.