Online Algorithms with Unreliable Guidance
- URL: http://arxiv.org/abs/2602.20706v1
- Date: Tue, 24 Feb 2026 09:11:56 GMT
- Title: Online Algorithms with Unreliable Guidance
- Authors: Julien Dallot, Yuval Emek, Yuval Gil, Maciej Pacut, Stefan Schmid,
- Abstract summary: This paper introduces a new model for ML-augmented online decision making, called online algorithms with unreliable guidance (OAG). Formulated through the lens of request-answer games, an OAG algorithm receives, with each incoming request, a piece of guidance which is taken from the problem's answer space. We describe a systematic method, called the drop or trust blindly (DTB) compiler, which transforms any online algorithm into a learning-augmented online algorithm in the OAG model.
- Score: 6.891896330885501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a new model for ML-augmented online decision making, called online algorithms with unreliable guidance (OAG). This model completely separates the predictive and algorithmic components, thus offering a single well-defined analysis framework that relies solely on the considered problem. Formulated through the lens of request-answer games, an OAG algorithm receives, with each incoming request, a piece of guidance taken from the problem's answer space; ideally, this guidance is the optimal answer for the current request, but with probability $\beta$, the guidance is adversarially corrupted. The goal is to develop OAG algorithms that admit good competitiveness when $\beta = 0$ (a.k.a. consistency) as well as when $\beta = 1$ (a.k.a. robustness); the appealing notion of smoothness, which in most prior work required a dedicated loss function, now arises naturally as $\beta$ shifts from $0$ to $1$. We then describe a systematic method, called the drop or trust blindly (DTB) compiler, which transforms any online algorithm into a learning-augmented online algorithm in the OAG model. Given a prediction-oblivious online algorithm, its learning-augmented counterpart produced by applying the DTB compiler either follows the incoming guidance blindly or ignores it altogether and proceeds as the initial algorithm would have; the choice between these two alternatives is based on the outcome of a (biased) coin toss. As our main technical contribution, we prove (rigorously) that although remarkably simple, the class of algorithms produced via the DTB compiler includes algorithms with attractive consistency-robustness guarantees for three classic online problems: for caching and uniform metrical task systems our algorithms are optimal, whereas for bipartite matching (with adversarial arrival order), our algorithm outperforms the state-of-the-art.
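To make the DTB compiler's two-branch behavior concrete, here is a minimal Python sketch written under stated assumptions rather than taken from the paper: the names `dtb_compile`, `base_algorithm`, and `trust_probability` are illustrative, and the single up-front coin toss is inferred from the abstract's "ignores it altogether" phrasing, which leaves the exact timing of the toss open.

```python
import random

def dtb_compile(base_algorithm, trust_probability):
    """Illustrative drop-or-trust-blindly (DTB) style wrapper (a sketch, not the paper's pseudocode).

    base_algorithm: callable (state, request) -> answer, standing in for a
        prediction-oblivious online algorithm.
    trust_probability: bias of the coin toss (hypothetical parameter name).
    """
    # One biased coin toss decides the whole run: either trust the incoming
    # guidance blindly, or ignore it altogether and answer exactly as the
    # base algorithm would have.
    trust_guidance = random.random() < trust_probability

    def augmented_algorithm(state, request, guidance):
        if trust_guidance:
            return guidance  # follow the (possibly adversarially corrupted) guidance
        return base_algorithm(state, request)  # proceed as the base algorithm would

    return augmented_algorithm
```

In this reading, choosing the bias of the coin is what trades consistency ($\beta = 0$) against robustness ($\beta = 1$); the concrete guarantees for caching, uniform metrical task systems, and bipartite matching are established in the paper itself.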
Related papers
- Bandits with Single-Peaked Preferences and Limited Resources [6.205308371824035]
We study an online matching problem in which an algorithm sequentially matches $U$ users to $K$ arms. Without structural assumptions, computing the optimal matching is NP-hard, making online learning computationally infeasible. We devise an efficient algorithm for the offline budgeted matching problem, and leverage it into an efficient online algorithm with a regret of $\tilde{O}(UKT^{2/3})$.
arXiv Detail & Related papers (2025-10-10T14:27:25Z) - Online Uniform Sampling: Randomized Learning-Augmented Approximation Algorithms with Application to Digital Health [3.534690532561709]
We study the novel problem of online uniform sampling (OUS), where the goal is to distribute a sampling budget uniformly across unknown decision times.
In the OUS problem, the algorithm is given a budget $b$ and a time horizon $T$, and an adversary then chooses a value $\tau^* \in [b,T]$, which is revealed to the algorithm online.
We present the first randomized algorithm designed for this problem and subsequently extend it to incorporate learning augmentation.
arXiv Detail & Related papers (2024-02-03T02:36:59Z) - Best of Both Worlds Guarantees for Smoothed Online Quadratic Optimization [9.449153668916098]
We study the smoothed online quadratic optimization (SOQO) problem where, at each round $t$, a player plays an action $x_t$ in response to a quadratic hitting cost and an additional squared $\ell_2$-norm cost for switching actions.
This problem class has strong connections to a wide range of application domains including smart grid management, adaptive control, and data center management.
We present a best-of-both-worlds algorithm that obtains a robust adversarial performance while simultaneously achieving a near-optimal performance.
arXiv Detail & Related papers (2023-10-31T22:59:23Z) - Efficient Methods for Non-stationary Online Learning [63.268670895111654]
We present efficient methods for optimizing dynamic regret and adaptive regret. The proposed algorithms require only one gradient query and one function evaluation at each round. We also study an even stronger measure, namely "interval dynamic regret", and reduce the number of projections per round from $O(\log^2 T)$ to $1$.
arXiv Detail & Related papers (2023-09-16T07:30:12Z) - Adversarial Deep Learning for Online Resource Allocation [12.118811903399951]
We use deep neural networks to learn an online algorithm for a resource allocation and pricing problem from scratch.
Our work is the first to use deep neural networks to design an online algorithm from the perspective of worst-case performance guarantees.
arXiv Detail & Related papers (2021-11-19T15:48:43Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping space and time complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Online Adversarial Attacks [57.448101834579624]
We formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases.
We first rigorously analyze a deterministic variant of the online threat model.
We then propose a simple yet practical algorithm yielding a provably better competitive ratio for $k=2$ over the current best single-threshold algorithm.
arXiv Detail & Related papers (2021-03-02T20:36:04Z) - Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting.
We show that our algorithm achieves, for any $k$, an almost optimal consistency-robustness tradeoff.
arXiv Detail & Related papers (2021-03-02T11:04:33Z) - Optimal Robustness-Consistency Trade-offs for Learning-Augmented Online
Algorithms [85.97516436641533]
We study the problem of improving the performance of online algorithms by incorporating machine-learned predictions.
The goal is to design algorithms that are both consistent and robust.
We provide the first set of non-trivial lower bounds for competitive analysis using machine-learned predictions.
arXiv Detail & Related papers (2020-10-22T04:51:01Z) - Run2Survive: A Decision-theoretic Approach to Algorithm Selection based
on Survival Analysis [75.64261155172856]
Survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z) - Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)