Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods
- URL: http://arxiv.org/abs/2402.07108v2
- Date: Tue, 28 May 2024 20:43:21 GMT
- Title: Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods
- Authors: Wenzhi Gao, Chunlin Sun, Chenyu Xue, Dongdong Ge, Yinyu Ye
- Abstract summary: We introduce a new algorithmic framework that decouples learning from decision-making.
For the first time, we show that first-order methods can attain regret $\mathcal{O}(T^{1/3})$ with this new framework.
- Score: 7.518108920887426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O}(\sqrt{T})$, which is suboptimal compared to the $\mathcal{O}(\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes several important facts about online linear programming, which reveal the challenge for first-order-method-based online algorithms to achieve regret better than $\mathcal{O}(\sqrt{T})$. To address the challenge, we introduce a new algorithmic framework that decouples learning from decision-making. For the first time, we show that first-order methods can attain regret $\mathcal{O}(T^{1/3})$ with this new framework.
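For context, the first-order baseline the abstract contrasts against is typically a dual-price (subgradient) scheme that interleaves a decision step and a learning step in every round. The sketch below is a minimal illustration of that generic interleaved baseline, not the paper's decoupled algorithm; the function name, interface, and step-size choice are assumptions made for this example.

```python
import numpy as np

def online_lp_first_order(rewards, consumptions, capacities, step_size):
    """Minimal dual-subgradient online LP allocation (illustrative baseline sketch).

    rewards:      length-T array of per-request rewards r_t
    consumptions: T x m array of per-request resource consumption a_t
    capacities:   length-m array of total resource budgets b
    step_size:    subgradient step size (e.g. on the order of 1/sqrt(T))
    """
    T, m = consumptions.shape
    remaining = capacities.astype(float).copy()
    avg_budget = capacities / T          # per-round budget rho = b / T
    prices = np.zeros(m)                 # dual prices, one per resource
    decisions = np.zeros(T)

    for t in range(T):
        r_t, a_t = rewards[t], consumptions[t]
        # Decision step: accept the request if its reward beats the priced-out
        # resource cost and enough capacity remains.
        x_t = 1.0 if (r_t > prices @ a_t and np.all(remaining >= a_t)) else 0.0
        decisions[t] = x_t
        remaining -= x_t * a_t
        # Learning step: projected subgradient update on the dual prices.
        prices = np.maximum(prices + step_size * (x_t * a_t - avg_budget), 0.0)

    return decisions, prices, float(decisions @ rewards)
```

In such interleaved schemes every accept/reject decision is made with the current, still-noisy price estimates, which is one reading of why they stall at $\mathcal{O}(\sqrt{T})$ regret; the paper's framework instead separates the learning of prices from the decision-making step.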
Related papers
- Infrequent Resolving Algorithm for Online Linear Programming [3.247415064064425]
Existing online linear programming (OLP) algorithms fall into two categories: LP-based algorithms and LP-free algorithms.
We propose an algorithm that achieves a constant regret while solving LPs only $O(\log\log T)$ times over the time horizon $T$.
Our algorithm can guarantee a constant regret by solving LPs $O(\log\log T)$ times, and an $O\left(T^{(1/2+\epsilon)^{M}}\right)$ regret by solving LPs only $M$ times.
arXiv Detail & Related papers (2024-08-01T11:09:01Z) - Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z) - PDE-Based Optimal Strategy for Unconstrained Online Learning [40.61498562988079]
We present a framework that generates time-varying potential functions by solving a Partial Differential Equation (PDE).
Our framework recovers some classical potentials, and more importantly provides a systematic approach to design new ones.
This is the first parameter-free algorithm with an optimal leading constant.
arXiv Detail & Related papers (2022-01-19T22:21:21Z) - Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee regret better than $\Omega(\sqrt{T})$.
arXiv Detail & Related papers (2021-11-09T16:50:18Z) - Efficient Online-Bandit Strategies for Minimax Learning Problems [21.300877551771197]
Several learning problems involve solving min-max problems, e.g., empirical distributionally robust learning or minimization with non-standard aggregated losses.
More specifically, these problems are convex-linear problems where the learning is carried out over the model parameters $w \in \mathcal{W}$ and over the empirical distribution $p \in \mathcal{K}$ of the training set.
To design efficient methods, we let an online learning algorithm play against a (combinatorial) bandit algorithm.
arXiv Detail & Related papers (2021-05-28T16:01:42Z) - Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z) - Strongly Adaptive OCO with Memory [49.319621885036035]
We propose the first strongly adaptive algorithm for online learning with memory.
Our algorithm results in a strongly adaptive regret bound for the control of linear time-varying systems.
arXiv Detail & Related papers (2021-02-02T17:26:08Z) - Online Linear Optimization with Many Hints [72.4277628722419]
We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision.
In this setting, we devise an algorithm that obtains logarithmic regret whenever there exists a convex combination of the $K$ hints that has positive correlation with the cost vectors.
arXiv Detail & Related papers (2020-10-06T23:25:05Z) - Online Learning with Imperfect Hints [72.4277628722419]
We develop algorithms and nearly matching lower bounds for online learning with imperfect directional hints.
Our algorithms are oblivious to the quality of the hints, and the regret bounds interpolate between the always-correlated hints case and the no-hints case.
arXiv Detail & Related papers (2020-02-11T23:06:09Z) - Faster Projection-free Online Learning [34.96927532439896]
We give an efficient projection-free algorithm that guarantees $T^{2/3}$ regret for general online convex optimization.
Our algorithm is derived using the Follow-the-Perturbed-Leader method and is analyzed using an online primal-dual framework.
arXiv Detail & Related papers (2020-01-30T21:18:39Z)