Beyond $\mathcal{O}(\sqrt{T})$ Regret: Decoupling Learning and Decision-making in Online Linear Programming
- URL: http://arxiv.org/abs/2501.02761v1
- Date: Mon, 06 Jan 2025 04:57:44 GMT
- Title: Beyond $\mathcal{O}(\sqrt{T})$ Regret: Decoupling Learning and Decision-making in Online Linear Programming
- Authors: Wenzhi Gao, Dongdong Ge, Chenyu Xue, Chunlin Sun, Yinyu Ye
- Abstract summary: This paper establishes a general framework that improves upon the $\mathcal{O}(\sqrt{T})$ result when the LP dual problem exhibits certain error bound conditions.
We show that first-order learning algorithms achieve $o(\sqrt{T})$ regret in the continuous support setting and $\mathcal{O}(\log T)$ regret in the finite support setting beyond the non-degeneracy assumption.
- Score: 7.518108920887426
- Abstract: Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O} ( \sqrt{T} )$, which is suboptimal compared to the $\mathcal{O} (\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes a general framework that improves upon the $\mathcal{O} ( \sqrt{T} )$ result when the LP dual problem exhibits certain error bound conditions. For the first time, we show that first-order learning algorithms achieve $o( \sqrt{T} )$ regret in the continuous support setting and $\mathcal{O} (\log T)$ regret in the finite support setting beyond the non-degeneracy assumption. Our results significantly improve the state-of-the-art regret results and provide new insights for sequential decision-making.
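For readers less familiar with the setting, here is a minimal sketch (not the authors' method) of the standard dual-based first-order approach to online LP that the abstract takes as its baseline: a dual price vector is learned by projected subgradient steps, and each arriving request is accepted exactly when its reward exceeds the priced resource cost. All names and the $1/\sqrt{T}$ step size below are illustrative assumptions.

```python
import numpy as np

def first_order_online_lp(rewards, costs, budget):
    """Hedged sketch of a dual-subgradient heuristic for online LP.

    rewards: shape (T,)   per-request rewards r_t
    costs:   shape (T, m) per-request resource consumption a_t
    budget:  shape (m,)   total resource budget b
    """
    T, m = costs.shape
    d = budget / T                    # target per-round consumption rate
    p = np.zeros(m)                   # dual prices (the "learning" state)
    eta = 1.0 / np.sqrt(T)            # O(1/sqrt(T)) step, O(sqrt(T))-type regret
    remaining = budget.astype(float)
    revenue = 0.0
    for t in range(T):
        r_t, a_t = rewards[t], costs[t]
        # Decision: accept iff the reward beats the priced resource cost
        # and enough of every resource remains.
        x_t = 1.0 if (r_t > p @ a_t and np.all(remaining >= a_t)) else 0.0
        revenue += r_t * x_t
        remaining -= x_t * a_t
        # Learning: projected dual subgradient step toward the rate d.
        p = np.maximum(p + eta * (x_t * a_t - d), 0.0)
    return revenue
```

Note that the same iterate p drives both the update and the decision; this is precisely the coupling the paper's framework relaxes. Under the dual error bound conditions the abstract mentions, a decoupled learning sequence can converge fast enough to push regret below $\mathcal{O}(\sqrt{T})$.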
Related papers
- Infrequent Resolving Algorithm for Online Linear Programming [3.247415064064425]
Existing online linear programming (OLP) algorithms fall into two categories: LP-based algorithms and LP-free algorithms.
We propose a well-performing algorithm that solves LPs at a few selected time points and conducts first-order computations at the other time points.
Our work highlights the value of resolving both at the beginning and the end of the selling horizon, and provides a novel framework to prove the performance guarantee.
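A hedged sketch of such a hybrid loop, reusing the conventions of the snippet above: `resolve_points` and `solve_lp` are hypothetical names, with `solve_lp` standing in for an offline LP oracle that returns fresh dual prices from the observed history.

```python
import numpy as np

def hybrid_online_lp(rewards, costs, budget, resolve_points, solve_lp):
    """Sketch: re-solve LPs only at `resolve_points`, first-order steps elsewhere.

    resolve_points: set of rounds 0 < t < T at which to re-solve
    solve_lp(hist_r, hist_a, remaining, rounds_left): hypothetical oracle
        returning dual prices from an LP over the data seen so far
    """
    T, m = costs.shape
    d, eta = budget / T, 1.0 / np.sqrt(T)
    p, remaining, revenue = np.zeros(m), budget.astype(float), 0.0
    for t in range(T):
        if t in resolve_points:
            # Expensive but accurate: reset prices by re-solving an LP.
            p = solve_lp(rewards[:t], costs[:t], remaining, T - t)
        r_t, a_t = rewards[t], costs[t]
        x_t = 1.0 if (r_t > p @ a_t and np.all(remaining >= a_t)) else 0.0
        revenue += r_t * x_t
        remaining -= x_t * a_t
        # Cheap between resolves: first-order dual update as before.
        p = np.maximum(p + eta * (x_t * a_t - d), 0.0)
    return revenue
```

Clustering the resolve points near the two ends of the horizon would reflect the summary's observation that resolving at both the beginning and the end of the selling horizon is where the LP solves pay off.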
arXiv Detail & Related papers (2024-08-01T11:09:01Z)
- Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods [7.518108920887426]
We introduce a new algorithmic framework that decouples learning from decision-making.
For the first time, we show that first-order methods can attain regret beyond $\mathcal{O}(\sqrt{T})$ with this new framework.
arXiv Detail & Related papers (2024-02-11T05:35:50Z)
- Efficient Methods for Non-stationary Online Learning [61.63338724659592]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
We also study an even stronger measure, namely the "interval dynamic regret", and reduce the number of projections per round from $\mathcal{O}(\log^2 T)$ to $1$.
arXiv Detail & Related papers (2023-09-16T07:30:12Z)
- Non-stationary Projection-free Online Learning with Dynamic and Adaptive Regret Guarantees [36.746745619968024]
We investigate non-stationary projection-free online learning, and choose dynamic regret and adaptive regret to measure the performance.
Our results are the first general-case dynamic regret bounds for projection-free online learning, and can recover the existing $\mathcal{O}(T^{3/4})$ static regret.
We propose a projection-free method to attain an $\tilde{\mathcal{O}}(\tau^{3/4})$ adaptive regret bound for any interval of length $\tau$.
arXiv Detail & Related papers (2023-05-19T15:02:10Z)
- Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z)
- The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z)
- PDE-Based Optimal Strategy for Unconstrained Online Learning [40.61498562988079]
We present a framework that generates time-varying potential functions by solving a Partial Differential Equation (PDE).
Our framework recovers some classical potentials, and more importantly provides a systematic approach to design new ones.
This is the first parameter-free algorithm with optimal leading constant.
arXiv Detail & Related papers (2022-01-19T22:21:21Z)
- Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model.
We also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z)
- Online Linear Optimization with Many Hints [72.4277628722419]
We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision.
In this setting, we devise an algorithm that obtains logarithmic regret whenever there exists a convex combination of the $K$ hints that has positive correlation with the cost vectors.
arXiv Detail & Related papers (2020-10-06T23:25:05Z)
- Online Learning with Imperfect Hints [72.4277628722419]
We develop algorithms and nearly matching lower bounds for online learning with imperfect directional hints.
Our algorithms are oblivious to the quality of the hints, and the regret bounds interpolate between the always-correlated hints case and the no-hints case.
arXiv Detail & Related papers (2020-02-11T23:06:09Z)