Wait-Less Offline Tuning and Re-solving for Online Decision Making
- URL: http://arxiv.org/abs/2412.09594v2
- Date: Fri, 10 Jan 2025 09:40:04 GMT
- Title: Wait-Less Offline Tuning and Re-solving for Online Decision Making
- Authors: Jingruo Sun, Wenzhi Gao, Ellen Vitercik, Yinyu Ye
- Abstract summary: We propose a new algorithm that combines the strengths of LP-based and first-order OLP methods.
Our algorithm achieves $\mathscr{O}(\log (T/f) + \sqrt{f})$ regret, delivering a "wait-less" online decision-making process.
- Score: 10.984414655748095
- Abstract: Online linear programming (OLP) has found broad applications in revenue management and resource allocation. State-of-the-art OLP algorithms achieve low regret by repeatedly solving linear programming (LP) subproblems that incorporate updated resource information. However, LP-based methods are computationally expensive and often inefficient for large-scale applications. In contrast, recent first-order OLP algorithms are more computationally efficient but typically suffer from worse regret guarantees. To address these shortcomings, we propose a new algorithm that combines the strengths of LP-based and first-order OLP methods. The algorithm re-solves the LP subproblems periodically at a predefined frequency $f$ and uses the latest dual prices to guide online decision-making. In addition, a first-order method runs in parallel during each interval between LP re-solves, smoothing resource consumption. Our algorithm achieves $\mathscr{O}(\log (T/f) + \sqrt{f})$ regret, delivering a "wait-less" online decision-making process that balances the computational efficiency of first-order methods and the superior regret guarantee of LP-based methods.
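The abstract describes the algorithm only at a high level. Below is a minimal Python sketch of how such a hybrid loop could look, assuming a standard dual-price acceptance rule, a periodic LP re-solve over the observed prefix via scipy.optimize.linprog, and a projected-subgradient dual update between re-solves; the function name waitless_olp, the step size, and the scaled-capacity LP are illustrative assumptions rather than the authors' exact method.
```python
import numpy as np
from scipy.optimize import linprog


def waitless_olp(rewards, consumptions, capacity, f, eta=None):
    """Sketch of a hybrid OLP loop in the spirit of the abstract:
    re-solve an LP every f steps to reset the dual prices, and run a
    first-order (projected subgradient) dual update in between.
    The prefix-LP formulation, step size, and acceptance rule are
    illustrative assumptions, not the paper's exact algorithm."""
    T = len(rewards)
    capacity = np.asarray(capacity, dtype=float)
    m = capacity.shape[0]
    rho = capacity / T                      # per-period resource budget
    eta = eta if eta is not None else 1.0 / np.sqrt(f)

    p = np.zeros(m)                         # dual prices
    remaining = capacity.copy()
    total_reward = 0.0

    for t in range(T):
        if t > 0 and t % f == 0:
            # Periodic LP re-solve: maximize reward over the observed
            # prefix under a proportionally scaled capacity, then take
            # the dual prices of the resource constraints.
            R = np.asarray(rewards[:t], dtype=float)
            A = np.asarray(consumptions[:t], dtype=float).T   # m x t
            res = linprog(-R, A_ub=A, b_ub=(t / T) * capacity,
                          bounds=[(0.0, 1.0)] * t, method="highs")
            if res.status == 0:
                p = -np.asarray(res.ineqlin.marginals)

        r_t = rewards[t]
        a_t = np.asarray(consumptions[t], dtype=float)

        # Dual-price decision rule: accept if the reward exceeds the
        # opportunity cost of the resources and capacity allows it.
        x_t = 1.0 if (r_t > a_t @ p and np.all(remaining >= a_t)) else 0.0
        remaining -= x_t * a_t
        total_reward += x_t * r_t

        # First-order update between re-solves: a projected subgradient
        # step that raises prices when consumption outpaces the budget,
        # smoothing resource consumption.
        p = np.maximum(p + eta * (x_t * a_t - rho), 0.0)

    return total_reward, remaining
```
A caller would supply per-arrival rewards, consumption vectors, and the initial capacity; setting f close to T degenerates the loop to an almost purely first-order method, while f = 1 re-solves the LP at every step.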
Related papers
- Beyond $\mathcal{O}(\sqrt{T})$ Regret: Decoupling Learning and Decision-making in Online Linear Programming [7.518108920887426]
This paper establishes a general framework that improves upon the $\mathcal{O}(\sqrt{T})$ result when the LP dual problem exhibits certain error bound conditions.
We show that first-order learning algorithms achieve $o(\sqrt{T})$ regret in the continuous support setting and $\mathcal{O}(\log T)$ regret in the finite support setting beyond the non-degeneracy assumption.
arXiv Detail & Related papers (2025-01-06T04:57:44Z) - Infrequent Resolving Algorithm for Online Linear Programming [3.247415064064425]
Existing online linear programming (OLP) algorithms fall into two categories: LP-based algorithms and LP-free algorithms.
We propose a well-performing algorithm that solves LPs at a few selected time points and conducts first-order computations at the other time points.
Our work highlights the value of resolving both at the beginning and the end of the selling horizon, and provides a novel framework to prove the performance guarantee.
arXiv Detail & Related papers (2024-08-01T11:09:01Z) - Truly No-Regret Learning in Constrained MDPs [61.78619476991494]
We propose a model-based primal-dual algorithm to learn in an unknown CMDP.
We prove that our algorithm achieves sublinear regret without error cancellations.
arXiv Detail & Related papers (2024-02-24T09:47:46Z) - Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods [7.518108920887426]
We introduce a new algorithmic framework that decouples learning from decision-making.
For the first time, we show that first-order methods can attain $o(\sqrt{T})$ regret with this new framework.
arXiv Detail & Related papers (2024-02-11T05:35:50Z) - Efficient Methods for Non-stationary Online Learning [61.63338724659592]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
We also study an even stronger measure, namely the "interval dynamic regret", and reduce the number of projections per round from $\mathcal{O}(\log^2 T)$ to $1$.
arXiv Detail & Related papers (2023-09-16T07:30:12Z) - Learning to Reformulate for Linear Programming [11.628932152805724]
We propose a reinforcement learning-based reformulation method for linear programming (LP) to improve the performance of the solving process.
We implement the proposed method on two public research LP datasets and one large-scale LP dataset collected from a practical production planning scenario.
arXiv Detail & Related papers (2022-01-17T04:58:46Z) - Nearly Optimal Linear Convergence of Stochastic Primal-Dual Methods for Linear Programming [5.126924253766052]
We show that the proposed method exhibits a linear convergence rate for solving sharp instances with a high probability.
We also propose an efficient coordinate-based oracle for unconstrained bilinear problems.
arXiv Detail & Related papers (2021-11-10T04:56:38Z) - Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers [112.23981192818721]
We propose to use backward mode linear relaxation based perturbation analysis (LiRPA) to replace linear programming (LP) during the BaB process.
Unlike LP, LiRPA applied naively can produce much weaker bounds and cannot even check certain conflicts of sub-domains during splitting.
We demonstrate an order of magnitude speedup compared to existing LP-based approaches.
arXiv Detail & Related papers (2020-11-27T16:42:12Z) - Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods.
arXiv Detail & Related papers (2020-07-01T18:43:32Z) - Sparsified Linear Programming for Zero-Sum Equilibrium Finding [89.30539368124025]
We present a totally different approach to the problem, which is competitive and often orders of magnitude better than the prior state of the art.
With experiments on poker endgames, we demonstrate, for the first time, that modern linear program solvers are competitive against even game-specific modern variants of CFR.
arXiv Detail & Related papers (2020-06-05T13:48:26Z) - Determinantal Point Processes in Randomized Numerical Linear Algebra [80.27102478796613]
Randomized Numerical Linear Algebra (RandNLA) uses randomness to develop improved algorithms for matrix problems that arise in scientific computing, data science, machine learning, etc.
Recent work has uncovered deep and fruitful connections between DPPs and RandNLA which lead to new guarantees and improved algorithms.
arXiv Detail & Related papers (2020-05-07T00:39:52Z)