Related papers: Infrequent Resolving Algorithm for Online Linear Programming

Infrequent Resolving Algorithm for Online Linear Programming

URL: http://arxiv.org/abs/2408.00465v5
Date: Fri, 31 Jan 2025 10:13:48 GMT
Title: Infrequent Resolving Algorithm for Online Linear Programming
Authors: Guokai Li, Zizhuo Wang, Jingwei Zhang,
Abstract summary: Existing online linear programming (OLP) algorithms fall into two categories: LP-based algorithms and LP-free algorithms.<n>We propose a well-performing algorithm, that solves LPs at a few selected time points and conducts first-order computations at other time points.<n>Our work highlights the value of resolving both at the beginning and the end of the selling horizon, and provides a novel framework to prove the performance guarantee.
Score: 3.247415064064425
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auction, network revenue management, order fulfillment and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former one typically guarantees better performance, even offering a constant regret, but requires solving a large number of LPs, which could be computationally expensive. In contrast, LP-free algorithm only requires first-order computations but induces a worse performance, lacking a constant regret bound. In this work, we bridge the gap between these two extremes by proposing a well-performing algorithm, that solves LPs at a few selected time points and conducts first-order computations at other time points. Specifically, for the case where the inputs are drawn from an unknown finite-support distribution, the proposed algorithm achieves a constant regret (even for the hard "degenerate" case) while solving LPs only $\mathcal{O}(\log\log T)$ times over the time horizon $T$. Moreover, when we are allowed to solve LPs only $M$ times, we design the corresponding schedule such that the proposed algorithm can guarantee a nearly $\mathcal{O}\left(T^{(1/2)^{M-1}}\right)$ regret. Our work highlights the value of resolving both at the beginning and the end of the selling horizon, and provides a novel framework to prove the performance guarantee of the proposed policy under different infrequent resolving schedules. Furthermore, when the arrival probabilities are known at the beginning, our algorithm can guarantee a constant regret by solving LPs $\mathcal{O}(\log\log T)$ times, and a nearly $\mathcal{O}\left(T^{(1/2)^{M}}\right)$ regret by solving LPs only $M$ times. Numerical experiments are conducted to demonstrate the efficiency of the proposed algorithms.

Related papers

Beyond $\mathcal{O}(\sqrt{T})$ Regret: Decoupling Learning and Decision-making in Online Linear Programming [7.518108920887426]
This paper establishes a general framework that improves upon the $mathcalO ( sqrtT )$ result when the LP dual problem exhibits certain error bound conditions. We show that first-order learning algorithms achieve $o( sqrtT )$ regret in the continuous support setting and $mathcalO (log T)$ regret in the finite support setting beyond the non-degeneracy assumption.
arXiv Detail & Related papers (2025-01-06T04:57:44Z)
Wait-Less Offline Tuning and Re-solving for Online Decision Making [10.984414655748095]
We propose a new algorithm that combines the strengths of LP-based and first-order OLP methods. Our algorithm achieves $mathscrO(log (T/f) + sqrtf)$ regret, delivering a "wait-less" online decision-making process.
arXiv Detail & Related papers (2024-12-12T18:58:14Z)
Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models [70.07661254213181]
We propose two principled algorithms for the test-time compute of large language models. We prove theoretically that the failure probability of one algorithm decays to zero exponentially as its test-time compute grows.
arXiv Detail & Related papers (2024-11-29T05:29:47Z)
Online Learning Quantum States with the Logarithmic Loss via VB-FTRL [1.8856444568755568]
Online learning of quantum states with the logarithmic loss (LL-OLQS) is a classic open problem in online learning for over three decades. In this paper, we generalize VB-FTRL for LL-OLQS with moderate computational complexity. Each of the algorithm consists of a semidefinite program that can be implemented in time by, for example, cutting-plane methods.
arXiv Detail & Related papers (2023-11-06T15:45:33Z)
Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards [40.99322897009357]
We propose two novel algorithms based on truncation and mean of medians. Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches. Our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $epsilon=1$.
arXiv Detail & Related papers (2023-10-28T13:01:10Z)
Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes. This paper provides two algorithms that improve the regret to $tildemathcal O(K2/3)$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z)
Efficient Algorithms for Planning with Participation Constraints [74.74967476995572]
We consider the problem of planning with participation constraints introduced in [Zhang et al., 2022] In this problem, a principal chooses actions in a decision process, resulting in separate utilities for the principal and the agent. We provide the first-time exact algorithm for this problem for finite-horizon settings, where previously only an additive $varepsilon$-approximation was known.
arXiv Detail & Related papers (2022-05-16T15:47:41Z)
Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints [39.715977181666766]
We study regret for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints. Our algorithm ensures $widetildeO(sqrtT)$ regret and constant constraint violation for ergodic MDPs. These are the first set of provable algorithms for weakly communicating MDPs with cost constraints.
arXiv Detail & Related papers (2022-01-31T23:52:34Z)
Logarithmic Regret from Sublinear Hints [76.87432703516942]
We show that an algorithm can obtain $O(log T)$ regret with just $O(sqrtT)$ hints under a natural query model. We also show that $o(sqrtT)$ hints cannot guarantee better than $Omega(sqrtT)$ regret.
arXiv Detail & Related papers (2021-11-09T16:50:18Z)
Choosing the Right Algorithm With Hints From Complexity Theory [16.33500498939925]
We show that the Metropolis algorithm is clearly the best of all algorithms regarded for reasonable problem sizes. An artificial algorithm of this type having an $O(n log n)$ runtime leads to the result that the significance-based compact genetic algorithm (sig-cGA) can solve the DLB problem in time $O(n log n)$ with high probability.
arXiv Detail & Related papers (2021-09-14T11:12:32Z)
Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function. The goal is to find a policy that matches the expert's performance on some predefined set of cost functions. We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
Online Linear Optimization with Many Hints [72.4277628722419]
We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision. In this setting, we devise an algorithm that obtains logarithmic regret whenever there exists a convex combination of the $K$ hints that has positive correlation with the cost vectors.
arXiv Detail & Related papers (2020-10-06T23:25:05Z)
Sparsified Linear Programming for Zero-Sum Equilibrium Finding [89.30539368124025]
We present a totally different approach to the problem, which is competitive and often orders of magnitude better than the prior state of the art. With experiments on poker endgames, we demonstrate, for the first time, that modern linear program solvers are competitive against even game-specific modern variants of CFR.
arXiv Detail & Related papers (2020-06-05T13:48:26Z)
Second-order Conditional Gradient Sliding [79.66739383117232]
We present the emphSecond-Order Conditional Gradient Sliding (SOCGS) algorithm. The SOCGS algorithm converges quadratically in primal gap after a finite number of linearly convergent iterations. It is useful when the feasible region can only be accessed efficiently through a linear optimization oracle.
arXiv Detail & Related papers (2020-02-20T17:52:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.