Bi-Level Online Provisioning and Scheduling with Switching Costs and Cross-Level Constraints
- URL: http://arxiv.org/abs/2601.18936v1
- Date: Mon, 26 Jan 2026 20:16:13 GMT
- Title: Bi-Level Online Provisioning and Scheduling with Switching Costs and Cross-Level Constraints
- Authors: Jialei Liu, C. Emre Koksal, Ming Shi
- Abstract summary: We study a bi-level online provisioning and scheduling problem motivated by network resource allocation. We model this two-time-scale interaction using an upper-level online convex optimization problem and a lower-level constrained Markov decision process.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a bi-level online provisioning and scheduling problem motivated by network resource allocation, where provisioning decisions are made at a slow time scale while queue-/state-dependent scheduling is performed at a fast time scale. We model this two-time-scale interaction using an upper-level online convex optimization (OCO) problem and a lower-level constrained Markov decision process (CMDP). Existing OCO methods typically assume stateless decisions and thus cannot capture MDP network dynamics such as queue evolution. Meanwhile, CMDP algorithms typically assume a fixed constraint threshold, whereas in provisioning-and-scheduling systems, the threshold varies with online budget decisions. To address these gaps, we study bi-level OCO-CMDP learning under switching costs (budget reprovisioning/system reconfiguration) and cross-level constraints that couple budgets to scheduling decisions. Our new algorithm solves this learning problem via several non-trivial developments, including a carefully designed dual feedback that returns the budget multiplier as sensitivity information for the upper-level update and a lower level that solves a budget-adaptive safe exploration problem via an extended occupancy-measure linear program. We establish near-optimal regret and high-probability satisfaction of the cross-level constraints.
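As a loose illustration of the two-time-scale loop the abstract describes (a toy sketch, not the paper's actual algorithm), the lower level can return the dual multiplier of its budget constraint, and the upper level can use that multiplier as a subgradient signal while paying a movement penalty for re-provisioning. All function names and the toy shortfall cost below are hypothetical; the paper's lower level solves an occupancy-measure linear program instead.

```python
def lower_level_dual(budget, demand):
    """Toy lower level: unmet demand max(demand - budget, 0) stands in for
    the scheduling cost; returns it with the budget constraint's dual
    multiplier (1 while the budget binds, 0 when it is slack)."""
    shortfall = max(demand - budget, 0.0)
    return shortfall, (1.0 if shortfall > 0 else 0.0)

def provision(demands, price=0.3, switch_weight=0.5, eta=0.4):
    """Upper-level OCO-style update with a switching (movement) cost.
    Per-round cost: price * b + shortfall + switch_weight * |b' - b|."""
    b, total, trajectory = 0.0, 0.0, []
    for d in demands:
        shortfall, dual = lower_level_dual(b, d)
        total += price * b + shortfall
        # Dual feedback: a subgradient of the round cost in b is
        # price - dual, so step in the opposite direction.
        b_new = max(b + eta * (dual - price), 0.0)
        total += switch_weight * abs(b_new - b)
        b = b_new
        trajectory.append(b)
    return total, trajectory
```

Under persistent demand the budget ratchets upward while the constraint binds; with zero demand it never moves and incurs no cost.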
Related papers
- Distributed Online Convex Optimization with Nonseparable Costs and Constraints [7.671875264854638]
We study a group of agents, networked via a communication graph, that collectively select actions to minimize a sequence of nonseparable global cost functions. We propose a distributed online primal-dual belief consensus algorithm, where each agent maintains and updates a local belief of the global collective decisions.
arXiv Detail & Related papers (2026-02-11T02:46:53Z) - iScheduler: Reinforcement Learning-Driven Continual Optimization for Large-Scale Resource Investment Problems [30.109981943437006]
Scheduling precedence-constrained tasks under shared renewable resources is central to modern computing platforms. We present iScheduler, a reinforcement-learning-driven iterative scheduling framework. Experiments show that iScheduler attains competitive resource costs while reducing time to feasibility by up to 43$\times$ against strong commercial baselines.
arXiv Detail & Related papers (2026-01-30T11:20:58Z) - Cost Minimization for Space-Air-Ground Integrated Multi-Access Edge Computing Systems [60.586531406445744]
Space-air-ground integrated multi-access edge computing (SAGIN-MEC) provides a promising solution for the rapidly developing low-altitude economy. We present a SAGIN-MEC architecture that enables coordination between user devices (UDs), uncrewed aerial vehicles (UAVs), and satellites.
arXiv Detail & Related papers (2025-10-24T15:03:07Z) - No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need! [56.80767500991973]
We focus on two canonical settings: $(i)$ online resource allocation where rewards and costs are observed before action selection, and $(ii)$ online learning with resource constraints where they are observed after action selection, under full feedback or bandit feedback. It is well known that achieving sublinear regret in these settings is impossible when reward and cost distributions may change arbitrarily over time. We design general (primal-)dual methods that achieve sublinear regret with respect to baselines that follow the spending plan. Crucially, the performance of our algorithms improves when the spending plan ensures a well-balanced distribution of the budget.
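The (primal-)dual idea with a spending plan can be illustrated with a minimal sketch (hypothetical names and toy data; not the paper's algorithm): keep a multiplier that prices spending above the per-round plan, pick the action with the best Lagrangian-adjusted reward, and update the multiplier toward the plan.

```python
def primal_dual_with_plan(rounds, plan, eta=0.1):
    """Minimal primal-dual sketch under a spending plan.
    Each round offers actions as (reward, cost) pairs; plan[t] is the
    budget earmarked for round t. The multiplier lam prices overspending."""
    lam, reward, spent = 0.0, 0.0, 0.0
    for t, actions in enumerate(rounds):
        # Primal step: best Lagrangian-adjusted action (full feedback).
        r, c = max(actions, key=lambda a: a[0] - lam * a[1])
        reward += r
        spent += c
        # Dual step: raise lam when spending exceeds the plan, else lower.
        lam = max(0.0, lam + eta * (c - plan[t]))
    return reward, spent, lam
```

With a cheap action (1, 0) and an expensive one (2, 1) and a plan of 0.5 per round, the expensive action is chosen while the multiplier is small, and the multiplier grows by 0.05 per round of overspending.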
arXiv Detail & Related papers (2025-06-16T08:42:31Z) - Beamforming and Resource Allocation for Delay Minimization in RIS-Assisted OFDM Systems [38.71413228444903]
This paper investigates a joint beamforming and resource allocation problem in downlink reconfigurable intelligent surface (RIS)-assisted OFDM systems. To effectively handle the mixed action space and reduce the state space dimensionality, a hybrid deep reinforcement learning (DRL) approach is proposed. The proposed algorithm significantly reduces the average delay, enhances resource allocation efficiency, and achieves superior system robustness and fairness.
arXiv Detail & Related papers (2025-06-04T05:33:33Z) - Learning for Cross-Layer Resource Allocation in MEC-Aided Cell-Free Networks [71.30914500714262]
Cross-layer resource allocation over mobile edge computing (MEC)-aided cell-free networks can fully exploit the transmission and computing resources to improve the data rate. Joint subcarrier allocation and beamforming optimization are investigated for the MEC-aided cell-free network from the perspective of deep learning.
arXiv Detail & Related papers (2024-12-21T10:18:55Z) - A Primal-Dual-Assisted Penalty Approach to Bilevel Optimization with Coupled Constraints [66.61399765513383]
We develop a BLOCC algorithm to tackle BiLevel Optimization problems with Coupled Constraints.
We demonstrate its effectiveness on two well-known real-world applications.
arXiv Detail & Related papers (2024-06-14T15:59:36Z) - Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality [55.06411438416805]
Constrained Markov Decision Processes (CMDPs) are critical in many high-stakes applications. This paper introduces a novel approach, Two-Stage Deep Decision Rules (TS-DDR), to efficiently train parametric actor policies. It is shown to enhance solution quality and to reduce computation times by several orders of magnitude when compared to current state-of-the-art methods.
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Joint Service Caching, Communication and Computing Resource Allocation in Collaborative MEC Systems: A DRL-based Two-timescale Approach [15.16859210403316]
Meeting the strict Quality of Service (QoS) requirements of terminals has imposed a challenge on Multi-access Edge Computing (MEC) systems.
We propose a collaborative framework that facilitates resource sharing between the edge servers.
We show that our proposed algorithm outperforms the baseline algorithms in terms of the average switching and cache cost.
arXiv Detail & Related papers (2023-07-19T00:27:49Z) - Learning-Assisted Algorithm Unrolling for Online Optimization with Budget Constraints [27.84415856657607]
We propose a new machine learning (ML) assisted unrolling approach, called LAAU (Learning-Assisted Algorithm Unrolling).
For efficient training via backpropagation, we derive gradients of the decision pipeline over time.
We also provide the average cost bounds for two cases when training data is available offline and collected online, respectively.
arXiv Detail & Related papers (2022-12-03T20:56:29Z) - Hierarchical Constrained Stochastic Shortest Path Planning via Cost Budget Allocation [16.150627252426936]
We propose a hierarchical constrained shortest path problem (HC-SSP) that meets those two crucial requirements in a single framework.
The resulting problem is highly complex, making it difficult to find an optimal solution quickly.
We present an algorithm that iteratively allocates cost budgets to the lower-level planning problems via a branch-and-bound scheme, quickly finding a feasible solution and incrementally updating the incumbent.
arXiv Detail & Related papers (2022-05-11T01:25:38Z) - Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
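A standard way to handle memory (a toy sketch under simplifying assumptions, not this paper's algorithm) is to run gradient descent on the unary loss $\tilde f_t(x) = f_t(x, \ldots, x)$; the movement between consecutive plays then bounds the gap to the true memory loss. The quadratic losses and memory length 1 below are hypothetical choices for illustration.

```python
def memory_oco(targets, eta=0.3, lo=-1.0, hi=1.0):
    """OCO with memory, toy instance: the round-t loss is
    (x_t - targets[t])**2 + |x_t - x_{t-1}| (memory length 1).
    We descend on the unary gradient and accumulate the full memory loss."""
    x_prev, x, total = 0.0, 0.0, 0.0
    for th in targets:
        # Suffer the memory loss: unary part plus dependence on last play.
        total += (x - th) ** 2 + abs(x - x_prev)
        # Descend on the unary gradient 2*(x - th), then project to [lo, hi].
        x_prev, x = x, min(hi, max(lo, x - eta * 2.0 * (x - th)))
    return total, x
```

On a stationary target the iterate contracts toward it geometrically; tracking a drifting target is where a dynamic-regret analysis like the paper's becomes necessary.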
arXiv Detail & Related papers (2021-02-07T09:45:15Z) - Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.