Capacity-Aware Planning and Scheduling in Budget-Constrained Multi-Agent MDPs: A Meta-RL Approach
- URL: http://arxiv.org/abs/2410.21249v2
- Date: Fri, 26 Sep 2025 16:27:30 GMT
- Title: Capacity-Aware Planning and Scheduling in Budget-Constrained Multi-Agent MDPs: A Meta-RL Approach
- Authors: Manav Vora, Ilan Shomorony, Melkior Ornik
- Abstract summary: We study capacity- and budget-constrained multi-agent MDPs (CB-MA-MDPs). We propose a two-stage solution that remains tractable for large systems.
- Score: 12.116400321124594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study capacity- and budget-constrained multi-agent MDPs (CB-MA-MDPs), a class that captures many maintenance and scheduling tasks in which each agent can irreversibly fail and a planner must decide (i) when to apply a restorative action and (ii) which subset of agents to treat in parallel. The global budget limits the total number of restorations, while the capacity constraint bounds the number of simultaneous actions, turning naïve dynamic programming into a combinatorial search that scales exponentially with the number of agents. We propose a two-stage solution that remains tractable for large systems. First, a Linear Sum Assignment Problem (LSAP)-based grouping partitions the agents into r disjoint sets (r = capacity) that maximize diversity in expected time-to-failure, allocating budget to each set proportionally. Second, a meta-trained PPO policy solves each sub-MDP, leveraging transfer across groups to converge rapidly. To validate our approach, we apply it to the problem of scheduling repairs for a large team of industrial robots, constrained by a limited number of repair technicians and a total repair budget. Our results demonstrate that the proposed method outperforms baseline approaches in terms of maximizing the average uptime of the robot team, particularly for large team sizes. Lastly, we confirm the scalability of our approach through a computational complexity analysis across varying numbers of robots and repair technicians.
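The two-stage pipeline described above lends itself to a compact illustration. The sketch below is a minimal, hypothetical rendering of stage one: it uses SciPy's linear_sum_assignment to spread agents with similar expected time-to-failure (TTF) across r groups and then splits the global budget across those groups. The TTF values, the dissimilarity cost inside the assignment, and the proportional rule based on inverse mean TTF are illustrative assumptions rather than the paper's exact formulation; the meta-trained PPO stage is only indicated in a comment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def lsap_grouping(ttf, r):
    """Partition agents into r groups, spreading TTF values across groups.

    Agents are assigned r at a time: each round solves a linear sum
    assignment between the r groups and the next r unassigned agents,
    with a cost that rewards adding an agent whose TTF is far from the
    group's current mean (a hypothetical proxy for the paper's
    diversity objective).
    """
    order = np.argsort(ttf)                     # most urgent (smallest TTF) first
    groups = [[] for _ in range(r)]
    for start in range(0, len(order), r):
        batch = order[start:start + r]
        cost = np.zeros((r, len(batch)))
        for g in range(r):
            mean_g = np.mean(ttf[groups[g]]) if groups[g] else float(ttf[batch].mean())
            cost[g] = -np.abs(ttf[batch] - mean_g)   # lower cost = more diverse
        rows, cols = linear_sum_assignment(cost)
        for g, j in zip(rows, cols):
            groups[g].append(int(batch[j]))
    return groups


def allocate_budget(groups, ttf, total_budget):
    """Split the global repair budget across groups in proportion to need,
    where "need" is taken here to be the inverse of a group's mean TTF."""
    need = np.array([1.0 / np.mean(ttf[g]) for g in groups])
    shares = need / need.sum()
    return np.floor(shares * total_budget).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_agents, n_technicians, total_budget = 12, 3, 20   # hypothetical sizes
    ttf = rng.uniform(5.0, 50.0, size=n_agents)         # hypothetical expected TTFs
    groups = lsap_grouping(ttf, r=n_technicians)
    budgets = allocate_budget(groups, ttf, total_budget)
    for g, (members, b) in enumerate(zip(groups, budgets)):
        print(f"group {g}: agents {members}, budget {b}")
    # Stage two (not shown): a meta-trained PPO policy would be fine-tuned
    # on each group's budget-constrained sub-MDP.
```

As written, the sketch only produces the grouping and per-group budgets; the second stage would replace the final comment with per-group policy optimization.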
Related papers
- MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning [68.91090643731987]
Deep reinforcement learning (RL) has been applied extensively to solve complex decision-making problems. Existing approaches are limited to separate fields and can only handle multi-agent decision-making with a single objective. We propose MO-MIX to solve the multi-objective multi-agent reinforcement learning (MOMARL) problem.
arXiv Detail & Related papers (2026-02-28T16:25:22Z) - Optimized Agent Shift Scheduling Using Multi-Phase Allocation Approach [0.0]
We present a multi-phase allocation method that addresses scalability and accuracy by dividing the problem into smaller sub-problems of day and shift allocation. We then apply the proposed method, using a multi-objective framework, to address the difficulties posed by peak demand scenarios such as holiday rushes.
arXiv Detail & Related papers (2025-11-27T17:10:59Z) - BAMAS: Structuring Budget-Aware Multi-Agent Systems [18.99441110805831]
Large language model (LLM)-based multi-agent systems have emerged as a powerful paradigm for enabling autonomous agents to solve complex tasks. We propose BAMAS, a novel approach for building multi-agent systems with budget awareness. Results show that BAMAS achieves comparable performance while reducing cost by up to 86%.
arXiv Detail & Related papers (2025-11-26T16:48:18Z) - Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks. Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses. No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z) - Efficient Solving of Large Single Input Superstate Decomposable Markovian Decision Process [1.17431678544333]
A key step in Bellman dynamic programming algorithms is policy evaluation. We develop an exact and efficient policy evaluation method based on this structure. This yields a scalable solution applicable to both average and discounted reward MDPs.
arXiv Detail & Related papers (2025-08-01T17:49:27Z) - Reinforcement learning with combinatorial actions for coupled restless bandits [62.89013331120493]
We propose SEQUOIA, an RL algorithm that directly optimizes for long-term reward over the feasible action space.
We empirically validate SEQUOIA on four novel restless bandit problems with constraints: multiple interventions, path constraints, bipartite matching, and capacity constraints.
arXiv Detail & Related papers (2025-03-01T21:25:21Z) - Agent-Oriented Planning in Multi-Agent Systems [54.429028104022066]
We propose AOP, a novel framework for agent-oriented planning in multi-agent systems. In this study, we identify three critical design principles of agent-oriented planning, including solvability, completeness, and non-redundancy. Extensive experiments demonstrate the advancement of AOP in solving real-world problems compared to both single-agent systems and existing planning strategies for multi-agent systems.
arXiv Detail & Related papers (2024-10-03T04:07:51Z) - Solving Truly Massive Budgeted Monotonic POMDPs with Oracle-Guided Meta-Reinforcement Learning [1.1470070927586018]
This paper considers the problem of solving budget-constrained multi-component monotonic POMDPs.
For a large number of components, solving such a POMDP using current methods is computationally intractable.
We introduce an oracle-guided meta-trained Proximal Policy Optimization (PPO) algorithm to solve each of the independent budget-constrained single-component monotonic POMDPs.
arXiv Detail & Related papers (2024-08-13T20:20:58Z) - MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP).
MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved.
We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z) - M-HOF-Opt: Multi-Objective Hierarchical Output Feedback Optimization via Multiplier Induced Loss Landscape Scheduling [4.499391876093543]
We address the online choice of weight multipliers for multi-objective optimization of many loss terms parameterized by neural networks.
Our method is multiplier-free and operates at the timescale of epochs.
It also circumvents the excessive memory requirements and heavy computational burden of existing multi-objective deep learning methods.
arXiv Detail & Related papers (2024-03-20T16:38:26Z) - Non-stationary Reinforcement Learning under General Function Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs.
Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z) - On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z) - Welfare Maximization Algorithm for Solving Budget-Constrained Multi-Component POMDPs [2.007262412327553]
This paper presents an algorithm to find the optimal policy for a multi-component budget-constrained POMDP.
We show that the proposed algorithm vastly outperforms the policy currently used in practice.
arXiv Detail & Related papers (2023-03-18T01:43:47Z) - Linear programming-based solution methods for constrained POMDPs [0.5156484100374059]
Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various real-world phenomena.
We use grid-based approximations in combination with linear programming (LP) models to generate approximate policies for CPOMDPs.
arXiv Detail & Related papers (2022-06-28T15:22:24Z) - Under-Approximating Expected Total Rewards in POMDPs [68.8204255655161]
We consider the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP).
We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well.
arXiv Detail & Related papers (2022-01-21T16:43:03Z) - Risk-Averse Decision Making Under Uncertainty [18.467950783426947]
A large class of decision-making problems under uncertainty can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs).
In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures.
arXiv Detail & Related papers (2021-09-09T07:52:35Z) - A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs [117.82903457289584]
We derive a novel problem-dependent lower-bound for regret in finite-horizon Markov Decision Processes (MDPs).
We show that our lower-bound is considerably smaller than in the general case and it does not scale with the minimum action gap at all.
We show that this last result is attainable (up to $poly(H)$ terms, where $H$ is the horizon) by providing a regret upper-bound based on policy gaps for an optimistic algorithm.
arXiv Detail & Related papers (2021-06-24T13:46:09Z) - Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems [1.6939372704265414]
We consider infinite horizon discounted dynamic programming problems with finite state and control spaces, partial state observations, and a multiagent structure.
Our methods specifically address the computational challenges of partially observable multiagent problems.
arXiv Detail & Related papers (2020-11-09T06:51:50Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in wireless networks.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z) - Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z)