Auction-Based Scheduling
- URL: http://arxiv.org/abs/2310.11798v2
- Date: Wed, 31 Jan 2024 19:08:28 GMT
- Title: Auction-Based Scheduling
- Authors: Guy Avni, Kaushik Mallik, Suman Sadhukhan
- Abstract summary: Auction-based scheduling is a modular framework for multi-objective decision-making problems.
Each objective is fulfilled using a separate policy, and the policies can be independently created, modified, and replaced.
We present decentralized algorithms to synthesize a pair of policies, their initially allocated budgets, and bidding strategies.
- Score: 2.3326951882644553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many sequential decision-making tasks require satisfaction of multiple,
partially contradictory objectives. Existing approaches are monolithic, namely
all objectives are fulfilled using a single policy, which is a function that
selects a sequence of actions. We present auction-based scheduling, a modular
framework for multi-objective decision-making problems. Each objective is
fulfilled using a separate policy, and the policies can be independently
created, modified, and replaced. Understandably, different policies with
conflicting goals may choose conflicting actions at a given time. In order to
resolve conflicts and compose policies, we employ a novel auction-based
mechanism. We allocate a bounded budget to each policy, and at each step, the
policies simultaneously bid from their available budgets for the privilege of
being scheduled and choosing an action. Policies express their scheduling
urgency using their bids and the bounded budgets ensure long-run scheduling
fairness. We lay the foundations of auction-based scheduling using path
planning problems on finite graphs with two temporal objectives. We present
decentralized algorithms to synthesize a pair of policies, their initially
allocated budgets, and bidding strategies. We consider three categories of
decentralized synthesis problems, parameterized by the assumptions that the
policies make on each other: (a) strong synthesis, with no assumptions and
strongest guarantees, (b) assume-admissible synthesis, with weakest rationality
assumptions, and (c) assume-guarantee synthesis, with explicit contract-based
assumptions. For reachability objectives, we show that, surprisingly,
decentralized assume-admissible synthesis is always possible when the
out-degrees of all vertices are at most two.
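To make the auction mechanism concrete, the following is a minimal Python sketch of one scheduling round for the two-policy setting described in the abstract. The `bid`/`act` interface, the random tie-breaking, and the rule that the winner's bid is transferred to the other policy are illustrative assumptions, not the paper's exact definitions.
```python
import random

def auction_step(policies, budgets, state, rng=None):
    """One round of auction-based scheduling (illustrative sketch).

    Each element of `policies` is assumed to expose
        bid(state, budget) -> a bid of at most `budget`
        act(state)         -> the action that policy wants to take
    The tie-breaking rule and the budget transfer below are assumptions
    made for illustration; the paper defines the exact bidding semantics.
    """
    rng = rng or random.Random(0)

    # Policies bid simultaneously from their available budgets.
    bids = [min(p.bid(state, b), b) for p, b in zip(policies, budgets)]

    # The highest bidder wins the privilege of choosing the action
    # (ties broken uniformly at random -- an assumption).
    top = max(bids)
    winner = rng.choice([i for i, b in enumerate(bids) if b == top])

    # The winner pays its bid, which is redistributed to the other
    # policies so the total budget stays bounded (assumed transfer rule).
    new_budgets = list(budgets)
    new_budgets[winner] -= bids[winner]
    share = bids[winner] / (len(budgets) - 1)
    for i in range(len(budgets)):
        if i != winner:
            new_budgets[i] += share

    return policies[winner].act(state), new_budgets
```
Repeatedly calling `auction_step` from the initially allocated budgets composes the two policies into a single scheduler; the bounded total budget is what enforces long-run scheduling fairness.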
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
- Goal-conditioned Offline Reinforcement Learning through State Space Partitioning [9.38848713730931]
Offline reinforcement learning (RL) aims to infer sequential decision policies using only offline datasets.
We argue that, despite its benefits, this approach is still insufficient to fully address the distribution shift and multi-modality problems.
We propose a complementary advantage-based weighting scheme that introduces an additional source of inductive bias.
arXiv Detail & Related papers (2023-03-16T14:52:53Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Anytime Stochastic Task and Motion Policies [12.72186877599064]
We present a new approach for integrated task and motion planning in stochastic settings.
Our algorithm is probabilistically complete and can compute feasible solution policies in an anytime fashion.
arXiv Detail & Related papers (2021-08-28T00:23:39Z)
- Composable Energy Policies for Reactive Motion Generation and Reinforcement Learning [25.498555742173323]
We introduce Composable Energy Policies (CEP), a novel framework for modular motion generation.
CEP computes the control action by optimization over the product of a set of reactive policies.
CEP naturally adapts to the reinforcement learning problem, allowing any distribution to be integrated hierarchically as a prior.
arXiv Detail & Related papers (2021-05-11T11:59:13Z)
- CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee [61.176159046544946]
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize an expected total reward and avoids violation of certain constraints.
This is the first analysis of SRL algorithms with convergence to globally optimal policies.
arXiv Detail & Related papers (2020-11-11T16:05:14Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.