Efficient Strategy Synthesis for MDPs with Resource Constraints
- URL: http://arxiv.org/abs/2105.02099v1
- Date: Wed, 5 May 2021 14:59:30 GMT
- Title: Efficient Strategy Synthesis for MDPs with Resource Constraints
- Authors: František Blahoudek, Petr Novotný, Melkior Ornik, Pranay Thangeda and Ufuk Topcu
- Abstract summary: We consider strategy synthesis for the formalism called consumption Markov decision processes.
The presented algorithms work in time polynomial with respect to the representation of the model.
- Score: 16.774128823546416
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We consider qualitative strategy synthesis for the formalism called
consumption Markov decision processes. This formalism can model the dynamics of
an agent that operates under resource constraints in a stochastic environment.
The presented algorithms work in time polynomial with respect to the
representation of the model and they synthesize strategies ensuring that a
given set of goal states will be reached (once or infinitely many times) with
probability 1 without resource exhaustion. In particular, when the amount of
resource becomes too low to safely continue the mission, the strategy diverts
the agent towards one of a designated set of reload states, where the agent
replenishes the resource to full capacity; with a sufficient amount of
resource, the agent attempts to fulfill the mission again.
We also present two heuristics that attempt to reduce the expected time the
agent needs to fulfill the given mission, a parameter important in practical
planning. The presented algorithms were implemented and numerical examples
demonstrate (i) the effectiveness (in terms of computation time) of the
planning approach based on consumption Markov decision processes and (ii) the
positive impact of the two heuristics on planning in a realistic example.
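To make the resource-aware behavior concrete, here is a minimal Python sketch of a consumption MDP and of the two-mode behavior the abstract describes: pursue the goal while the resource level is safely high, divert to a reload state otherwise. The class layout, the fixed `safety_threshold`, and the `plan_towards` planner are illustrative assumptions, not the paper's formal construction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

@dataclass
class ConsumptionMDP:
    """Bare-bones consumption MDP: every action consumes resource and
    reload states replenish it to full capacity (names are assumptions
    for illustration, not the paper's notation)."""
    capacity: int                             # maximal resource level
    consumption: Dict[Tuple[str, str], int]   # (state, action) -> cost
    reload_states: Set[str]                   # states that refill the resource
    goal_states: Set[str]                     # states to reach with probability 1

def two_mode_action(
    mdp: ConsumptionMDP,
    state: str,
    resource: int,
    safety_threshold: int,
    plan_towards: Callable[[str, Set[str]], str],  # hypothetical planner
) -> str:
    """Pursue the goal while the resource is safely high; otherwise
    divert towards a reload state, as sketched in the abstract."""
    if resource <= safety_threshold and state not in mdp.reload_states:
        # Too little resource to continue safely: head for a reload state,
        # where the resource is replenished to mdp.capacity.
        return plan_towards(state, mdp.reload_states)
    # Enough resource: attempt to fulfill the mission.
    return plan_towards(state, mdp.goal_states)
```

In the paper's construction the safe resource levels are computed per state by the polynomial-time synthesis algorithms rather than by one global threshold; the sketch only conveys the two-mode shape of the resulting behavior.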
Related papers
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
- On efficient computation in active inference [1.1470070927586016]
We present a novel planning algorithm for finite temporal horizons with drastically lower computational complexity.
We also simplify the process of setting an appropriate target distribution for new and existing active inference planning schemes.
arXiv Detail & Related papers (2023-07-02T07:38:56Z)
- Dual policy as self-model for planning [71.73710074424511]
We refer to the model used to simulate one's decisions as the agent's self-model.
Inspired by current reinforcement learning approaches and neuroscience, we explore the benefits and limitations of using a distilled policy network as the self-model.
arXiv Detail & Related papers (2023-06-07T13:58:45Z)
- A State-Augmented Approach for Learning Optimal Resource Management Decisions in Wireless Networks [58.720142291102135]
We consider a radio resource management (RRM) problem in a multi-user wireless network.
The goal is to optimize a network-wide utility function subject to constraints on the ergodic average performance of users.
We propose a state-augmented parameterization for the RRM policy, where alongside the instantaneous network states, the RRM policy takes as input the set of dual variables corresponding to the constraints.
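As a rough illustration of the state-augmentation idea in this summary, the toy sketch below feeds the concatenation of the instantaneous network state and the current dual variables into a simple policy and updates the duals by projected dual ascent on the ergodic constraints. The linear policy, the performance metrics, and the step size are all invented for the example; this is not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_constraints = 4, 2
eta = 0.05                                # dual step size (assumed)
c_min = np.array([0.8, 0.4])              # required ergodic averages (assumed)
W = rng.normal(size=(n_users, n_users + n_constraints))  # toy policy weights

def rrm_policy(state: np.ndarray, duals: np.ndarray) -> np.ndarray:
    """Allocation from the augmented input [network state; dual variables]."""
    return np.maximum(W @ np.concatenate([state, duals]), 0.0)

def performance(alloc: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Hypothetical per-constraint performance metrics."""
    return np.array([alloc.mean(), (alloc * state).mean()])

duals = np.zeros(n_constraints)
for _ in range(1000):
    state = rng.exponential(size=n_users)  # random instantaneous network state
    perf = performance(rrm_policy(state, duals), state)
    # Projected dual ascent: raise the multiplier of any violated constraint
    # and keep all multipliers nonnegative.
    duals = np.maximum(duals + eta * (c_min - perf), 0.0)
```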
arXiv Detail & Related papers (2022-10-28T21:24:13Z)
- A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation [61.740187363451746]
Marginalized importance sampling (MIS) measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution.
We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.
We evaluate the empirical performance of our approach on a variety of challenging Atari and MuJoCo environments.
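The observation in this summary has a clean tabular form: the discounted state-action occupancy of the target policy, and therefore the MIS density ratio, can be read off the successor representation (I - gamma * P_pi)^(-1). The tiny random MDP below is an assumption made purely to demonstrate the computation; the paper itself works with deep function approximation rather than explicit matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 3, 2, 0.9

P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a, s']: transition kernel
pi = rng.dirichlet(np.ones(nA), size=nS)       # target policy pi[s, a]
mu = rng.dirichlet(np.ones(nA), size=nS)       # sampling policy mu[s, a]
rho0 = np.full(nS, 1.0 / nS)                   # initial state distribution

def occupancy(policy: np.ndarray) -> np.ndarray:
    """Discounted state-action occupancy, read off the successor
    representation (I - gamma * P_policy)^(-1) in the tabular case."""
    # State-action transitions: P_pol[(s,a),(s',a')] = P[s,a,s'] * policy[s',a'].
    P_pol = np.einsum("sat,tb->satb", P, policy).reshape(nS * nA, nS * nA)
    sr = np.linalg.inv(np.eye(nS * nA) - gamma * P_pol)  # successor representation
    nu0 = (rho0[:, None] * policy).reshape(nS * nA)      # initial (s,a) distribution
    return (1.0 - gamma) * nu0 @ sr

# MIS density ratio between target and sampling occupancies, elementwise.
density_ratio = occupancy(pi) / occupancy(mu)
```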
arXiv Detail & Related papers (2021-06-12T20:21:38Z)
- Polynomial-Time Algorithms for Multi-Agent Minimal-Capacity Planning [19.614913673879474]
We study the problem of minimizing the resource capacity of autonomous agents cooperating to achieve a shared task.
In a consumption Markov decision process, the agent has a resource of limited capacity.
We develop an algorithm that solves this graph problem in time that is polynomial in the number of agents, target locations, and the size of the consumption Markov decision process.
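The summary does not spell out the algorithm, so the sketch below only illustrates one ingredient that the minimal-capacity framing suggests: since feasibility is monotone in the capacity (a larger tank never hurts), the smallest sufficient capacity can be found by binary search over a feasibility oracle. The `is_feasible` oracle is hypothetical, e.g., a probability-1 strategy-synthesis check for a consumption MDP with the given capacity.

```python
from typing import Callable

def minimal_capacity(is_feasible: Callable[[int], bool], cap_upper: int) -> int:
    """Smallest capacity in [0, cap_upper] for which `is_feasible` holds,
    assuming feasibility is monotone in the capacity."""
    lo, hi = 0, cap_upper
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(mid):
            hi = mid            # mid suffices: try smaller capacities
        else:
            lo = mid + 1        # mid fails: strictly more capacity needed
    return lo
```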
arXiv Detail & Related papers (2021-05-04T00:30:02Z)
- Verifiable Planning in Expected Reward Multichain MDPs [20.456052208569115]
We explore the steady-state planning problem of deriving a decision-making policy for an agent.
We prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
arXiv Detail & Related papers (2020-12-03T18:54:24Z)
- Coordinated Online Learning for Multi-Agent Systems with Coupled Constraints and Perturbed Utility Observations [91.02019381927236]
We introduce a novel method to steer the agents toward a stable population state while fulfilling the given resource constraints.
The proposed method is a decentralized resource pricing scheme based on the resource loads resulting from the augmentation of the game's Lagrangian.
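The pricing rule described here resembles standard dual ascent on the coupled resource constraints of the game's Lagrangian; the generic sketch below raises the price of any overloaded resource and lets agents scale back in response. The agents' response model, the loads, the capacities, and the step size are all placeholders invented for the illustration, not the paper's scheme.

```python
import numpy as np

rng = np.random.default_rng(2)
n_agents, n_resources = 5, 3
eta = 0.1                                  # price step size (assumed)
capacity = np.ones(n_resources)            # shared resource limits (assumed)
usage = rng.uniform(size=(n_agents, n_resources))  # per-unit resource footprints
prices = np.zeros(n_resources)             # Lagrange multipliers acting as prices

for _ in range(500):
    # Stand-in for each agent's best response: scale activity down as the
    # priced cost of the agent's resource footprint grows.
    activity = 1.0 / (1.0 + usage @ prices)
    load = activity @ usage                # aggregate load on each resource
    # Decentralized pricing: raise the price of overloaded resources and
    # project back to nonnegative prices (dual ascent on the Lagrangian).
    prices = np.maximum(prices + eta * (load - capacity), 0.0)
```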
arXiv Detail & Related papers (2020-10-21T10:11:17Z)
- Near-Optimal Reactive Synthesis Incorporating Runtime Information [28.25296947005914]
We consider the problem of optimal reactive synthesis: computing a strategy that satisfies a mission specification in a dynamic environment.
We incorporate task-critical information that is only available at runtime into the strategy synthesis in order to improve performance.
arXiv Detail & Related papers (2020-07-31T14:41:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.