Efficient Strategy Synthesis for MDPs with Resource Constraints
- URL: http://arxiv.org/abs/2105.02099v1
- Date: Wed, 5 May 2021 14:59:30 GMT
- Title: Efficient Strategy Synthesis for MDPs with Resource Constraints
- Authors: František Blahoudek, Petr Novotný, Melkior Ornik, Pranay Thangeda and Ufuk Topcu
- Abstract summary: We consider strategy synthesis for the formalism called consumption Markov decision processes.
The presented algorithms work in time polynomial with respect to the representation of the model.
- Score: 16.774128823546416
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We consider qualitative strategy synthesis for the formalism called
consumption Markov decision processes. This formalism can model the dynamics of
an agent that operates under resource constraints in a stochastic environment.
The presented algorithms work in time polynomial with respect to the
representation of the model and they synthesize strategies ensuring that a
given set of goal states will be reached (once or infinitely many times) with
probability 1 without resource exhaustion. In particular, when the amount of
resource becomes too low to safely continue the mission, the strategy diverts
the agent towards one of a designated set of reload states, where the agent
replenishes the resource to full capacity; with a sufficient amount of
resource, the agent attempts to fulfill the mission again.
We also present two heuristics that attempt to reduce the expected time the
agent needs to fulfill the given mission, a parameter important in practical
planning. The presented algorithms were implemented and numerical examples
demonstrate (i) the effectiveness (in terms of computation time) of the
planning approach based on consumption Markov decision processes and (ii) the
positive impact of the two heuristics on planning in a realistic example.
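To make the resource-aware behavior concrete, here is a minimal Python sketch of a consumption MDP and of the two-mode behavior the abstract describes: pursue the goal while the resource level is safely high, divert to a reload state otherwise. The class layout, the fixed `safety_threshold`, and the `plan_towards` planner are illustrative assumptions, not the paper's formal construction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

@dataclass
class ConsumptionMDP:
    """Bare-bones consumption MDP: every action consumes resource and
    reload states replenish it to full capacity (names are assumptions
    for illustration, not the paper's notation)."""
    capacity: int                             # maximal resource level
    consumption: Dict[Tuple[str, str], int]   # (state, action) -> cost
    reload_states: Set[str]                   # states that refill the resource
    goal_states: Set[str]                     # states to reach with probability 1

def two_mode_action(
    mdp: ConsumptionMDP,
    state: str,
    resource: int,
    safety_threshold: int,
    plan_towards: Callable[[str, Set[str]], str],  # hypothetical planner
) -> str:
    """Pursue the goal while the resource is safely high; otherwise
    divert towards a reload state, as sketched in the abstract."""
    if resource <= safety_threshold and state not in mdp.reload_states:
        # Too little resource to continue safely: head for a reload state,
        # where the resource is replenished to mdp.capacity.
        return plan_towards(state, mdp.reload_states)
    # Enough resource: attempt to fulfill the mission.
    return plan_towards(state, mdp.goal_states)
```

In the paper's construction the safe resource levels are computed per state by the polynomial-time synthesis algorithms rather than by one global threshold; the sketch only conveys the two-mode shape of the resulting behavior.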
Related papers
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
- On efficient computation in active inference [1.1470070927586016]
We present a novel planning algorithm for finite temporal horizons with drastically lower computational complexity.
We also simplify the process of setting an appropriate target distribution for new and existing active inference planning schemes.
arXiv Detail & Related papers (2023-07-02T07:38:56Z)
- Dual policy as self-model for planning [71.73710074424511]
We refer to the model used to simulate one's decisions as the agent's self-model.
Inspired by current reinforcement learning approaches and neuroscience, we explore the benefits and limitations of using a distilled policy network as the self-model.
arXiv Detail & Related papers (2023-06-07T13:58:45Z)
- A State-Augmented Approach for Learning Optimal Resource Management Decisions in Wireless Networks [58.720142291102135]
We consider a radio resource management (RRM) problem in a multi-user wireless network.
The goal is to optimize a network-wide utility function subject to constraints on the ergodic average performance of users.
We propose a state-augmented parameterization for the RRM policy, where alongside the instantaneous network states, the RRM policy takes as input the set of dual variables corresponding to the constraints.
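As a rough illustration of the state-augmentation idea in this summary, the toy sketch below feeds the concatenation of the instantaneous network state and the current dual variables into a simple policy and updates the duals by projected dual ascent on the ergodic constraints. The linear policy, the performance metrics, and the step size are all invented for the example; this is not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_constraints = 4, 2
eta = 0.05                                # dual step size (assumed)
c_min = np.array([0.8, 0.4])              # required ergodic averages (assumed)
W = rng.normal(size=(n_users, n_users + n_constraints))  # toy policy weights

def rrm_policy(state: np.ndarray, duals: np.ndarray) -> np.ndarray:
    """Allocation from the augmented input [network state; dual variables]."""
    return np.maximum(W @ np.concatenate([state, duals]), 0.0)

def performance(alloc: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Hypothetical per-constraint performance metrics."""
    return np.array([alloc.mean(), (alloc * state).mean()])

duals = np.zeros(n_constraints)
for _ in range(1000):
    state = rng.exponential(size=n_users)  # random instantaneous network state
    perf = performance(rrm_policy(state, duals), state)
    # Projected dual ascent: raise the multiplier of any violated constraint
    # and keep all multipliers nonnegative.
    duals = np.maximum(duals + eta * (c_min - perf), 0.0)
```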
arXiv Detail & Related papers (2022-10-28T21:24:13Z)
- A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation [61.740187363451746]
Marginalized importance sampling (MIS) measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution.
We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.
We evaluate the empirical performance of our approach on a variety of challenging Atari and MuJoCo environments.
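The observation in this summary has a clean tabular form: the discounted state-action occupancy of the target policy, and therefore the MIS density ratio, can be read off the successor representation (I - gamma * P_pi)^(-1). The tiny random MDP below is an assumption made purely to demonstrate the computation; the paper itself works with deep function approximation rather than explicit matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 3, 2, 0.9

P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a, s']: transition kernel
pi = rng.dirichlet(np.ones(nA), size=nS)       # target policy pi[s, a]
mu = rng.dirichlet(np.ones(nA), size=nS)       # sampling policy mu[s, a]
rho0 = np.full(nS, 1.0 / nS)                   # initial state distribution

def occupancy(policy: np.ndarray) -> np.ndarray:
    """Discounted state-action occupancy, read off the successor
    representation (I - gamma * P_policy)^(-1) in the tabular case."""
    # State-action transitions: P_pol[(s,a),(s',a')] = P[s,a,s'] * policy[s',a'].
    P_pol = np.einsum("sat,tb->satb", P, policy).reshape(nS * nA, nS * nA)
    sr = np.linalg.inv(np.eye(nS * nA) - gamma * P_pol)  # successor representation
    nu0 = (rho0[:, None] * policy).reshape(nS * nA)      # initial (s,a) distribution
    return (1.0 - gamma) * nu0 @ sr

# MIS density ratio between target and sampling occupancies, elementwise.
density_ratio = occupancy(pi) / occupancy(mu)
```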
arXiv Detail & Related papers (2021-06-12T20:21:38Z)
- Polynomial-Time Algorithms for Multi-Agent Minimal-Capacity Planning [19.614913673879474]
We study the problem of minimizing the resource capacity of autonomous agents cooperating to achieve a shared task.
In a consumption Markov decision process, the agent has a resource of limited capacity.
We develop an algorithm that solves this graph problem in time that is polynomial in the number of agents, target locations, and the size of the consumption Markov decision process.
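The summary does not spell out the algorithm, so the sketch below only illustrates one ingredient that the minimal-capacity framing suggests: since feasibility is monotone in the capacity (a larger tank never hurts), the smallest sufficient capacity can be found by binary search over a feasibility oracle. The `is_feasible` oracle is hypothetical, e.g., a probability-1 strategy-synthesis check for a consumption MDP with the given capacity.

```python
from typing import Callable

def minimal_capacity(is_feasible: Callable[[int], bool], cap_upper: int) -> int:
    """Smallest capacity in [0, cap_upper] for which `is_feasible` holds,
    assuming feasibility is monotone in the capacity."""
    lo, hi = 0, cap_upper
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(mid):
            hi = mid            # mid suffices: try smaller capacities
        else:
            lo = mid + 1        # mid fails: strictly more capacity needed
    return lo
```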
arXiv Detail & Related papers (2021-05-04T00:30:02Z)
- Verifiable Planning in Expected Reward Multichain MDPs [20.456052208569115]
We explore the steady-state planning problem of deriving a decision-making policy for an agent.
We prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
arXiv Detail & Related papers (2020-12-03T18:54:24Z)
- Coordinated Online Learning for Multi-Agent Systems with Coupled Constraints and Perturbed Utility Observations [91.02019381927236]
We introduce a novel method to steer the agents toward a stable population state while fulfilling the given resource constraints.
The proposed method is a decentralized resource pricing scheme based on the resource loads resulting from the augmentation of the game's Lagrangian.
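The pricing rule described here resembles standard dual ascent on the coupled resource constraints of the game's Lagrangian; the generic sketch below raises the price of any overloaded resource and lets agents scale back in response. The agents' response model, the loads, the capacities, and the step size are all placeholders invented for the illustration, not the paper's scheme.

```python
import numpy as np

rng = np.random.default_rng(2)
n_agents, n_resources = 5, 3
eta = 0.1                                  # price step size (assumed)
capacity = np.ones(n_resources)            # shared resource limits (assumed)
usage = rng.uniform(size=(n_agents, n_resources))  # per-unit resource footprints
prices = np.zeros(n_resources)             # Lagrange multipliers acting as prices

for _ in range(500):
    # Stand-in for each agent's best response: scale activity down as the
    # priced cost of the agent's resource footprint grows.
    activity = 1.0 / (1.0 + usage @ prices)
    load = activity @ usage                # aggregate load on each resource
    # Decentralized pricing: raise the price of overloaded resources and
    # project back to nonnegative prices (dual ascent on the Lagrangian).
    prices = np.maximum(prices + eta * (load - capacity), 0.0)
```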
arXiv Detail & Related papers (2020-10-21T10:11:17Z)
- Near-Optimal Reactive Synthesis Incorporating Runtime Information [28.25296947005914]
We consider the problem of optimal reactive synthesis: computing a strategy that satisfies a mission specification in a dynamic environment.
We incorporate task-critical information that is only available at runtime into the strategy synthesis in order to improve performance.
arXiv Detail & Related papers (2020-07-31T14:41:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.