Shielding in Resource-Constrained Goal POMDPs
- URL: http://arxiv.org/abs/2211.15349v1
- Date: Mon, 28 Nov 2022 14:30:05 GMT
- Title: Shielding in Resource-Constrained Goal POMDPs
- Authors: Michal Ajdarów, Šimon Brlej, Petr Novotný
- Abstract summary: We consider partially observable Markov decision processes (POMDPs) modeling an agent that needs a supply of a certain resource to operate correctly.
The agent aims to minimize the expected cost of reaching some goal while preventing resource exhaustion, a problem we call resource-constrained goal optimization (RSGO).
We implement our algorithm and present experiments showing its applicability to benchmarks from the literature.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider partially observable Markov decision processes (POMDPs) modeling
an agent that needs a supply of a certain resource (e.g., electricity stored in
batteries) to operate correctly. The resource is consumed by the agent's actions
and can be replenished only in certain states. The agent aims to minimize the
expected cost of reaching some goal while preventing resource exhaustion, a
problem we call \emph{resource-constrained goal optimization} (RSGO). We take a
two-step approach to the RSGO problem. First, using formal methods techniques,
we design an algorithm computing a \emph{shield} for a given scenario: a
procedure that observes the agent and prevents it from using actions that might
eventually lead to resource exhaustion. Second, we augment the POMCP heuristic
search algorithm for POMDP planning with our shields to obtain an algorithm
solving the RSGO problem. We implement our algorithm and present experiments
showing its applicability to benchmarks from the literature.
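
To make the two-step approach concrete, below is a minimal Python sketch of how a precomputed shield could restrict action selection in a POMCP-style planner. The `Shield` class, its table-based representation, and the `shielded_action` helper are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: the shield representation and the interfaces below
# are assumptions, not the paper's implementation. The idea is that a shield,
# computed offline with formal-methods techniques, tells the planner which
# actions are resource-safe in the current situation, and a POMCP-style search
# then considers only those actions.

import random


class Shield:
    """Maps an (observation, resource level) pair to its set of safe actions."""

    def __init__(self, safe_table, all_actions):
        # safe_table: dict[(observation, resource_level)] -> list of safe actions
        self.safe_table = safe_table
        self.all_actions = list(all_actions)

    def allowed(self, observation, resource_level):
        # Situations missing from the table are treated as unrestricted here;
        # a real shield would cover every reachable situation.
        return self.safe_table.get((observation, resource_level), self.all_actions)


def shielded_action(shield, observation, resource_level, scorer=None):
    """Pick an action, but only among those the shield permits."""
    safe = shield.allowed(observation, resource_level)
    if not safe:
        raise RuntimeError("Shield admits no action: resource exhaustion is unavoidable.")
    if scorer is None:
        return random.choice(safe)   # e.g. a shielded rollout policy
    return max(safe, key=scorer)     # e.g. UCB scores inside the search tree
```

In an actual planner, `scorer` would correspond to the UCB values POMCP maintains for action nodes, while the uniform choice would serve as the shielded rollout policy.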
Related papers
- Efficient Constraint Generation for Stochastic Shortest Path Problems [0.0]
We present an efficient version of constraint generation for Stochastic Shortest Path Problems (SSPs).
This technique allows algorithms to ignore sub-optimal actions and avoid computing their costs-to-go.
Our experiments show that CG-iLAO* ignores up to 57% of iLAO*'s actions and solves problems up to 8x and 3x faster than LRTDP and iLAO*, respectively.
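
As a generic illustration of the action-elimination idea (a textbook-style sketch, not the CG-iLAO* algorithm; the `lower_bound` and `exact_cost` callables are hypothetical), an action whose optimistic bound already exceeds the best cost found so far can be skipped without ever computing its cost-to-go:

```python
# Generic illustration of action elimination (not the CG-iLAO* algorithm itself;
# `lower_bound` and `exact_cost` are hypothetical callables): skip any action whose
# optimistic bound already exceeds the best cost found so far, so its exact
# cost-to-go is never computed.

def backup_with_elimination(state, actions, lower_bound, exact_cost):
    best_action, best_cost = None, float("inf")
    for a in actions(state):
        if lower_bound(state, a) >= best_cost:
            continue                 # provably no better: ignore this action
        cost = exact_cost(state, a)  # the expensive cost-to-go computation
        if cost < best_cost:
            best_action, best_cost = a, cost
    return best_action, best_cost
```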
arXiv Detail & Related papers (2024-01-26T04:00:07Z) - Decentralised Q-Learning for Multi-Agent Markov Decision Processes with a Satisfiability Criterion [0.0]
We propose a reinforcement learning algorithm to solve a multi-agent Markov decision process (MMDP).
The goal is to lower the time average cost of each agent to below a pre-specified agent-specific bound.
arXiv Detail & Related papers (2023-11-21T13:56:44Z) - Differentially Private Deep Q-Learning for Pattern Privacy Preservation in MEC Offloading [76.0572817182483]
Attackers may eavesdrop on the offloading decisions to infer the edge server's (ES's) queue information and users' usage patterns.
We propose an offloading strategy that jointly minimizes the latency, the ES's energy consumption, and the task dropping rate, while preserving pattern privacy (PP).
We develop a Differential Privacy Deep Q-learning based Offloading (DP-DQO) algorithm to solve this problem while addressing the PP issue by injecting noise into the generated offloading decisions.
arXiv Detail & Related papers (2023-02-09T12:50:18Z) - TransPath: Learning Heuristics For Grid-Based Pathfinding via Transformers [64.88759709443819]
We suggest learning instance-dependent proxies intended to notably increase the efficiency of the search.
The first proxy we suggest to learn is the correction factor, i.e. the ratio between the instance-independent cost-to-go estimate and the perfect one.
The second proxy is the path probability, which indicates how likely it is that a grid cell lies on the shortest path.
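
A hedged sketch of how these two proxies might enter a grid search at query time; the model interfaces and the way they are combined with the base heuristic are assumptions, not TransPath's actual architecture:

```python
# Hypothetical use of the two learned proxies at search time; the model interfaces
# and the way they enter the search are assumptions, not TransPath's actual API.

def corrected_cost_to_go(cell, goal, base_heuristic, correction_model):
    """Recover an instance-dependent estimate from the learned correction factor.

    The correction factor is the ratio (instance-independent estimate) / (perfect
    cost-to-go), so dividing the cheap estimate by it approximates the perfect value.
    """
    estimate = base_heuristic(cell, goal)  # e.g. Manhattan or Euclidean distance
    factor = correction_model(cell)        # learned ratio, assumed positive
    return estimate / factor


def expansion_priority(cell, goal, g_value, base_heuristic, path_prob_model, weight=1.0):
    """Bias expansion toward cells the path-probability proxy places on the shortest path."""
    f_value = g_value + base_heuristic(cell, goal)
    return f_value - weight * path_prob_model(cell)  # min-heap: likely cells expanded earlier
```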
arXiv Detail & Related papers (2022-12-22T14:26:11Z) - Stochastic Direct Search Method for Blind Resource Allocation [6.574808513848414]
We study direct search (also known as pattern search) methods for linearly constrained and derivative-free optimization.
We show that direct search methods achieve finite regret in the deterministic and unconstrained case.
We propose a simple extension of direct search that achieves a regret upper bound of order $T^{2/3}$.
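
For reference, a deterministic, unconstrained pattern-search step looks roughly as follows (a textbook-style sketch; the paper's contribution concerns the stochastic, linearly constrained setting):

```python
# Textbook-style direct (pattern) search for the deterministic, unconstrained case;
# the paper's contribution concerns the stochastic, linearly constrained setting.

def direct_search(objective, x0, step=1.0, shrink=0.5, max_iters=1000, tol=1e-6):
    x = list(x0)
    fx = objective(x)
    for _ in range(max_iters):
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                candidate = list(x)
                candidate[i] += sign * step  # poll along the i-th coordinate
                fc = objective(candidate)
                if fc < fx:
                    x, fx, improved = candidate, fc, True
        if not improved:
            step *= shrink               # no poll point improved: shrink the mesh
            if step < tol:
                break
    return x, fx


# Example: minimize a simple quadratic starting from (3, -2).
best_x, best_f = direct_search(lambda v: (v[0] - 1) ** 2 + (v[1] + 0.5) ** 2, [3.0, -2.0])
```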
arXiv Detail & Related papers (2022-10-11T07:40:45Z) - Computation Offloading and Resource Allocation in F-RANs: A Federated Deep Reinforcement Learning Approach [67.06539298956854]
The fog radio access network (F-RAN) is a promising technology in which user mobile devices (MDs) can offload computation tasks to nearby fog access points (F-APs).
arXiv Detail & Related papers (2022-06-13T02:19:20Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Resource-Aware Distributed Submodular Maximization: A Paradigm for Multi-Robot Decision-Making [3.5788754401889022]
Resource-Aware distributed Greedy (RAG) is the first algorithm to account for each robot's on-board resources independently.
RAG balances the trade-off of centralization, for global near-optimality, vs. decentralization, for near-minimal on-board computation, communication, and memory resources.
arXiv Detail & Related papers (2022-04-15T15:47:05Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims to identify the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, the agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z) - Queueing Network Controls via Deep Reinforcement Learning [0.0]
We develop a Proximal Policy Optimization (PPO) algorithm for queueing networks.
The algorithm consistently generates control policies that outperform the state of the art in the literature.
A key to the successes of our PPO algorithm is the use of three variance reduction techniques in estimating the relative value function.
arXiv Detail & Related papers (2020-07-31T01:02:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.