Related papers: Optimal Dispatch in Emergency Service System via Reinforcement Learning

URL: http://arxiv.org/abs/2010.07513v1
Date: Thu, 15 Oct 2020 04:37:41 GMT
Title: Optimal Dispatch in Emergency Service System via Reinforcement Learning
Authors: Cheng Hua and Tauhid Zaman
Abstract summary: In the United States, medical responses by fire departments over the last four decades increased by 367%. We model the ambulance dispatch problem as an average-cost Markov decision process and present a policy iteration approach to find an optimal dispatch policy. Our findings suggest that emergency response departments can improve their performance with minimal to no cost.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the United States, medical responses by fire departments over the last four decades increased by 367%. This has made it critical for decision makers in emergency response departments to use existing resources efficiently. In this paper, we model the ambulance dispatch problem as an average-cost Markov decision process and present a policy iteration approach to find an optimal dispatch policy. We then propose an alternative formulation using post-decision states that is shown to be mathematically equivalent to the original model, but with a much smaller state space. We present a temporal difference learning approach to the dispatch problem based on the post-decision states. In our numerical experiments, we show that our obtained temporal-difference policy outperforms the benchmark myopic policy. Our findings suggest that emergency response departments can improve their performance with minimal to no cost.
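
Illustrative sketch (not from the paper): the abstract mentions policy iteration for an average-cost Markov decision process. The Python snippet below shows the general technique on a made-up two-state, two-action problem. The transition probabilities `P`, the costs `cost`, and all function names are hypothetical and are not the authors' dispatch model; the sketch also assumes every policy induces a single recurrent class, so that the gain is well defined.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(P, cost, policy):
    """Return (gain g, bias h) of a fixed policy, with h[0] pinned to 0.

    Solves g + h(s) - sum_j P(j | s, a) h(j) = c(s, a) for a = policy[s].
    """
    n = len(policy)
    A, b = [], []
    for s in range(n):
        p = P[s][policy[s]]
        # Unknown vector is [g, h[1], ..., h[n-1]].
        A.append([1.0] + [(1.0 if j == s else 0.0) - p[j] for j in range(1, n)])
        b.append(cost[s][policy[s]])
    x = solve_linear(A, b)
    return x[0], [0.0] + x[1:]


def policy_iteration(P, cost):
    """Alternate policy evaluation and greedy improvement until stable."""
    n = len(P)
    policy = [0] * n
    while True:
        g, h = evaluate(P, cost, policy)
        new_policy = []
        for s in range(n):
            def q(a):
                return cost[s][a] + sum(p * hj for p, hj in zip(P[s][a], h))
            best = min(range(len(P[s])), key=q)
            # Keep the current action on ties so the loop terminates.
            if q(policy[s]) <= q(best) + 1e-12:
                best = policy[s]
            new_policy.append(best)
        if new_policy == policy:
            return policy, g, h
        policy = new_policy


# Hypothetical toy data: P[s][a] is the next-state distribution,
# cost[s][a] is the immediate cost of action a in state s.
P = [
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.5, 0.5], [0.9, 0.1]],
]
cost = [[1.0, 2.0], [5.0, 6.0]]

policy, g, h = policy_iteration(P, cost)
print("policy:", policy, "average cost:", round(g, 6))
```

On this toy instance the procedure settles on action 1 in both states, with a long-run average cost of about 2.4. The myopic choice (action 0, which is cheaper immediately) gives an average cost of 3, which illustrates in miniature why a policy that accounts for future states can beat a myopic one.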

Related papers

Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making [55.2480439325792]
We present a cascaded LLM decision framework that adaptively delegates tasks across multiple tiers of expertise. First, a deferral policy determines whether to accept the base model's answer or regenerate it with the large model. Second, an abstention policy decides whether the cascade model response is sufficiently certain or requires human intervention.
arXiv Detail & Related papers (2025-06-13T15:36:22Z)
Optimization-Augmented Machine Learning for Vehicle Operations in Emergency Medical Services [2.5690340428649328]
Minimizing response times to meet legal requirements and serve patients in a timely manner is crucial for Emergency Medical Service (EMS) systems. We study a centrally controlled EMS system for which we learn an online ambulance dispatching and redeployment policy. We propose a novel optimization-augmented machine learning scheme that makes it possible to learn efficient policies for ambulance dispatching and redeployment.
arXiv Detail & Related papers (2025-03-14T20:15:26Z)
Multi-Agent Reinforcement Learning for Joint Police Patrol and Dispatch [13.336551874123796]
We propose a novel method for jointly optimizing multi-agent patrol and dispatch to learn policies yielding rapid response times. Our method treats each patroller as an independent Q-learner (agent) with a shared deep Q-network that represents the state-action values. We demonstrate that this heterogeneous multi-agent reinforcement learning approach is capable of learning policies that optimize for patrol or dispatch alone.
arXiv Detail & Related papers (2024-09-03T19:19:57Z)
Multi-Agent Reinforcement Learning with Hierarchical Coordination for Emergency Responder Stationing [8.293120269016834]
An emergency responder management (ERM) system dispatches responders when it receives requests for medical aid. ERM systems can proactively reposition responders between predesignated waiting locations to cover any gaps. The state-of-the-art approach in proactive repositioning is a hierarchical approach based on spatial decomposition and online Monte Carlo tree search. We introduce a novel reinforcement learning (RL) approach, based on the same hierarchical decomposition, but replacing online search with learning.
arXiv Detail & Related papers (2024-05-21T21:15:45Z)
Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention. In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
arXiv Detail & Related papers (2024-02-23T06:27:12Z)
Deep reinforcement learning for the dynamic vehicle dispatching problem: An event-based approach [0.0]
We model the problem as a semi-Markov decision process, which allows us to treat time as continuous. We argue that an event-based approach substantially reduces the complexity of the decision space and overcomes other limitations of discrete-time models. Results show that our policies exhibit better average waiting times, cancellation rates and total service times, with reductions of up to 50% relative to the other tested policies.
arXiv Detail & Related papers (2023-07-13T16:29:25Z)
Evaluating COVID-19 vaccine allocation policies using Bayesian $m$-top exploration [53.122045119395594]
We present a novel technique for evaluating vaccine allocation strategies using a multi-armed bandit framework. $m$-top exploration allows the algorithm to learn $m$ policies for which it expects the highest utility. We consider the Belgian COVID-19 epidemic using the individual-based model STRIDE, where we learn a set of vaccination policies.
arXiv Detail & Related papers (2023-01-30T12:22:30Z)
Modelling Hospital Strategies in City-Scale Ambulance Dispatching [0.0]
The paper proposes an approach to model and simulate the ambulance dispatching process in multi-agent healthcare environments of large cities. The proposed approach uses a coupled game-theoretic (GT) approach to identify hospital strategies. The study considers the problem of dispatching ambulances to patients with ACS who are directed to PCI at the target hospital.
arXiv Detail & Related papers (2022-01-05T22:20:12Z)
A Reinforcement Learning Approach to the Stochastic Cutting Stock Problem [0.0]
We propose a formulation of the cutting stock problem as a discounted infinite-horizon decision process. An optimal solution corresponds to a policy that associates each state with a decision and minimizes the expected total cost.
arXiv Detail & Related papers (2021-09-20T14:47:54Z)
A New Bandit Setting Balancing Information from State Evolution and Corrupted Context [52.67844649650687]
We propose a new sequential decision-making setting combining key aspects of two established online learning problems with bandit feedback. The optimal action to play at any given moment is contingent on an underlying changing state which is not directly observable by the agent. We present an algorithm that uses a referee to dynamically combine the policies of a contextual bandit and a multi-armed bandit.
arXiv Detail & Related papers (2020-11-16T14:35:37Z)
Reinforcement Learning with Efficient Active Feature Acquisition [59.91808801541007]
In real life, information acquisition might correspond to performing a medical test on a patient. We propose a model-based reinforcement learning framework that learns an active feature acquisition policy. Key to the success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states.
arXiv Detail & Related papers (2020-11-02T08:46:27Z)
Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning. Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning [95.60782037764928]
First, we show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled. Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step. Third, we show that ideas in the propensity estimation literature can be used to importance-sample transitions from the replay buffer and update the policy to prevent deterioration of performance.
arXiv Detail & Related papers (2020-06-26T20:21:12Z)
Transforming unstructured voice and text data into insight for paramedic emergency service using recurrent and convolutional neural networks [68.8204255655161]
Paramedics often have to make lifesaving decisions within a limited time in an ambulance. This study aims to automatically fuse voice and text data to provide tailored situational awareness information to paramedics.
arXiv Detail & Related papers (2020-05-30T06:47:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.