Reinforcement Learning for Target Zone Blood Glucose Control
- URL: http://arxiv.org/abs/2508.03875v1
- Date: Tue, 05 Aug 2025 19:35:41 GMT
- Title: Reinforcement Learning for Target Zone Blood Glucose Control
- Authors: David H. Mguni, Jing Dong, Wanrong Yang, Ziquan Liu, Muhammad Salman Haleem, Baoxiang Wang
- Abstract summary: Reinforcement learning offers promise for personalising treatment, but struggles with the delayed and heterogeneous effects of interventions. We propose a novel RL framework to study and support decision-making in T1DM technologies, such as automated insulin delivery.
- Score: 15.612220895230065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Managing physiological variables within clinically safe target zones is a central challenge in healthcare, particularly for chronic conditions such as Type 1 Diabetes Mellitus (T1DM). Reinforcement learning (RL) offers promise for personalising treatment, but struggles with the delayed and heterogeneous effects of interventions. We propose a novel RL framework to study and support decision-making in T1DM technologies, such as automated insulin delivery. Our approach captures the complex temporal dynamics of treatment by unifying two control modalities: impulse control for discrete, fast-acting interventions (e.g., insulin boluses), and switching control for longer-acting treatments and regime shifts. The core of our method is a constrained Markov decision process augmented with physiological state features, enabling safe policy learning under clinical and resource constraints. The framework incorporates biologically realistic factors, including insulin decay, leading to policies that better reflect real-world therapeutic behaviour. While not intended for clinical deployment, this work establishes a foundation for future safe and temporally-aware RL in healthcare. We provide theoretical guarantees of convergence and demonstrate empirical improvements in a stylised T1DM control task, reducing blood glucose level violations from 22.4% (state-of-the-art) to as low as 10.8%.
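To make the two control modalities concrete, here is a minimal sketch of a stylised glucose environment with an impulse channel (insulin boluses), a switching channel (basal regimes), and exponential insulin decay. All names, constants, and dynamics below are illustrative assumptions, not the authors' simulator or CMDP formulation.

```python
import numpy as np

class GlucoseZoneEnv:
    """Stylised target-zone environment with impulse and switching controls."""

    def __init__(self, target_zone=(70.0, 180.0), decay=0.9):
        self.target_zone = target_zone
        self.decay = decay            # exponential insulin decay per step
        self.reset()

    def reset(self):
        self.glucose = 140.0          # mg/dL
        self.insulin_on_board = 0.0   # residual effect of past boluses
        self.basal_regime = 0         # switching control: 0 = low basal, 1 = high basal
        return self._state()

    def _state(self):
        return np.array([self.glucose, self.insulin_on_board, self.basal_regime])

    def step(self, bolus, switch=None):
        # Impulse control: a discrete, fast-acting insulin bolus.
        self.insulin_on_board = self.decay * self.insulin_on_board + bolus
        # Switching control: a longer-acting regime shift.
        if switch is not None:
            self.basal_regime = switch
        basal_effect = 0.5 if self.basal_regime == 1 else 0.2
        meal_drift = 2.0              # endogenous glucose rise, e.g. meals
        self.glucose += meal_drift - 3.0 * self.insulin_on_board - 5.0 * basal_effect
        lo, hi = self.target_zone
        violation = self.glucose < lo or self.glucose > hi
        reward = -float(violation)    # the full CMDP adds clinical/resource constraints
        return self._state(), reward, violation
```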
Related papers
- Are Large Language Models Dynamic Treatment Planners? An In Silico Study from a Prior Knowledge Injection Angle [3.0391297540732545]
We evaluate large language models (LLMs) as dynamic insulin dosing agents in an in silico Type 1 diabetes simulator. Our results indicate that carefully designed zero-shot prompts enable smaller LLMs to achieve comparable or superior clinical performance. LLMs exhibit notable limitations, such as overly aggressive insulin dosing when prompted with chain-of-thought.
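A hypothetical zero-shot prompt of the kind such a study might use; the field names and wording below are assumptions, not the authors' actual prompts.

```python
# Hypothetical zero-shot dosing prompt; structure and fields are assumptions.
def build_dosing_prompt(glucose_mgdl, trend, carbs_g, insulin_on_board):
    return (
        "You are an insulin dosing assistant for Type 1 diabetes.\n"
        f"Current glucose: {glucose_mgdl} mg/dL (trend: {trend}).\n"
        f"Announced meal: {carbs_g} g carbohydrate.\n"
        f"Insulin on board: {insulin_on_board} U.\n"
        "Reply with a single bolus dose in units, e.g. 'DOSE: 2.5'."
    )

prompt = build_dosing_prompt(165, "rising", 40, 1.2)
print(prompt)  # would be sent to the LLM; the reply is parsed for 'DOSE:'
```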
arXiv Detail & Related papers (2025-08-06T13:46:02Z)
- Beyond the ATE: Interpretable Modelling of Treatment Effects over Dose and Time [46.2482873419289]
We propose a framework for modelling treatment effect trajectories as smooth surfaces over dose and time. Our approach decouples the estimation of trajectory shape from the specification of clinically relevant properties. We show that our method yields accurate, interpretable, and editable models of treatment dynamics.
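One simple way to realise a smooth, editable effect surface over dose and time is a separable parametric form with clinically readable parameters; this particular factorisation is an illustrative assumption, not the paper's estimator.

```python
import numpy as np

# Illustrative separable surface: saturating in dose, unimodal in time.
# e_max, ed50, and t_peak are directly interpretable and editable.
def effect_surface(dose, t, e_max=1.0, ed50=5.0, t_peak=2.0):
    dose_response = e_max * dose / (dose + ed50)             # saturating dose-response
    time_profile = (t / t_peak) * np.exp(1.0 - t / t_peak)   # peaks at t_peak, then decays
    return dose_response * time_profile

doses = np.linspace(0.0, 20.0, 5)
times = np.linspace(0.0, 8.0, 5)
surface = effect_surface(doses[:, None], times[None, :])     # 5x5 dose-by-time grid
```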
arXiv Detail & Related papers (2025-07-09T20:33:33Z)
- Towards Regulatory-Confirmed Adaptive Clinical Trials: Machine Learning Opportunities and Solutions [59.28853595868749]
We introduce two new objectives for future clinical trials that integrate regulatory constraints and treatment policy value for both the entire population and under-served populations. We formulate Randomize First Augment Next (RFAN), a new framework for designing Phase III clinical trials. Our framework consists of a standard randomized component followed by an adaptive one, jointly designed to acquire patients and assign them to treatment arms efficiently and safely during the trial.
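A hedged sketch of a randomize-first, augment-next assignment rule: a uniformly randomized phase followed by an adaptive phase that favours the empirically better arm. RFAN's actual allocation scheme and regulatory constraints are more involved; everything below is an assumption.

```python
import random

def assign_arm(patient_idx, n_randomized, outcomes_by_arm, epsilon=0.1):
    """outcomes_by_arm: list of two lists of observed outcomes per arm."""
    if patient_idx < n_randomized:                # phase 1: standard randomization
        return random.randint(0, 1)
    means = [sum(o) / len(o) if o else 0.0 for o in outcomes_by_arm]
    if random.random() < epsilon:                 # keep exploring for safety
        return random.randint(0, 1)
    return max(range(2), key=lambda a: means[a])  # phase 2: adaptive augmentation
```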
arXiv Detail & Related papers (2025-03-12T10:17:54Z)
- Flexible Blood Glucose Control: Offline Reinforcement Learning from Human Feedback [3.3457851904072595]
Paint is an original RL framework for learning flexible insulin dosing policies from patient records. Labelled data train a reward model, which informs the actions of a novel safety-constrained offline RL algorithm. In-silico evaluation shows Paint achieves common glucose goals through simple labelling of desired states, reducing glycaemic risk by 15% over a commercial benchmark.
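A minimal sketch of the two-stage recipe this summary describes: fit a reward model to clinician-labelled states, then relabel logged transitions for an offline RL learner. The feature layout and labels below are simulated assumptions, not the Paint pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stage 1: learn a reward model from labelled (desired / undesired) states.
states = np.random.rand(500, 3)   # e.g. glucose, rate of change, insulin on board
desired = (np.abs(states[:, 0] - 0.5) < 0.2).astype(int)  # simulated clinician labels

reward_model = LogisticRegression().fit(states, desired)

# Stage 2: relabel logged transitions with the learned reward.
def relabel(transitions):
    """transitions: list of (state, action, next_state) from patient records."""
    return [(s, a, float(reward_model.predict_proba(s2[None])[0, 1]), s2)
            for s, a, s2 in transitions]
# The relabelled tuples would feed a safety-constrained offline RL algorithm.
```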
arXiv Detail & Related papers (2025-01-27T11:31:40Z)
- Training-Aware Risk Control for Intensity Modulated Radiation Therapies Quality Assurance with Conformal Prediction [7.227232362460348]
Measurement quality assurance practices play a key role in the safe use of Intensity Modulated Radiation Therapies (IMRT) for cancer treatment. These practices have reduced the measurement-based IMRT QA failure rate below 1%. We propose a new training-aware conformal risk control method that combines the benefits of conformal risk control and conformal training.
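A minimal conformal-risk-control sketch: choose the least conservative flagging threshold whose calibration risk stays below a target level alpha. The QA-specific loss and the training-aware refinement are beyond this sketch; all names are assumptions.

```python
import numpy as np

def calibrate_threshold(cal_scores, cal_losses, alpha=0.05):
    """cal_scores: predicted failure scores (numpy array); cal_losses: 1 if the
    plan truly failed QA, else 0. Returns the largest threshold whose
    finite-sample risk bound on missed failures stays below alpha."""
    n = len(cal_scores)
    best = 0.0                                          # flag everything: zero risk
    for lam in np.linspace(0.0, 1.0, 101):
        missed = cal_losses[cal_scores < lam]           # failures we would not flag
        risk_bound = (missed.sum() + 1.0) / (n + 1.0)   # correction, losses in [0, 1]
        if risk_bound <= alpha:                         # risk is monotone in lam
            best = lam
    return best
```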
arXiv Detail & Related papers (2025-01-15T17:19:51Z)
- An Improved Strategy for Blood Glucose Control Using Multi-Step Deep Reinforcement Learning [3.5757761767474876]
Blood Glucose (BG) control involves keeping an individual's BG within a healthy range through extracorporeal insulin injections.
Recent research has been devoted to exploring individualized and automated BG control approaches.
Deep Reinforcement Learning (DRL) shows potential as an emerging approach.
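The multi-step ingredient can be illustrated with an n-step TD target, which propagates the delayed effect of an insulin dose over several timesteps instead of one; this is a generic sketch, not the paper's exact estimator.

```python
def n_step_target(rewards, final_q, gamma=0.99):
    """rewards: r_t .. r_{t+n-1}; final_q: max_a Q(s_{t+n}, a).
    Returns the discounted n-step bootstrap target."""
    target = final_q
    for r in reversed(rewards):
        target = r + gamma * target
    return target

# e.g. n_step_target([-0.1, -0.2, 0.0], final_q=1.5)
```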
arXiv Detail & Related papers (2024-03-12T11:53:00Z)
- Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning [13.783833824324333]
We propose a hybrid control policy for the artificial pancreas (HyCPAP) to address the challenges of closed-loop glucose control.
We conduct extensive experiments using the FDA-accepted UVA/Padova T1DM simulator.
Our approaches achieve the highest percentage of time spent in the desired euglycemic range and the lowest occurrences of hypoglycemia.
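A sketch of an ensemble policy with a rule-based safety layer, in the spirit of a hybrid controller; the aggregation rule and the override below are illustrative assumptions, not HyCPAP's exact design.

```python
import numpy as np

def hybrid_action(state, policies, glucose_idx=0, hypo_threshold=80.0):
    """policies: list of learned policy callables state -> insulin dose."""
    doses = np.array([p(state) for p in policies])  # ensemble of learned policies
    dose = float(np.median(doses))                  # robust aggregation across members
    if state[glucose_idx] < hypo_threshold:         # rule-based safety override
        dose = 0.0                                  # suspend insulin near hypoglycemia
    return dose
```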
arXiv Detail & Related papers (2023-07-13T00:53:09Z)
- Automatic diagnosis of knee osteoarthritis severity using Swin transformer [55.01037422579516]
Knee osteoarthritis (KOA) is a widespread condition that can cause chronic pain and stiffness in the knee joint.
We propose an automated approach that employs the Swin Transformer to predict the severity of KOA.
arXiv Detail & Related papers (2023-07-10T09:49:30Z)
- Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care [46.2482873419289]
We introduce a deep Q-learning approach to obtain more reliable critical care policies.
We evaluate our method in off-policy and offline settings using simulated environments and real health records from intensive care units.
arXiv Detail & Related papers (2023-06-13T18:02:57Z)
- Automated Fidelity Assessment for Strategy Training in Inpatient Rehabilitation using Natural Language Processing [53.096237570992294]
Strategy training is a rehabilitation approach that teaches skills to reduce disability among those with cognitive impairments following a stroke.
Standardized fidelity assessment is used to measure adherence to treatment principles.
We developed a rule-based NLP algorithm, a long short-term memory (LSTM) model, and a bidirectional encoder representations from transformers (BERT) model for this task.
arXiv Detail & Related papers (2022-09-14T15:33:30Z)
- Boundary Guided Semantic Learning for Real-time COVID-19 Lung Infection Segmentation System [69.40329819373954]
The coronavirus disease 2019 (COVID-19) continues to have a negative impact on healthcare systems around the world.
At the current stage, automatically segmenting the lung infection area from CT images is essential for the diagnosis and treatment of COVID-19.
We propose a boundary guided semantic learning network (BSNet) in this paper.
arXiv Detail & Related papers (2022-09-07T05:01:38Z)
- DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret [59.81290762273153]
Dynamic treatment regimes (DTRs) are personalized, adaptive, multi-stage treatment plans that adapt treatment decisions to an individual's initial features and to intermediate outcomes and features at each subsequent stage.
We propose a novel algorithm that, by carefully balancing exploration and exploitation, is guaranteed to achieve rate-optimal regret when the transition and reward models are linear.
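A linear-bandit building block of the kind such a regret analysis assumes: per-stage ridge estimates plus an optimism bonus. This is purely illustrative; the paper's algorithm also couples stages through transition models.

```python
import numpy as np

class LinUCBStage:
    """One treatment stage with a linear reward model and a UCB rule."""

    def __init__(self, dim, lam=1.0, beta=1.0):
        self.A = lam * np.eye(dim)   # regularised Gram matrix
        self.b = np.zeros(dim)
        self.beta = beta             # width of the optimism bonus

    def choose(self, features):
        """features: one row per candidate treatment; returns the index to play."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b       # ridge estimate of the reward parameters
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', features, A_inv, features))
        return int(np.argmax(features @ theta + self.beta * bonus))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x
```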
arXiv Detail & Related papers (2020-05-06T13:03:42Z)
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its environment through observations, which may contain natural measurement errors or adversarial noise.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, is ineffective for many RL tasks.
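The threat model can be sketched as a bounded perturbation of the observation (not the true state) chosen to degrade the agent's action values; below is a generic FGSM-style attack in PyTorch, assuming a Q-network, and not the paper's specific attacks or defenses.

```python
import torch

def perturb_observation(q_net, obs, epsilon=0.05):
    """Return an observation perturbed within an L-infinity ball of radius epsilon."""
    obs = obs.clone().detach().requires_grad_(True)
    q_values = q_net(obs)
    chosen = q_values.max()          # value of the greedy action
    chosen.backward()
    # Move the observation in the direction that reduces the greedy action's value.
    return (obs - epsilon * obs.grad.sign()).detach()
```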
arXiv Detail & Related papers (2020-03-19T17:59:59Z)