Related papers: Automating proton PBS treatment planning for head and neck cancers using policy gradient-based deep reinforcement learning

URL: http://arxiv.org/abs/2409.11576v1
Date: Tue, 17 Sep 2024 22:01:56 GMT
Title: Automating proton PBS treatment planning for head and neck cancers using policy gradient-based deep reinforcement learning
Authors: Qingqing Wang, Chang Chang
Abstract summary: We propose an automatic treatment planning model using the proximal policy optimization (PPO) algorithm and a dose distribution-based reward function. A set of empirical rules is used to create auxiliary planning structures from target volumes and organs-at-risk. A decision-making policy network trained using PPO is developed to iteratively adjust the involved planning objective parameters in a continuous action space.
Score: 0.7519872646378836
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Proton pencil beam scanning (PBS) treatment planning for head and neck (H&N) cancers is a time-consuming and experience-demanding task where a large number of planning objectives are involved. Deep reinforcement learning (DRL) has recently been introduced to the planning processes of intensity-modulated radiation therapy and brachytherapy for prostate, lung, and cervical cancers. However, existing approaches are built upon the Q-learning framework and weighted linear combinations of clinical metrics, suffering from poor scalability and flexibility and only capable of adjusting a limited number of planning objectives in discrete action spaces. We propose an automatic treatment planning model using the proximal policy optimization (PPO) algorithm and a dose distribution-based reward function for proton PBS treatment planning of H&N cancers. Specifically, a set of empirical rules is used to create auxiliary planning structures from target volumes and organs-at-risk (OARs), along with their associated planning objectives. These planning objectives are fed into an in-house optimization engine to generate the spot monitor unit (MU) values. A decision-making policy network trained using PPO is developed to iteratively adjust the involved planning objective parameters in a continuous action space and refine the PBS treatment plans using a novel dose distribution-based reward function. Proton H&N treatment plans generated by the model show improved OAR sparing with equal or superior target coverage when compared with human-generated plans. Moreover, additional experiments on liver cancer demonstrate that the proposed method can be successfully generalized to other treatment sites. To the best of our knowledge, this is the first DRL-based automatic treatment planning model capable of achieving human-level performance for H&N cancers.
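The abstract names PPO and a continuous action space but gives no loss function or network details. As a minimal sketch only, the following shows the standard PPO clipped surrogate objective together with a Gaussian log-probability of the kind commonly used for continuous actions. The function names, the `clip_eps` default of 0.2, and the Gaussian policy form are illustrative assumptions, not the authors' implementation.

```python
import math


def gaussian_logp(action, mean, std):
    """Log-density of a 1-D Gaussian policy at `action` (assumed policy form)."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio r_t
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a planning loop of the kind the abstract describes, each action would be an adjustment to planning objective parameters, and the advantage would be derived from the dose distribution-based reward. How the authors compute either is not stated here.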

Related papers

Transforming Multimodal Models into Action Models for Radiotherapy [39.682133213072554]
Radiotherapy, a crucial cancer treatment, demands precise planning to balance tumor eradication and preservation of healthy tissue. Traditional treatment planning (TP) is iterative, time-consuming, and reliant on human expertise. We propose a novel framework to transform a multimodal foundation model (MLM) into an action model for TP using a few-shot reinforcement learning approach.
arXiv Detail & Related papers (2025-02-06T09:51:28Z)
Automating High Quality RT Planning at Scale [4.660056689223253]
We introduce the Automated Iterative RT Planning (AIRTP) system, a scalable solution for generating high-quality treatment plans. Our AIRTP pipeline adheres to clinical guidelines and automates essential steps, including organ-at-risk (OAR) contouring, helper structure creation, beam setup, optimization, and plan quality improvement. A comparative analysis of plan quality reveals that our automated pipeline produces treatment plans of quality comparable to those generated manually.
arXiv Detail & Related papers (2025-01-21T00:44:18Z)
Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction [71.81851971324187]
This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL). HPO addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks. Experiments on challenging robotic navigation and manipulation tasks demonstrate impressive performance of HPO, where it shows an improvement of up to 35% over the baselines.
arXiv Detail & Related papers (2024-11-01T04:58:40Z)
Automated radiotherapy treatment planning guided by GPT-4Vision [27.56613357226252]
This study introduces GPT-RadPlan, a fully automated treatment planning framework. GPT-RadPlan harnesses prior radiation oncology knowledge encoded in multi-modal large language models, such as GPT-4Vision (GPT-4V) from OpenAI. GPT-RadPlan is integrated into our in-house inverse treatment planning system through an API.
arXiv Detail & Related papers (2024-06-21T19:23:03Z)
Large-Language-Model Empowered Dose Volume Histogram Prediction for Intensity Modulated Radiotherapy [11.055104826451126]
We propose a pipeline to convert unstructured images to a structured graph consisting of image-patch nodes and dose nodes. A novel Dose Graph Neural Network (DoseGNN) model is developed for predicting Dose-Volume histograms (DVHs) from the structured graph. In this study, we introduced an online human-AI collaboration system as a practical implementation of the concept proposed for the automation of intensity-modulated radiotherapy (IMRT) planning.
arXiv Detail & Related papers (2024-02-11T11:24:09Z)
Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning [64.10794426777493]
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks. Recent practices tend to distill optimized action sequences into an RL policy during the training phase. We develop an approach to distill from model-based planning to the policy.
arXiv Detail & Related papers (2023-07-24T16:52:31Z)
Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy. We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
A Decision Making Approach for Chemotherapy Planning based on Evolutionary Processing [0.0]
In this paper, a multi-objective meta-heuristic method is provided for cancer chemotherapy. The proposed method uses mathematical models to measure the drug concentration, tumor growth and the amount of toxicity. Results show that the proposed method achieves better therapeutic performance than a more recent similar method.
arXiv Detail & Related papers (2023-03-19T02:26:50Z)
OpenKBP-Opt: An international and reproducible evaluation of 76 knowledge-based planning pipelines [48.547200649819615]
We establish an open framework for developing plan optimization models for knowledge-based planning (KBP) in radiotherapy. Our framework includes reference plans for 100 patients with head-and-neck cancer and high-quality dose predictions from 19 KBP models.
arXiv Detail & Related papers (2022-02-16T19:18:42Z)
A feasibility study of a hyperparameter tuning approach to automated inverse planning in radiotherapy [68.8204255655161]
The purpose of this study is to automate the inverse planning process to reduce active planning time while maintaining plan quality. We investigated the impact of the choice of dose parameters, random and Bayesian search methods, and utility function form on planning time and plan quality. Using 100 samples was found to produce satisfactory plan quality, and the average planning time was 2.3 hours.
arXiv Detail & Related papers (2021-05-14T18:37:00Z)
Rapid treatment planning for low-dose-rate prostate brachytherapy with TP-GAN [9.064664319018064]
Treatment planning in low-dose-rate prostate brachytherapy (LDR-PB) aims to produce an arrangement of implantable radioactive seeds that delivers a minimum prescribed dose to the prostate. There can be multiple seed arrangements that satisfy this dosimetric criterion, not all deemed 'acceptable' for implant from a physician's perspective. We propose a method that aims to reduce this variability by training a model to learn from a large pool of successful retrospective LDR-PB data.
arXiv Detail & Related papers (2021-03-18T03:02:45Z)
Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption. We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan. We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
arXiv Detail & Related papers (2020-04-23T18:08:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.