A learning-driven automatic planning framework for proton PBS treatments of H&N cancers
- URL: http://arxiv.org/abs/2508.11085v2
- Date: Mon, 15 Sep 2025 17:16:18 GMT
- Title: A learning-driven automatic planning framework for proton PBS treatments of H&N cancers
- Authors: Qingqing Wang, Liqiang Xiao, Chang Chang
- Abstract summary: The inverse optimizer is a learning-to-optimize (L2O) method that predicts update steps by learning from task-specific data distributions. In experiments, a total of 97 patients with bilateral or ipsilateral H&N cancers are collected for training and testing.
- Score: 2.0765076553348316
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Proton pencil beam scanning (PBS) treatment planning for head & neck (H&N) cancers involves numerous conflicting objectives, requiring iterative objective parameter adjustments to balance multiple clinical goals. We propose a learning-driven inverse optimizer and integrate it into a proximal policy optimization (PPO)-based planning framework to automatically generate high-quality plans for patients with diverse treatment requirements. The inverse optimizer is a learning-to-optimize (L2O) method that predicts update steps by learning from task-specific data distributions. For the first time, long-context processing techniques developed for large language models (LLMs) are utilized to address the scalability limitations of existing L2O methods, enabling simultaneous optimization over a substantially larger set of variables. The PPO framework functions as an outer-loop virtual planner, autonomously adjusting objective parameters through a policy network, while the inner-loop L2O inverse optimizer computes machine-deliverable spot monitor unit (MU) values based on the PPO-refined objectives. Moreover, a Swin UnetR dose predictor is trained with prescription- and beam-specific information to estimate the initial objective parameters. In our experiments, a total of 97 patients with bilateral or ipsilateral H&N cancers were collected for training and testing. Compared with second-order gradient-based methods, our L2O optimizer improves the effectiveness and efficiency of the time-consuming inverse optimization by 22.97% and 36.41%, respectively. In conjunction with the PPO-based virtual planner, plans are generated within clinically acceptable times, i.e., 2.55 hours on average, and show improved or comparable organs-at-risk sparing with superior target coverage compared with human-generated plans.
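The two-loop structure described in the abstract (an outer PPO virtual planner re-weighting objectives, an inner L2O optimizer computing spot MU values) can be sketched as follows. This is a minimal, purely illustrative stand-in: the function names, the toy quadratic cost, the fixed targets, and the plain gradient step in place of the learned L2O network are all assumptions, not the paper's implementation.

```python
def l2o_inner_step(mu, grad):
    # Hypothetical stand-in for the learned L2O update: the paper trains a
    # network to predict update steps; a plain gradient step is used here.
    return [max(m - 0.1 * g, 0.0) for m, g in zip(mu, grad)]  # MUs stay non-negative

def inner_optimize(weights, target, n_steps=60):
    # Inner loop: compute spot MU values for fixed objective parameters.
    # A toy weighted quadratic stands in for the real dose-based objectives.
    mu = [1.0] * len(target)
    for _ in range(n_steps):
        grad = [w * (m - t) for w, m, t in zip(weights, mu, target)]
        mu = l2o_inner_step(mu, grad)
    cost = sum(w * (m - t) ** 2 for w, m, t in zip(weights, mu, target))
    return mu, cost

def virtual_planner(n_outer=3):
    # Outer loop: stands in for the PPO policy network that adjusts the
    # objective parameters between inner optimizations; for simplicity the
    # weights are left fixed here instead of being sampled from a policy.
    weights = [1.0, 1.0, 1.0, 1.0]
    target = [1.0, 2.0, 0.5, 1.5]  # hypothetical per-objective targets
    best = None
    for _ in range(n_outer):
        mu, cost = inner_optimize(weights, target)
        if best is None or cost < best[1]:
            best = (mu, cost)
    return best
```

In the actual framework the outer loop is a trained PPO agent and the inner loop handles a far larger variable set, which is where the LLM-style long-context techniques come in.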
Related papers
- Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization [60.87651283510059]
Group Relative Policy Optimization (GRPO) effectively scales LLM reasoning but incurs prohibitive computational costs. We propose Dynamic Pruning Policy Optimization (DPPO), a framework that enables dynamic pruning while preserving unbiased gradient estimation. To mitigate the data sparsity induced by pruning, we introduce Dense Prompt Packing, a window-based greedy strategy.
arXiv Detail & Related papers (2026-03-04T14:48:53Z) - Zero-Shot Large Language Model Agents for Fully Automated Radiotherapy Treatment Planning [14.814676057920067]
A large language model (LLM)-based agent navigates inverse treatment planning for intensity-modulated radiation therapy (IMRT). The agent's decision-making process is informed by current observations and previous optimization attempts and evaluations. This study demonstrates the feasibility of a zero-shot, LLM-driven workflow for automated IMRT treatment planning in a commercial TPS.
arXiv Detail & Related papers (2025-10-12T19:21:21Z) - An Iterative LLM Framework for SIBT utilizing RAG-based Adaptive Weight Optimization [11.168299220031662]
This study proposes an adaptive weight optimization framework for SIBT planning, driven by large language models (LLMs). A clinical knowledge base, constructed and queried via retrieval-augmented generation (RAG), enhances the model's domain-specific reasoning. The proposed method was validated on 23 patient cases, showing that the LLM-assisted approach produces plans that are comparable to or exceed clinically approved and fixed-weight plans.
arXiv Detail & Related papers (2025-09-10T08:54:16Z) - Automated Treatment Planning for Interstitial HDR Brachytherapy for Locally Advanced Cervical Cancer using Deep Reinforcement Learning [3.9838929530763076]
The objective of this study is to develop a fully automated HDR brachytherapy planning framework. We propose a hierarchical two-stage autoplanning framework. For the unseen test patients, the RL-based automated planning method achieved an average score of 93.89%, outperforming the clinical plans, which averaged 91.86%.
arXiv Detail & Related papers (2025-06-13T17:07:30Z) - Accelerating RL for LLM Reasoning with Optimal Advantage Regression [52.0792918455501]
We propose a novel two-stage policy optimization framework that directly approximates the optimal advantage function. $A^*$-PO achieves competitive performance across a wide range of mathematical reasoning benchmarks. It reduces training time by up to 2$\times$ and peak memory usage by over 30% compared to PPO, GRPO, and REBEL.
arXiv Detail & Related papers (2025-05-27T03:58:50Z) - Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent [2.1986172572830096]
Dose Optimization Language Agent (DOLA) is an autonomous large language model (LLM)-based agent designed for optimizing radiotherapy treatment plans. DOLA integrates the LLaMa3.1 LLM directly with a commercial treatment planning system, operating entirely within secure local infrastructure.
arXiv Detail & Related papers (2025-03-21T22:01:19Z) - DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization [53.27954325490941]
Finetuning a Large Language Model (LLM) is crucial for generating results towards specific objectives. This research introduces a novel reinforcement learning algorithm to finetune a drug optimization LLM-based generative model.
arXiv Detail & Related papers (2025-02-11T04:00:21Z) - Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction [71.81851971324187]
This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL).
HPO addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks.
Experiments on challenging robotic navigation and manipulation tasks demonstrate impressive performance of HPO, where it shows an improvement of up to 35% over the baselines.
arXiv Detail & Related papers (2024-11-01T04:58:40Z) - Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z) - Automating proton PBS treatment planning for head and neck cancers using policy gradient-based deep reinforcement learning [0.7519872646378836]
We propose an automatic treatment planning model using the proximal policy optimization (PPO) algorithm and a dose distribution-based reward function.
A set of empirical rules is used to create auxiliary planning structures from target volumes and organs-at-risk.
A decision-making policy network trained using PPO is developed to iteratively adjust the involved planning objective parameters in a continuous action space.
arXiv Detail & Related papers (2024-09-17T22:01:56Z) - Continual Model-based Reinforcement Learning for Data Efficient Wireless Network Optimisation [73.04087903322237]
We formulate throughput optimisation as Continual Reinforcement Learning of control policies.
Simulation results suggest that the proposed system is able to shorten the end-to-end deployment lead-time by two-fold.
arXiv Detail & Related papers (2024-04-30T11:23:31Z) - Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of large language model (LLM)-based prompt optimizers. We identify two pivotal factors in model parameter learning: update direction and update method. We develop a capable Gradient-inspired LLM-based Prompt Optimizer (GPO).
arXiv Detail & Related papers (2024-02-27T15:05:32Z) - Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs [27.41101006357176]
In this work, we take a minorization-maximization perspective to iteratively optimize the deep reactive policy w.r.t. a locally tight lower-bounded objective.
This novel formulation of learning as iterative lower bound optimization (ILBO) is particularly appealing because each step is structurally easier to optimize than the overall objective.
Empirical evaluation confirms that ILBO is significantly more sample-efficient than the state-of-the-art planner.
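The iterative-bound principle behind ILBO can be illustrated by its mirror image, majorize-minimize, on a scalar toy problem. This sketch is purely illustrative and is not the paper's method (ILBO applies the idea to deep reactive policies): at each step a surrogate that upper-bounds the objective and is tight at the current iterate is solved in closed form.

```python
def mm_minimize(x0=0.0, n_iter=30):
    # Minimize f(x) = |x - 2| + 0.5 * x**2 (true minimizer: x = 1).
    # Majorize the kink: |x - 2| <= (x - 2)**2 / (2c) + c / 2 with
    # c = |x_t - 2|, tight at the current iterate x_t.
    x = x0
    for _ in range(n_iter):
        c = abs(x - 2.0)
        if c < 1e-12:  # surrogate undefined exactly at the kink
            break
        # The surrogate (x-2)**2/(2c) + c/2 + 0.5*x**2 is a quadratic
        # whose minimizer has the closed form below.
        x = 2.0 / (1.0 + c)
    return x
```

Each surrogate is easier to optimize than the original objective, and because it upper-bounds f and touches it at the current iterate, every step is guaranteed not to increase f; this monotone-improvement property is what makes iterative bound optimization sample-efficient in the policy setting.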
arXiv Detail & Related papers (2022-03-23T19:06:16Z) - Resource Planning for Hospitals Under Special Consideration of the COVID-19 Pandemic: Optimization and Sensitivity Analysis [87.31348761201716]
Crises like the COVID-19 pandemic pose a serious challenge to health-care institutions.
BaBSim.Hospital is a tool for capacity planning based on discrete event simulation.
We aim to investigate and optimize these parameters to improve BaBSim.Hospital.
arXiv Detail & Related papers (2021-05-16T12:38:35Z) - A feasibility study of a hyperparameter tuning approach to automated inverse planning in radiotherapy [68.8204255655161]
The purpose of this study is to automate the inverse planning process to reduce active planning time while maintaining plan quality.
We investigated the impact of the choice of dose parameters, random and Bayesian search methods, and utility function form on planning time and plan quality.
Using 100 samples was found to produce satisfactory plan quality, and the average planning time was 2.3 hours.
arXiv Detail & Related papers (2021-05-14T18:37:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.