Patient-specific vs Multi-Patient Vision Transformer for Markerless Tumor Motion Forecasting
- URL: http://arxiv.org/abs/2507.07811v1
- Date: Thu, 10 Jul 2025 14:40:52 GMT
- Title: Patient-specific vs Multi-Patient Vision Transformer for Markerless Tumor Motion Forecasting
- Authors: Gauthier Rotsart de Hertaing, Dani Manjah, Benoit Macq
- Abstract summary: This work introduces a markerless forecasting approach for lung tumor motion using Vision Transformers (ViT). Two training strategies are evaluated under clinically realistic constraints: a patient-specific (PS) approach that learns individualized motion patterns, and a multi-patient (MP) model designed for generalization.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background: Accurate forecasting of lung tumor motion is essential for precise dose delivery in proton therapy. While current markerless methods mostly rely on deep learning, transformer-based architectures remain unexplored in this domain, despite their proven performance in trajectory forecasting. Purpose: This work introduces a markerless forecasting approach for lung tumor motion using Vision Transformers (ViT). Two training strategies are evaluated under clinically realistic constraints: a patient-specific (PS) approach that learns individualized motion patterns, and a multi-patient (MP) model designed for generalization. The comparison explicitly accounts for the limited number of images that can be generated between planning and treatment sessions. Methods: Digitally reconstructed radiographs (DRRs) derived from planning 4DCT scans of 31 patients were used to train the MP model; a 32nd patient was held out for evaluation. PS models were trained using only the target patient's planning data. Both models used 16 DRRs per input and predicted tumor motion over a 1-second horizon. Performance was assessed using Average Displacement Error (ADE) and Final Displacement Error (FDE), on both planning (T1) and treatment (T2) data. Results: On T1 data, PS models outperformed MP models across all training set sizes, especially with larger datasets (up to 25,000 DRRs, p < 0.05). However, MP models demonstrated stronger robustness to inter-fractional anatomical variability and achieved comparable performance on T2 data without retraining. Conclusions: This is the first study to apply ViT architectures to markerless tumor motion forecasting. While PS models achieve higher precision, MP models offer robust out-of-the-box performance, well-suited for time-constrained clinical settings.
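The two metrics named in the abstract, ADE and FDE, can be illustrated with a minimal sketch. It uses the definitions common in trajectory forecasting (ADE as the mean Euclidean distance over all predicted time steps, FDE as the distance at the last step); the abstract does not spell out the paper's exact formulation, so the function name `displacement_errors`, the 2D coordinates, and the toy values below are assumptions for illustration only.

```python
import math


def displacement_errors(pred, true):
    """Return (ADE, FDE) for one forecast trajectory.

    pred, true: equal-length sequences of positions, one per forecast
    time step (hypothetical 2D tumor-centroid coordinates here).
    """
    if len(pred) != len(true) or len(pred) == 0:
        raise ValueError("pred and true must be non-empty and equally long")
    # Euclidean distance between prediction and ground truth at each step.
    dists = [math.dist(p, t) for p, t in zip(pred, true)]
    ade = sum(dists) / len(dists)  # average over the whole horizon
    fde = dists[-1]                # error at the final forecast step
    return ade, fde


# Toy trajectories (made-up numbers, not data from the paper).
true = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
pred = [(0.0, 0.0), (3.0, 5.0), (0.0, 2.0), (0.0, 4.0)]
ade, fde = displacement_errors(pred, true)
print(f"ADE = {ade:.2f}, FDE = {fde:.2f}")
```

Under these definitions ADE summarizes accuracy over the whole forecast horizon, while FDE isolates the error at the end of the horizon, which is why the two are usually reported together.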
Related papers
- Glioblastoma Overall Survival Prediction With Vision Transformers [6.318465743962574]
Glioblastoma is one of the most aggressive and common brain tumors, with a median survival of 10-15 months.
In this study, we propose a novel Artificial Intelligence (AI) approach for Overall Survival (OS) prediction using Magnetic Resonance Imaging (MRI) images.
We exploit Vision Transformers (ViTs) to extract hidden features directly from MRI images, eliminating the need for tumor segmentation.
The proposed model was evaluated on the BRATS dataset, reaching an accuracy of 62.5% on the test set, comparable to the top-performing methods.
arXiv Detail & Related papers (2025-08-04T13:59:57Z)
- Finetuning and Quantization of EEG-Based Foundational BioSignal Models on ECG and PPG Data for Blood Pressure Estimation [53.2981100111204]
Photoplethysmography and electrocardiography can potentially enable continuous blood pressure (BP) monitoring.
Yet building accurate and robust machine learning (ML) models remains challenging due to variability in data quality and patient-specific factors.
In this work, we investigate whether a model pre-trained on one modality can effectively be exploited to improve the accuracy of a different signal type.
Our approach achieves near state-of-the-art accuracy for diastolic BP and surpasses by 1.5x the accuracy of prior works for systolic BP.
arXiv Detail & Related papers (2025-02-10T13:33:12Z)
- PaPaGei: Open Foundation Models for Optical Physiological Signals [8.78925327256804]
Photoplethysmography is the leading non-invasive technique for monitoring biosignals and cardiovascular health.
Machine learning models trained on PPG signals tend to be task-specific and struggle with generalization.
We present PaPaGei, the first open foundation model for PPG signals.
arXiv Detail & Related papers (2024-10-27T18:18:06Z)
- Phikon-v2, A large and public feature extractor for biomarker prediction [42.52549987351643]
We train a vision transformer using DINOv2 and publicly release one iteration of this model for further experimentation, coined Phikon-v2.
While trained on publicly available histology slides, Phikon-v2 surpasses our previously released model (Phikon) and performs on par with other histopathology foundation models (FM) trained on proprietary data.
arXiv Detail & Related papers (2024-09-13T20:12:29Z)
- PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
- Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z)
- MPRE: Multi-perspective Patient Representation Extractor for Disease Prediction [3.914545513460964]
We propose the Multi-perspective Patient Representation Extractor (MPRE) for disease prediction.
Specifically, we propose Frequency Transformation Module (FTM) to extract the trend and variation information of dynamic features.
In the 2D Multi-Extraction Network (2D MEN), we form the 2D temporal tensor based on trend and variation.
We also propose the First-Order Difference Attention Mechanism (FODAM) to calculate the contributions of differences in adjacent variations to the disease diagnosis.
arXiv Detail & Related papers (2024-01-01T13:52:05Z)
- The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
- Domain Transfer Through Image-to-Image Translation for Uncertainty-Aware Prostate Cancer Classification [42.75911994044675]
We present a novel approach for unpaired image-to-image translation of prostate MRIs and an uncertainty-aware training approach for classifying clinically significant PCa.
Our approach involves a novel pipeline for translating unpaired 3.0T multi-parametric prostate MRIs to 1.5T, thereby augmenting the available training data.
Our experiments demonstrate that the proposed method significantly improves the Area Under ROC Curve (AUC) by over 20% compared to the previous work.
arXiv Detail & Related papers (2023-07-02T05:26:54Z)
- Med-Tuning: A New Parameter-Efficient Tuning Framework for Medical Volumetric Segmentation [37.42382366505377]
We introduce a new framework named Med-Tuning to realize parameter-efficient tuning (PET) for medical volumetric segmentation tasks.
Our framework enhances the segmentation precision of 2D baselines that are pre-trained on natural images.
Compared to full FT, Med-Tuning reduces the fine-tuned model parameters by up to 4x, with even better segmentation performance.
arXiv Detail & Related papers (2023-04-21T10:47:13Z)
- Validated respiratory drug deposition predictions from 2D and 3D medical images with statistical shape models and convolutional neural networks [47.187609203210705]
We aim to develop and validate an automated computational framework for patient-specific deposition modelling.
An image processing approach is proposed that could produce 3D patient respiratory geometries from 2D chest X-rays and 3D CT images.
arXiv Detail & Related papers (2023-03-02T07:47:07Z)
- TMSS: An End-to-End Transformer-based Multimodal Network for Segmentation and Survival Prediction [0.0]
Oncologists do not do this in their analysis but rather fuse the information in their brain from multiple sources such as medical images and patient history.
This work proposes a deep learning method that mimics oncologists' analytical behavior when quantifying cancer and estimating patient survival.
arXiv Detail & Related papers (2022-09-12T06:22:05Z)
- Unsupervised Pre-Training on Patient Population Graphs for Patient-Level Predictions [48.02011627390706]
Pre-training has shown success in different areas of machine learning, such as Computer Vision (CV), Natural Language Processing (NLP) and medical imaging.
In this paper, we apply unsupervised pre-training to heterogeneous, multi-modal EHR data for patient outcome prediction.
We find that our proposed graph based pre-training method helps in modeling the data at a population level.
arXiv Detail & Related papers (2022-03-23T17:59:45Z)
- Segmentation by Test-Time Optimization (TTO) for CBCT-based Adaptive Radiation Therapy [2.5705729402510338]
Traditional or deep learning (DL) based deformable image registration (DIR) can achieve improved results in many situations.
We propose a method called test-time optimization (TTO) to refine a pre-trained DL-based DIR population model.
Our proposed method is less susceptible to the generalizability problem, and thus can improve overall performance of different DL-based DIR models.
arXiv Detail & Related papers (2022-02-08T16:34:22Z)
- Appearance Learning for Image-based Motion Estimation in Tomography [60.980769164955454]
In tomographic imaging, anatomical structures are reconstructed by applying a pseudo-inverse forward model to acquired signals.
Patient motion corrupts the geometry alignment in the reconstruction process resulting in motion artifacts.
We propose an appearance learning approach recognizing the structures of rigid motion independently from the scanned object.
arXiv Detail & Related papers (2020-06-18T09:49:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.