Related papers: TwinWeaver: An LLM-Based Foundation Model Framework for Pan-Cancer Digital Twins

URL: http://arxiv.org/abs/2601.20906v1
Date: Wed, 28 Jan 2026 15:40:54 GMT
Title: TwinWeaver: An LLM-Based Foundation Model Framework for Pan-Cancer Digital Twins
Authors: Nikita Makarov, Maria Bordukova, Lena Voith von Voithenberg, Estrella Pivel-Villanueva, Sabrina Mielke, Jonathan Wickes, Hanchen Wang, Mingyu Derek Ma, Keunwoo Choi, Kyunghyun Cho, Stephen Ra, Raul Rodriguez-Esteban, Fabian Schmich, Michael Menden
Abstract summary: We build Genie Digital Twin (GDT) on 93,054 patients across 20 cancer types. GDT significantly reduces forecasting error, achieving a median Mean Absolute Scaled Error (MASE) of 0.87. GDT generalizes to out-of-distribution clinical trials, matching trained baselines at zero-shot and surpassing them with fine-tuning.
Score: 33.30007167473537
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Precision oncology requires forecasting clinical events and trajectories, yet modeling sparse, multi-modal clinical time series remains a critical challenge. We introduce TwinWeaver, an open-source framework that serializes longitudinal patient histories into text, enabling unified event prediction as well as forecasting with large language models, and use it to build Genie Digital Twin (GDT) on 93,054 patients across 20 cancer types. In benchmarks, GDT significantly reduces forecasting error, achieving a median Mean Absolute Scaled Error (MASE) of 0.87 compared to 0.97 for the strongest time-series baseline (p<0.001). Furthermore, GDT improves risk stratification, achieving an average concordance index (C-index) of 0.703 across survival, progression, and therapy switching tasks, surpassing the best baseline of 0.662. GDT also generalizes to out-of-distribution clinical trials, matching trained baselines at zero-shot and surpassing them with fine-tuning, achieving a median MASE of 0.75-0.88 and outperforming the strongest baseline in event prediction with an average C-index of 0.672 versus 0.648. Finally, TwinWeaver enables an interpretable clinical reasoning extension, providing a scalable and transparent foundation for longitudinal clinical modeling.
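The two headline metrics in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the choice of a one-step naive forecast as the MASE scaling baseline, and the handling of tied risk scores are all assumptions made here for illustration, and the abstract does not state which variants the paper uses.

```python
from typing import Sequence


def mase(y_true: Sequence[float], y_pred: Sequence[float],
         y_train: Sequence[float], season: int = 1) -> float:
    """Mean Absolute Scaled Error (illustrative sketch).

    Forecast MAE divided by the in-sample MAE of a naive forecast that
    repeats the value observed `season` steps earlier (an assumed baseline).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    if len(y_train) <= season:
        raise ValueError("y_train must be longer than the season length")
    forecast_mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    naive_mae = sum(
        abs(y_train[i] - y_train[i - season]) for i in range(season, len(y_train))
    ) / (len(y_train) - season)
    if naive_mae == 0:
        raise ValueError("naive forecast error is zero; MASE is undefined")
    return forecast_mae / naive_mae


def concordance_index(times: Sequence[float], events: Sequence[bool],
                      risks: Sequence[float]) -> float:
    """Harrell-style concordance index (illustrative sketch).

    A pair (i, j) is comparable when subject i had an observed event
    strictly before time j. It is concordant when i also has the higher
    predicted risk; tied risks count as half. Tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    print(mase([3, 4], [2, 6], [1, 2, 4]))
    print(concordance_index([1, 2, 3], [True, True, False], [2, 3, 1]))
```

Under these assumptions, a MASE below 1 means the forecast has lower absolute error than the naive baseline, and a C-index of 0.5 corresponds to random ranking while 1.0 is a perfect ranking of event times.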

Related papers

Explainable Admission-Level Predictive Modeling for Prolonged Hospital Stay in Elderly Populations: Challenges in Low- and Middle-Income Countries [65.4286079244589]
Prolonged length of stay (pLoS) is a significant factor associated with the risk of adverse in-hospital events. We develop and explain a predictive model for pLoS using admission-level patient and hospital administrative data.
arXiv Detail & Related papers (2026-01-07T23:35:24Z)
Knowledge Graph Augmented Large Language Models for Disease Prediction [24.992170033802537]
Knowledge graph (KG)-guided chain-of-thought (CoT) framework generates clinically grounded reasoning for visit-level disease prediction in MIMIC-III. ICD-9 codes are mapped to PrimeKG, from which disease-relevant nodes and multi-hop reasoning paths are extracted and used as scaffolds for CoT generation. KG-guided models outperform strong classical baselines, achieving AUROC values of 0.66 to 0.70 and macro-AUPR values of 0.40 to 0.47. A blinded clinician evaluation shows consistent preference for KG-guided CoT explanations in clarity, relevance, and clinical correctness.
arXiv Detail & Related papers (2025-12-01T02:49:17Z)
Chronic Kidney Disease Prognosis Prediction Using Transformer [2.054117570146147]
Chronic Kidney Disease (CKD) affects nearly 10% of the global population and often progresses to end-stage renal failure. We present a transformer-based framework for predicting CKD progression using multi-modal electronic health records.
arXiv Detail & Related papers (2025-11-04T07:52:17Z)
A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer [54.58205672910646]
RenalCLIP is a vision-language foundation model for the characterization, diagnosis, and prognosis of renal masses. It achieved better performance and superior generalizability across 10 core tasks spanning the full clinical workflow of kidney cancer.
arXiv Detail & Related papers (2025-08-22T17:48:19Z)
A Novel Attention-Augmented Wavelet YOLO System for Real-time Brain Vessel Segmentation on Transcranial Color-coded Doppler [49.03919553747297]
We propose an AI-powered, real-time Circle of Willis (CoW) auto-segmentation system capable of efficiently capturing cerebral arteries. No prior studies have explored AI-driven cerebrovascular segmentation using Transcranial Color-coded Doppler (TCCD). The proposed AAW-YOLO demonstrated strong performance in segmenting both ipsilateral and contralateral CoW vessels.
arXiv Detail & Related papers (2025-08-19T14:41:22Z)
A SHAP-based explainable multi-level stacking ensemble learning method for predicting the length of stay in acute stroke [3.2906073576204955]
Existing machine learning models have shown suboptimal predictive performance and limited generalisability, and have overlooked system-level factors. We developed an interpretable multi-level stacking ensemble model for ischaemic and haemorrhagic stroke. An explainable ensemble model effectively predicted prolonged LOS in ischaemic stroke. Further validation is needed for haemorrhagic stroke.
arXiv Detail & Related papers (2025-05-30T01:08:26Z)
Predicting Length of Stay in Neurological ICU Patients Using Classical Machine Learning and Neural Network Models: A Benchmark Study on MIMIC-IV [49.1574468325115]
This study explores multiple ML approaches for predicting ICU LOS in patients with neurological diseases, based on the MIMIC-IV dataset. The evaluated models include classic ML algorithms (K-Nearest Neighbors, Random Forest, XGBoost and CatBoost) and Neural Networks (LSTM, BERT and Temporal Fusion Transformer).
arXiv Detail & Related papers (2025-05-23T14:06:42Z)
A Transformer-based survival model for prediction of all-cause mortality in heart failure patients: a multi-cohort study [5.831730826863567]
We developed and validated TRisk, a Transformer-based AI model predicting 36-month mortality in heart failure patients. Our study included 403,534 heart failure patients (ages 40-90) from 1,418 English general practices.
arXiv Detail & Related papers (2025-03-16T01:53:50Z)
Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but individuals of non-European descent are under-represented in them. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
An Interpretable Web-based Glioblastoma Multiforme Prognosis Prediction Tool using Random Forest Model [1.1024591739346292]
We propose predictive models that estimate GBM patients' health status one year after treatment. We used the clinical profiles of a total of 467 GBM patients, each consisting of 13 features and two follow-up dates. Our machine learning models suggest that the top three prognostic factors for GBM patient survival were the MGMT gene promoter, the extent of resection, and age.
arXiv Detail & Related papers (2021-08-30T07:56:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.