SigBERT: Combining Narrative Medical Reports and Rough Path Signature Theory for Survival Risk Estimation in Oncology
- URL: http://arxiv.org/abs/2507.22941v1
- Date: Fri, 25 Jul 2025 12:33:25 GMT
- Title: SigBERT: Combining Narrative Medical Reports and Rough Path Signature Theory for Survival Risk Estimation in Oncology
- Authors: Paul Minchella, Loïc Verlingue, Stéphane Chrétien, Rémi Vaucher, Guillaume Metzler
- Abstract summary: SigBERT is an innovative temporal survival analysis framework designed to process a large number of clinical reports per patient. It processes timestamped medical reports by extracting and averaging word embeddings into sentence embeddings. It was trained and evaluated on a real-world oncology dataset from the Léon Bérard Center corpus, with a C-index score of 0.75 (sd 0.014) on the independent test cohort.
- Score: 1.5425688173297465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Electronic medical reports (EHR) contain a vast amount of information that can be leveraged for machine learning applications in healthcare. However, existing survival analysis methods often struggle to effectively handle the complexity of textual data, particularly in its sequential form. Here, we propose SigBERT, an innovative temporal survival analysis framework designed to efficiently process a large number of clinical reports per patient. SigBERT processes timestamped medical reports by extracting and averaging word embeddings into sentence embeddings. To capture temporal dynamics from the time series of sentence embedding coordinates, we apply signature extraction from rough path theory to derive geometric features for each patient, which significantly enhance survival model performance by capturing complex temporal dynamics. These features are then integrated into a LASSO-penalized Cox model to estimate patient-specific risk scores. The model was trained and evaluated on a real-world oncology dataset from the Léon Bérard Center corpus, with a C-index score of 0.75 (sd 0.014) on the independent test cohort. SigBERT integrates sequential medical data to enhance risk estimation, advancing narrative-based survival analysis.
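The pipeline described in the abstract (average word embeddings per report, take the signature of the resulting time series, feed the features to a Cox model) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mean_pool` and `signature_depth2`, the truncation at depth 2, the 2-dimensional toy embeddings, and the coefficient vector `beta` are all assumptions chosen to keep the example small.

```python
import numpy as np

def mean_pool(word_vectors):
    """Average word embeddings of shape (n_words, d) into one report embedding (d,)."""
    return np.asarray(word_vectors, dtype=float).mean(axis=0)

def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path of shape (n_points, d).

    Returns a vector of length d + d*d: the level-1 terms (total increment)
    followed by the flattened level-2 terms (iterated integrals).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment to the path so far.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return np.concatenate([level1, level2.ravel()])

# Toy patient (made-up data): three reports in chronological order,
# each an (n_words, 2) array of word embeddings.
reports = [
    np.array([[0.0, 0.0], [0.0, 0.0]]),
    np.array([[1.0, 0.0], [1.0, 0.0]]),
    np.array([[0.5, 1.0], [1.5, 1.0]]),
]
path = np.stack([mean_pool(r) for r in reports])  # shape (3, 2)
features = signature_depth2(path)                 # shape (6,)

# Hypothetical sparse coefficients, standing in for a fitted LASSO-penalized Cox model.
beta = np.array([0.2, -0.1, 0.0, 0.3, 0.0, 0.0])
risk_score = float(np.exp(features @ beta))       # relative hazard exp(beta . x)
```

The antisymmetric part of the level-2 terms is the Lévy area, which depends on the order in which the embedding coordinates move. This is one way signature features can encode temporal dynamics that a plain average over reports would discard.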
Related papers
- Deep Survival Analysis in Multimodal Medical Data: A Parametric and Probabilistic Approach with Competing Risks [47.19194118883552]
We introduce a multimodal deep learning framework for survival analysis capable of modeling both single and competing risks scenarios. We propose SAMVAE (Survival Analysis Multimodal Variational Autoencoder), a novel deep learning architecture designed for survival prediction.
arXiv Detail & Related papers (2025-07-10T14:29:48Z) - Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank. It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
arXiv Detail & Related papers (2025-05-30T14:42:02Z) - Survival Prediction in Lung Cancer through Multi-Modal Representation Learning [9.403446155541346]
This paper presents a novel approach to survival prediction by harnessing comprehensive information from CT and PET scans, along with associated genomic data.
We aim to develop a robust predictive model for survival outcomes by integrating multi-modal imaging data with genetic information.
arXiv Detail & Related papers (2024-09-30T10:42:20Z) - Deep State-Space Generative Model For Correlated Time-to-Event Predictions [54.3637600983898]
We propose a deep latent state-space generative model to capture the interactions among different types of correlated clinical events.
Our method also uncovers meaningful insights about the latent correlations among mortality and different types of organ failures.
arXiv Detail & Related papers (2024-07-28T02:42:36Z) - Advancing Head and Neck Cancer Survival Prediction via Multi-Label Learning and Deep Model Interpretation [7.698783025721071]
We propose IMLSP, an Interpretable Multi-Label multi-modal deep Survival Prediction framework for predicting multiple HNC survival outcomes simultaneously.
We also present Grad-TEAM, a Gradient-weighted Time-Event Activation Mapping approach specifically developed for deep survival model visual explanation.
arXiv Detail & Related papers (2024-05-09T01:30:04Z) - Estimating the severity of dental and oral problems via sentiment classification over clinical reports [0.8287206589886879]
Analyzing authors' sentiments in texts can be practical and useful in various fields, including medicine and dentistry.
A deep learning model based on a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) network architecture, known as CNN-LSTM, was developed to detect the severity level of a patient's problem.
arXiv Detail & Related papers (2024-01-17T14:33:13Z) - SAVAE: Leveraging the variational Bayes autoencoder for survival analysis [10.0060346233449]
We introduce SAVAE (Survival Analysis Variational Autoencoder), a novel approach based on Variational Autoencoders.
SAVAE contributes significantly to the field by introducing a tailored ELBO formulation for survival analysis.
It offers a general method that consistently performs well on various metrics, demonstrating robustness and stability through different experiments.
arXiv Detail & Related papers (2023-12-22T12:36:50Z) - Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - SurvTimeSurvival: Survival Analysis On The Patient With Multiple Visits/Records [26.66492761632773]
The accurate prediction of survival times for patients with severe diseases remains a critical challenge despite recent advances in artificial intelligence.
This study introduces "SurvTimeSurvival: Survival Analysis On Patients With Multiple Visits/Records".
arXiv Detail & Related papers (2023-11-16T12:30:14Z) - MIA-Prognosis: A Deep Learning Framework to Predict Therapy Response [58.0291320452122]
This paper aims at a unified deep learning approach to predict patient prognosis and therapy response.
We formalize the prognosis modeling as a multi-modal asynchronous time series classification task.
Our predictive model could further stratify low-risk and high-risk patients in terms of long-term survival.
arXiv Detail & Related papers (2020-10-08T15:30:17Z) - Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on the application of elastic principal graphs, which can simultaneously address the tasks of dimensionality reduction, data visualization, clustering, feature selection, and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)