An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models
- URL: http://arxiv.org/abs/2602.20324v1
- Date: Mon, 23 Feb 2026 20:20:23 GMT
- Title: An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models
- Authors: Cathy Shyr, Yan Hu, Rory J. Tinker, Thomas A. Cassini, Kevin W. Byram, Rizwan Hamid, Daniel V. Fabbri, Adam Wright, Josh F. Peterson, Lisa Bastarache, Hua Xu
- Abstract summary: RARE-PHENIX is an end-to-end AI framework for rare disease phenotyping. It integrates large language model-based phenotype extraction, standardization to Human Phenotype Ontology terms, and supervised ranking of diagnostically informative phenotypes. It consistently outperformed a state-of-the-art deep learning baseline (PhenoBERT) across ontology-based similarity and precision-recall-F1 metrics in end-to-end evaluation.
- Score: 19.80670818473776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Phenotyping is fundamental to rare disease diagnosis, but manual curation of structured phenotypes from clinical notes is labor-intensive and difficult to scale. Existing artificial intelligence approaches typically optimize individual components of phenotyping but do not operationalize the full clinical workflow of extracting features from clinical text, standardizing them to Human Phenotype Ontology (HPO) terms, and prioritizing diagnostically informative HPO terms. We developed RARE-PHENIX, an end-to-end AI framework for rare disease phenotyping that integrates large language model-based phenotype extraction, ontology-grounded standardization to HPO terms, and supervised ranking of diagnostically informative phenotypes. We trained RARE-PHENIX using data from 2,671 patients across 11 Undiagnosed Diseases Network clinical sites, and externally validated it on 16,357 real-world clinical notes from Vanderbilt University Medical Center. Using clinician-curated HPO terms as the gold standard, RARE-PHENIX consistently outperformed a state-of-the-art deep learning baseline (PhenoBERT) across ontology-based similarity and precision-recall-F1 metrics in end-to-end evaluation (e.g., ontology-based similarity of 0.70 vs. 0.58). Ablation analyses showed that performance improved with the addition of each module in RARE-PHENIX (extraction, standardization, and prioritization), supporting the value of modeling the full clinical phenotyping workflow. By modeling phenotyping as a clinically aligned workflow rather than a single extraction task, RARE-PHENIX provides structured, ranked phenotypes that are more concordant with clinician curation and has the potential to support human-in-the-loop rare disease diagnosis in real-world settings.
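The abstract scores RARE-PHENIX against clinician-curated HPO terms with both exact-match precision-recall-F1 and an ontology-based similarity, but it does not specify the similarity measure. The following is only an illustrative sketch under stated assumptions, not the authors' evaluation code: it uses a tiny hypothetical ontology fragment (made-up term names, not real HPO IDs), set-based precision/recall/F1, and a symmetric best-match average of ancestor-set Jaccard overlaps as one possible ontology-aware score. The point it shows is that an ontology-based metric gives partial credit when a predicted term is a near miss, such as the parent of a gold term.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology fragment: term -> list of is-a parents.
# These names are illustrative placeholders, NOT real HPO identifiers.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "nervous_system": ["root"],
    "seizure": ["nervous_system"],
    "focal_seizure": ["seizure"],
    "dev_delay": ["nervous_system"],
    "skeletal": ["root"],
    "scoliosis": ["skeletal"],
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus all of its ancestors (is-a closure)."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def prf1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of term IDs."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def term_sim(a: str, b: str) -> float:
    """Jaccard overlap of ancestor sets (one simple, assumed similarity choice)."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def best_match_avg(predicted: Set[str], gold: Set[str]) -> float:
    """Symmetric best-match average: each term is paired with its closest match."""
    if not predicted or not gold:
        return 0.0
    fwd = sum(max(term_sim(p, g) for g in gold) for p in predicted) / len(predicted)
    bwd = sum(max(term_sim(g, p) for p in predicted) for g in gold) / len(gold)
    return (fwd + bwd) / 2


if __name__ == "__main__":
    predicted = {"seizure", "dev_delay"}                  # e.g., pipeline output
    gold = {"focal_seizure", "dev_delay", "scoliosis"}    # e.g., clinician curation
    print(prf1(predicted, gold))
    print(best_match_avg(predicted, gold))
```

In this example, exact matching gives precision 0.5, recall 1/3, and F1 0.4, while `best_match_avg` returns 0.7625, because "seizure" earns partial credit as the parent of "focal_seizure". Information-content-based measures (for example, Resnik similarity) are a common alternative to the plain Jaccard overlap assumed here.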
Related papers
- PhenoLIP: Integrating Phenotype Ontology Knowledge into Medical Vision-Language Pretraining (2026-02-05T20:44:07Z)
PhenoLIP is a novel pretraining framework that incorporates structured phenotype knowledge into medical image understanding.
PhenoLIP outperforms previous state-of-the-art approaches for medical image understanding.
- A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer (2025-08-22T17:48:19Z)
RenalCLIP is a visual-language foundation model for characterization, diagnosis and prognosis of renal masses.
It achieved better performance and superior generalizability across 10 core tasks spanning the full clinical workflow of kidney cancer.
- Counterfactual Probabilistic Diffusion with Expert Models (2025-08-18T20:44:32Z)
We propose a time series diffusion-based framework that incorporates guidance from imperfect expert models.
Our method, ODE-Diff, bridges mechanistic and data-driven approaches, enabling more reliable and interpretable causal inference.
- Prototype Learning to Create Refined Interpretable Digital Phenotypes from ECGs (2025-08-02T23:52:08Z)
Prototype-based neural networks offer interpretable predictions by comparing inputs to learned, representative signal patterns anchored in training data.
We use a prototype-based deep learning model trained for multi-label ECG classification using the PTB-XL dataset.
We assess whether individual prototypes, trained solely for classification, are associated with hospital discharge diagnoses in the form of phecodes.
- Self-Explaining Hypergraph Neural Networks for Diagnosis Prediction (2025-02-15T06:33:02Z)
Existing deep learning diagnosis prediction models with intrinsic interpretability often assign attention weights to every past diagnosis or hospital visit.
We introduce SHy, a self-explaining hypergraph neural network model, designed to offer personalized, concise and faithful explanations.
SHy captures higher-order disease interactions and extracts distinct temporal phenotypes as personalized explanations.
- Clustering of Disease Trajectories with Explainable Machine Learning: A Case Study on Postoperative Delirium Phenotypes (2024-05-06T10:05:46Z)
We propose an approach that combines supervised machine learning for personalized POD risk prediction with unsupervised clustering techniques to uncover potential POD phenotypes.
We show that clustering patients in the SHAP feature importance space successfully recovers the true underlying phenotypes, outperforming clustering in the raw feature space.
- An evaluation of GPT models for phenotype concept recognition (2023-09-29T12:06:55Z)
We examine the performance of the latest Generative Pre-trained Transformer (GPT) models for clinical phenotyping and phenotype annotation.
Our results show that, with an appropriate setup, these models can achieve state-of-the-art performance.
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network (2023-07-19T12:35:09Z)
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient-trial matching.
- sEHR-CE: Language modelling of structured EHR data for efficient and generalizable patient cohort expansion (2022-11-30T16:00:43Z)
sEHR-CE is a novel framework based on transformers to enable integrated phenotyping and analyses of heterogeneous clinical datasets.
We validate our approach using primary and secondary care data from the UK Biobank, a large-scale research study.
- Cross-Lingual Knowledge Transfer for Clinical Phenotyping (2022-08-03T08:33:21Z)
We investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
- TrialGraph: Machine Intelligence Enabled Insight from Graph Modelling of Clinical Trials (2021-12-15T15:36:57Z)
We introduce a curated clinical trial data set compiled from the CT.gov, AACT and TrialTrove databases (n=1191 trials, representing one million patients).
We then detail the mathematical basis and implementation of a selection of graph machine learning algorithms.
We trained these models to predict side effect information for a clinical trial given information on the disease, existing medical conditions, and treatment.
- A multi-stage machine learning model on diagnosis of esophageal manometry (2021-06-25T20:09:23Z)
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict CC diagnosis of HRM study from raw multi-swallow data.