A Hybrid Framework with Large Language Models for Rare Disease Phenotyping
- URL: http://arxiv.org/abs/2405.10440v2
- Date: Tue, 08 Oct 2024 14:32:39 GMT
- Authors: Jinge Wu, Hang Dong, Zexi Li, Haowei Wang, Runci Li, Arijit Patra, Chengliang Dai, Waqar Ali, Phil Scordis, Honghan Wu
- Abstract summary: Rare diseases pose significant challenges in diagnosis and treatment due to their low prevalence and heterogeneous clinical presentations.
This study aims to develop a hybrid approach combining dictionary-based natural language processing (NLP) tools with large language models (LLMs) to improve rare disease identification from unstructured clinical reports.
We propose a novel hybrid framework that integrates the Orphanet Rare Disease Ontology (ORDO) and the Unified Medical Language System (UMLS) to create a comprehensive rare disease vocabulary.
- Score: 4.550497164299771
- License:
- Abstract: Rare diseases pose significant challenges in diagnosis and treatment due to their low prevalence and heterogeneous clinical presentations. Unstructured clinical notes contain valuable information for identifying rare diseases, but manual curation is time-consuming and prone to subjectivity. This study aims to develop a hybrid approach combining dictionary-based natural language processing (NLP) tools with large language models (LLMs) to improve rare disease identification from unstructured clinical reports. We propose a novel hybrid framework that integrates the Orphanet Rare Disease Ontology (ORDO) and the Unified Medical Language System (UMLS) to create a comprehensive rare disease vocabulary. The proposed hybrid approach demonstrates superior performance compared to traditional NLP systems and standalone LLMs. Notably, the approach uncovers a significant number of potential rare disease cases not documented in structured diagnostic records, highlighting its ability to identify previously unrecognized patients.
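The abstract describes the hybrid idea only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes a two-stage ordering (dictionary lookup first, then a second-stage check on each candidate mention), which the abstract does not confirm. The vocabulary, the concept IDs, and every function name here are hypothetical, and the rule-based verifier is only a stand-in for where an LLM judgement could be plugged in.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical toy vocabulary: surface term -> placeholder concept ID.
# These IDs are invented for illustration; they are NOT real ORDO or UMLS codes.
VOCAB: Dict[str, str] = {
    "pulmonary hypertension": "RD:0001",
    "cystic fibrosis": "RD:0002",
}


def dictionary_candidates(text: str, vocab: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Stage 1: return (term, concept_id, sentence) for each vocabulary term found."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        for term, concept_id in vocab.items():
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                hits.append((term, concept_id, sentence))
    return hits


def rule_based_verifier(term: str, sentence: str) -> bool:
    """Stage 2 stand-in (assumption): a real system might ask an LLM whether the
    mention refers to the patient's own condition. Here a few cue phrases are used."""
    cues = ("no evidence of", "ruled out", "family history of", "denies")
    lowered = sentence.lower()
    return not any(cue in lowered for cue in cues)


def hybrid_phenotype(
    text: str,
    vocab: Dict[str, str],
    verifier: Callable[[str, str], bool],
) -> List[str]:
    """Keep only the dictionary candidates that the verifier accepts."""
    confirmed: List[str] = []
    for term, concept_id, sentence in dictionary_candidates(text, vocab):
        if verifier(term, sentence) and concept_id not in confirmed:
            confirmed.append(concept_id)
    return confirmed


note = (
    "Patient admitted with dyspnoea. "
    "Echo consistent with pulmonary hypertension. "
    "Family history of cystic fibrosis."
)
print(hybrid_phenotype(note, VOCAB, rule_based_verifier))  # ['RD:0001']
```

Passing the verifier as a callable keeps the dictionary stage independent of whichever model or rule set performs the second-stage check, so the two components can be evaluated separately.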
Related papers
- Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experiment results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv Detail & Related papers (2024-08-15T21:09:09Z) - MMIL: A novel algorithm for disease associated cell type discovery [58.044870442206914]
Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease.
We introduce Mixture Modeling for Multiple Instance Learning (MMIL), an expectation maximization method that enables the training and calibration of cell-level classifiers.
arXiv Detail & Related papers (2024-06-12T15:22:56Z) - AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models [25.966454809890227]
Rare diseases affect millions worldwide but often face limited research focus due to their low prevalence.
Recent advancements in Large Language Models (LLMs) have shown promise in automating the extraction of medical information.
We propose an end-to-end system called AutoRD, which automates the extraction of information from medical texts about rare diseases.
arXiv Detail & Related papers (2024-03-01T20:06:39Z) - Multimodal Neurodegenerative Disease Subtyping Explained by ChatGPT [15.942849233189664]
Alzheimer's disease is the most prevalent neurodegenerative disease.
Current data-driven approaches are able to classify the subtypes at later stages of AD or related disorders, but struggle when predicting at the asymptomatic or prodromal stage.
We propose a multimodal framework that uses early-stage indicators such as imaging, genetics and clinical assessments to classify AD patients into subtypes at early stages.
arXiv Detail & Related papers (2024-01-31T19:30:04Z) - Large Language Models with Retrieval-Augmented Generation for Zero-Shot Disease Phenotyping [1.8630636381951384]
Large language models (LLMs) offer promise in text understanding but may not efficiently handle real-world clinical documentation.
We propose a zero-shot LLM-based method enriched by retrieval-augmented generation and MapReduce.
We show that this method as applied to pulmonary hypertension (PH), a rare disease characterized by elevated arterial pressures in the lungs, significantly outperforms physician logic rules.
arXiv Detail & Related papers (2023-12-11T15:45:27Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - Unsupervised Representation Learning Meets Pseudo-Label Supervised Self-Distillation: A New Approach to Rare Disease Classification [26.864435224276964]
We propose a novel hybrid approach to rare disease classification, featuring two key novelties.
First, we adopt the unsupervised representation learning (URL) based on self-supervising contrastive loss.
Second, we integrate the URL with pseudo-label supervised classification for effective self-distillation of the knowledge about the rare diseases.
arXiv Detail & Related papers (2021-10-09T12:56:09Z) - Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision [3.6471045233540806]
We show that the Text-to-UMLS process can be greatly improved with weak supervision, without any annotated data from domain experts.
Our analysis shows that the overall pipeline processing discharge summaries can surface cases, which are mostly uncaptured in manual ICD codes of the hospital admissions.
arXiv Detail & Related papers (2021-05-05T11:49:09Z) - Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z) - Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation [150.52617238140868]
We propose low-resource medical dialogue generation to transfer the diagnostic experience from source diseases to target ones.
We also develop a Graph-Evolving Meta-Learning framework that learns to evolve the commonsense graph for reasoning disease-symptom correlations in a new disease.
arXiv Detail & Related papers (2020-12-22T13:20:23Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)