Retrieving and Refining: A Hybrid Framework with Large Language Models for Rare Disease Identification
- URL: http://arxiv.org/abs/2405.10440v1
- Date: Thu, 16 May 2024 20:59:28 GMT
- Title: Retrieving and Refining: A Hybrid Framework with Large Language Models for Rare Disease Identification
- Authors: Jinge Wu, Hang Dong, Zexi Li, Arijit Patra, Honghan Wu
- Abstract summary: This study proposes a novel hybrid approach that combines a traditional dictionary-based natural language processing (NLP) tool with the powerful capabilities of large language models (LLMs).
We evaluate various prompting strategies on six large language models (LLMs) of varying sizes and domains (general and medical).
- Score: 4.215595156143688
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The infrequency and heterogeneity of clinical presentations in rare diseases often lead to underdiagnosis and their exclusion from structured datasets. This necessitates the utilization of unstructured text data for comprehensive analysis. However, the manual identification from clinical reports is an arduous and intrinsically subjective task. This study proposes a novel hybrid approach that synergistically combines a traditional dictionary-based natural language processing (NLP) tool with the powerful capabilities of large language models (LLMs) to enhance the identification of rare diseases from unstructured clinical notes. We comprehensively evaluate various prompting strategies on six large language models (LLMs) of varying sizes and domains (general and medical). This evaluation encompasses zero-shot, few-shot, and retrieval-augmented generation (RAG) techniques to enhance the LLMs' ability to reason about and understand contextual information in patient reports. The results demonstrate effectiveness in rare disease identification, highlighting the potential for identifying underdiagnosed patients from clinical notes.
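The hybrid idea in the abstract (a dictionary-based tool proposes candidate mentions, then an LLM judges each one in context) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy dictionary `RARE_DISEASE_TERMS`, its concept IDs, the prompt wording, the `Candidate` type, and the `stub_llm` stand-in are all hypothetical and exist only to make the example runnable. A real system would call an actual ontology-backed NLP tool and an actual model.

```python
import re
from dataclasses import dataclass

# Hypothetical toy dictionary; a real pipeline would draw on a curated ontology.
RARE_DISEASE_TERMS = {
    "pulmonary hypertension": "RD-0001",
    "cystic fibrosis": "RD-0002",
}


@dataclass
class Candidate:
    term: str
    concept_id: str
    start: int
    context: str


def retrieve_candidates(note, window=60):
    """Retrieval step: dictionary lookup with a character window of context."""
    found = []
    for term, concept_id in RARE_DISEASE_TERMS.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, note, flags=re.IGNORECASE):
            lo = max(0, m.start() - window)
            hi = min(len(note), m.end() + window)
            found.append(Candidate(term, concept_id, m.start(), note[lo:hi]))
    return sorted(found, key=lambda c: c.start)


def build_prompt(candidate, examples=()):
    """Zero-shot when `examples` is empty, few-shot otherwise."""
    lines = [
        "Decide whether the patient in the note has the named disease.",
        "Answer with exactly 'yes' or 'no'.",
        "",
    ]
    for ctx, term, answer in examples:
        lines += [f"Context: {ctx}", f"Disease: {term}", f"Answer: {answer}", ""]
    lines += [f"Context: {candidate.context}", f"Disease: {candidate.term}", "Answer:"]
    return "\n".join(lines)


def refine(note, llm, examples=()):
    """Refinement step: keep only the candidates the model confirms."""
    kept = []
    for cand in retrieve_candidates(note):
        reply = llm(build_prompt(cand, examples))
        if reply.strip().lower().startswith("yes"):
            kept.append(cand)
    return kept


def stub_llm(prompt):
    """Stand-in for a real model: rejects a mention negated by 'no evidence of'."""
    tail = prompt.rsplit("Context: ", 1)[1]
    context, rest = tail.split("\nDisease: ", 1)
    disease = rest.split("\n", 1)[0]
    return "no" if f"no evidence of {disease}" in context.lower() else "yes"


note = "History of cystic fibrosis. No evidence of pulmonary hypertension on echo."
confirmed = refine(note, stub_llm)
print([c.term for c in confirmed])
```

Passing a tuple of `(context, disease, answer)` triples as `examples` turns the same prompt into a few-shot one. A retrieval-augmented variant would additionally prepend retrieved reference text about the disease, which this sketch leaves out.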
Related papers
- A Weakly Supervised Transformer to Support Rare Disease Diagnosis from Electronic Health Records: Methods and Applications in Rare Pulmonary Disease [16.112294460618955]
Rare diseases affect an estimated 300-400 million people worldwide. Computational phenotyping algorithms show promise for rare disease detection. We propose a weakly supervised, transformer-based framework that combines a small set of gold-standard labels with a large volume of iteratively updated silver-standard labels.
arXiv Detail & Related papers (2025-07-01T23:11:20Z)
- Decoding Rarity: Large Language Models in the Diagnosis of Rare Diseases [1.9662978733004604]
Large language models (LLMs) have shown promising capabilities in transforming rare disease research. This paper explores the integration of LLMs in the analysis of rare diseases, highlighting significant strides and pivotal studies.
arXiv Detail & Related papers (2025-05-18T15:42:15Z)
- Towards Scalable and Cross-Lingual Specialist Language Models for Oncology [4.824906329042275]
General-purpose large language models (LLMs) struggle with challenges such as clinical terminology, context-dependent interpretations, and multi-modal data integration.
We develop an oncology-specialized, efficient, and adaptable NLP framework that combines instruction tuning, retrieval-augmented generation (RAG), and graph-based knowledge integration.
arXiv Detail & Related papers (2025-03-11T11:34:57Z)
- A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis [58.85247337449624]
We propose a knowledge-enhanced vision-language pre-training approach that integrates disease knowledge into the alignment within hierarchical semantic groups.
KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks.
arXiv Detail & Related papers (2024-12-17T17:45:21Z)
- Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experiment results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv Detail & Related papers (2024-08-15T21:09:09Z)
- MMIL: A novel algorithm for disease associated cell type discovery [58.044870442206914]
Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease.
We introduce Mixture Modeling for Multiple Instance Learning (MMIL), an expectation maximization method that enables the training and calibration of cell-level classifiers.
arXiv Detail & Related papers (2024-06-12T15:22:56Z)
- AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models [25.966454809890227]
Rare diseases affect millions worldwide but often face limited research focus due to their low prevalence.
Recent advancements in Large Language Models (LLMs) have shown promise in automating the extraction of medical information.
We propose an end-to-end system called AutoRD, which automates the extraction of information from medical texts about rare diseases.
arXiv Detail & Related papers (2024-03-01T20:06:39Z)
- Multimodal Neurodegenerative Disease Subtyping Explained by ChatGPT [15.942849233189664]
Alzheimer's disease is the most prevalent neurodegenerative disease.
Current data driven approaches are able to classify the subtypes at later stages of AD or related disorders, but struggle when predicting at the asymptomatic or prodromal stage.
We propose a multimodal framework that uses early-stage indicators such as imaging, genetics and clinical assessments to classify AD patients into subtypes at early stages.
arXiv Detail & Related papers (2024-01-31T19:30:04Z)
- Large Language Models with Retrieval-Augmented Generation for Zero-Shot Disease Phenotyping [1.8630636381951384]
Large language models (LLMs) offer promise in text understanding but may not efficiently handle real-world clinical documentation.
We propose a zero-shot LLM-based method enriched by retrieval-augmented generation and MapReduce.
We show that this method as applied to pulmonary hypertension (PH), a rare disease characterized by elevated arterial pressures in the lungs, significantly outperforms physician logic rules.
arXiv Detail & Related papers (2023-12-11T15:45:27Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- Unsupervised Representation Learning Meets Pseudo-Label Supervised Self-Distillation: A New Approach to Rare Disease Classification [26.864435224276964]
We propose a novel hybrid approach to rare disease classification, featuring two key novelties.
First, we adopt the unsupervised representation learning (URL) based on self-supervising contrastive loss.
Second, we integrate the URL with pseudo-label supervised classification for effective self-distillation of the knowledge about the rare diseases.
arXiv Detail & Related papers (2021-10-09T12:56:09Z)
- Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision [3.6471045233540806]
We show that the Text-to-UMLS process can be greatly improved with weak supervision, without any annotated data from domain experts.
Our analysis shows that the overall pipeline processing discharge summaries can surface cases, which are mostly uncaptured in manual ICD codes of the hospital admissions.
arXiv Detail & Related papers (2021-05-05T11:49:09Z)
- Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)
- Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation [150.52617238140868]
We propose low-resource medical dialogue generation to transfer the diagnostic experience from source diseases to target ones.
We also develop a Graph-Evolving Meta-Learning framework that learns to evolve the commonsense graph for reasoning disease-symptom correlations in a new disease.
arXiv Detail & Related papers (2020-12-22T13:20:23Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, which involves identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.