LA-MARRVEL: A Knowledge-Grounded and Language-Aware LLM Reranker for AI-MARRVEL in Rare Disease Diagnosis
- URL: http://arxiv.org/abs/2511.02263v3
- Date: Thu, 06 Nov 2025 03:00:21 GMT
- Title: LA-MARRVEL: A Knowledge-Grounded and Language-Aware LLM Reranker for AI-MARRVEL in Rare Disease Diagnosis
- Authors: Jaeyeon Lee, Hyun-Hwan Jeong, Zhandong Liu
- Abstract summary: Large language models (LLMs) can read such text, but clinical use needs grounding in citable knowledge and stable, repeatable behavior. LA-MARRVEL has three parts: expert-engineered context that enriches phenotype and disease information; a ranked voting algorithm that combines multiple LLM runs to choose a consensus ranked gene list; and the AI-MARRVEL pipeline that provides first-stage ranks and gene annotations.
- Score: 6.8308581520283225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diagnosing rare diseases requires linking gene findings with often unstructured reference text. Current pipelines collect many candidate genes, but clinicians still spend a lot of time filtering false positives and combining evidence from papers and databases. A key challenge is language: phenotype descriptions and inheritance patterns are written in prose and are not fully captured by tables. Large language models (LLMs) can read such text, but clinical use needs grounding in citable knowledge and stable, repeatable behavior. We explore a knowledge-grounded and language-aware reranking layer on top of a high-recall first-stage pipeline. The goal is to improve precision and explainability, not to replace standard bioinformatics steps. We use expert-built context and a consensus method to reduce LLM variability, producing shorter, better-justified gene lists for expert review. LA-MARRVEL achieves the highest accuracy, outperforming other methods, including traditional bioinformatics diagnostic tools (AI-MARRVEL, Exomiser, LIRICAL) and naive large language models (e.g., Anthropic Claude), with an average Recall@5 of 94.10%, a +3.65 percentage-point improvement over AI-MARRVEL. The LLM-generated reasoning provides clear prose on phenotype matching and inheritance patterns, making clinical review faster and easier. LA-MARRVEL has three parts: expert-engineered context that enriches phenotype and disease information; a ranked voting algorithm that combines multiple LLM runs to choose a consensus ranked gene list; and the AI-MARRVEL pipeline, which provides first-stage ranks and gene annotations and is already established as a state-of-the-art method for rare disease diagnosis on the BG, DDD, and UDN cohorts. The online AI-MARRVEL includes LA-MARRVEL as an LLM feature at https://ai.marrvel.org . We evaluate LA-MARRVEL on three datasets from independent cohorts of real-world diagnosed patients.
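The abstract describes a ranked voting algorithm that merges several LLM runs into one consensus gene list, and it reports accuracy as Recall@5. It does not say which voting rule is used, so the sketch below assumes a Borda count (each run awards a gene points according to its position), a common rank-aggregation choice. The function names `borda_consensus` and `recall_at_k`, the tie-break rule, and the placeholder gene names are hypothetical illustrations, not LA-MARRVEL's actual implementation.

```python
from __future__ import annotations

from collections import defaultdict


def borda_consensus(runs: list[list[str]], top_n: int | None = None) -> list[str]:
    """Merge several ranked gene lists (one per LLM run) into one ranking.

    Assumption: Borda-count voting. In a run of length L, the gene at
    0-based position i earns L - i points; genes missing from a run earn 0.
    """
    scores: dict[str, int] = defaultdict(int)
    best_pos: dict[str, int] = {}
    for run in runs:
        length = len(run)
        for pos, gene in enumerate(run):
            scores[gene] += length - pos
            # Remember the best (smallest) position seen, used to break ties.
            best_pos[gene] = min(best_pos.get(gene, pos), pos)
    # Highest total first; ties broken by best position, then alphabetically.
    ranked = sorted(scores, key=lambda g: (-scores[g], best_pos[g], g))
    return ranked if top_n is None else ranked[:top_n]


def recall_at_k(ranked_lists: list[list[str]], causal_genes: list[str], k: int = 5) -> float:
    """Fraction of patients whose diagnosed causal gene appears in the top k."""
    hits = sum(
        1 for ranked, gene in zip(ranked_lists, causal_genes) if gene in ranked[:k]
    )
    return hits / len(causal_genes)


# Hypothetical example: three LLM runs reranking the same candidate genes.
runs = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_B", "GENE_A", "GENE_D"],
    ["GENE_A", "GENE_C", "GENE_B"],
]
consensus = borda_consensus(runs)
print(consensus)  # ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D']
print(recall_at_k([consensus, consensus], ["GENE_B", "GENE_D"], k=2))  # 0.5
```

Borda scoring favors genes that several runs place near the top: `GENE_D`, which only one run proposes, ends up last. Because the reported gain is stated in percentage points, it is an absolute difference, so the implied average Recall@5 for AI-MARRVEL is 94.10 - 3.65 = 90.45%.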
Related papers
- MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI
MedForget is a hierarchy-aware multimodal unlearning testbed for building compliant medical AI systems. We show that existing methods struggle to achieve complete, hierarchy-aware forgetting without reducing diagnostic performance. We introduce a reconstruction attack that progressively adds hierarchical-level context to prompts.
Date: 2025-12-10T17:55:06Z
- Enhancing the Medical Context-Awareness Ability of LLMs via Multifaceted Self-Refinement Learning
Large language models (LLMs) have shown great promise in the medical domain, achieving strong performance on several benchmarks. However, they continue to underperform in real-world medical scenarios, which often demand stronger context-awareness. We propose Multifaceted Self-Refinement (MuSeR), a data-driven approach that enhances LLMs' context-awareness along three key facets.
Date: 2025-11-13T08:13:23Z
- DiagnoLLM: A Hybrid Bayesian Neural Language Framework for Interpretable Disease Diagnosis
We present DiagnoLLM, a hybrid framework that integrates Bayesian deconvolution, eQTL-guided deep learning, and LLM-based narrative generation for interpretable disease diagnosis. Our findings show that LLMs, when deployed as post-hoc reasoners rather than end-to-end predictors, can serve as effective communicators within hybrid diagnostic pipelines.
Date: 2025-11-08T02:51:21Z
- Knowledge Elicitation with Large Language Models for Interpretable Cancer Stage Identification from Pathology Reports
We introduce two Knowledge Elicitation methods designed to overcome limitations by enabling large language models to induce and apply domain-specific rules for cancer staging. The first, Knowledge Elicitation with Long-Term Memory (KEwLTM), uses an iterative prompting strategy to derive staging rules directly from unannotated pathology reports. The second, Knowledge Elicitation with Retrieval-Augmented Generation (KEwRAG), employs a variation of RAG where rules are pre-extracted from relevant guidelines in a single step and then applied, enhancing interpretability and avoiding repeated retrieval overhead.
Date: 2025-11-02T19:00:40Z
- Revealing Interconnections between Diseases: from Statistical Methods to Large Language Models
Identifying disease interconnections through manual analysis of large-scale clinical data is labor-intensive, subjective, and prone to expert disagreement. We evaluate seven approaches for uncovering disease relationships based on two data sources: sequences of ICD-10 codes from MIMIC-IV EHRs and the full set of ICD-10 codes. Our framework integrates the following: (i) a statistical co-occurrence analysis and a masked language modeling (MLM) approach using real clinical data; (ii) domain-specific BERT variants; and (iii) a general-purpose BERT and document retrieval.
Date: 2025-10-06T15:09:39Z
- HeteroRAG: A Heterogeneous Retrieval-Augmented Generation Framework for Medical Vision Language Tasks
We present HeteroRAG, a novel framework that enhances Med-LVLMs through heterogeneous knowledge sources. HeteroRAG achieves state-of-the-art performance in most medical vision language benchmarks.
Date: 2025-08-18T09:54:10Z
- Leaps Beyond the Seen: Reinforced Reasoning Augmented Generation for Clinical Notes
ReinRAG is a reasoning-augmented generation (RAG) method for producing long-form discharge instructions from pre-admission information. To bridge the information gap, we propose group-based retriever optimization (GRO), which improves retrieval quality with group-normalized rewards. Experiments on a real-world dataset show that ReinRAG outperforms baselines in both clinical efficacy and natural language generation metrics.
Date: 2025-06-03T12:59:52Z
- A Multimodal Multi-Agent Framework for Radiology Report Generation
Radiology report generation (RRG) aims to automatically produce diagnostic reports from medical images. We propose a multimodal multi-agent framework for RRG that aligns with the stepwise clinical reasoning workflow.
Date: 2025-05-14T20:28:04Z
- TheBlueScrubs-v1, a comprehensive curated medical dataset derived from the internet
TheBlueScrubs-v1 is a curated dataset of over 25 billion medical tokens drawn from a broad-scale internet corpus. Each text is assigned three LLM-based quality scores encompassing medical relevance, precision and factual detail, and safety and ethical standards. This Data Descriptor details the dataset's creation and validation, underscoring its potential utility for medical AI research.
Date: 2025-04-01T22:25:19Z
- Survey and Improvement Strategies for Gene Prioritization with Large Language Models
Large language models (LLMs) have performed well on medical exams, but their effectiveness in diagnosing rare genetic diseases has not been assessed. We used multi-agent and Human Phenotype Ontology (HPO) classification to categorize patients by phenotype and solvability level. At baseline, GPT-4 outperformed other LLMs, achieving nearly 30% accuracy in correctly ranking causal genes.
Date: 2025-01-30T23:03:03Z
- MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses. We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
Date: 2024-10-16T23:03:27Z
- KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models
This paper presents KARGEN, a Knowledge-enhanced Automated radiology Report GENeration framework based on Large Language Models.
The framework integrates a knowledge graph to unlock chest disease-related knowledge within the LLM to enhance the clinical utility of generated reports.
Our approach demonstrates promising results on the MIMIC-CXR and IU-Xray datasets.
Date: 2024-09-09T06:57:22Z
- Assessing and Enhancing Large Language Models in Rare Disease Question-answering
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experiment results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
Date: 2024-08-15T21:09:09Z
- Large Language Model Distilling Medication Recommendation Model
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs). Our research aims to transform existing medication recommendation methodologies using LLMs. To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
Date: 2024-02-05T08:25:22Z
- Cross-Modal Causal Intervention for Medical Report Generation
Radiology Report Generation (RRG) is essential for computer-aided diagnosis and medication guidance. Generating accurate lesion descriptions remains challenging due to spurious correlations from visual-linguistic biases. We propose a two-stage framework named Cross-Modal Causal Representation Learning (CMCRL). Experiments on IU-Xray and MIMIC-CXR show that our CMCRL pipeline significantly outperforms state-of-the-art methods.
Date: 2023-03-16T07:23:55Z
- Ontology-Driven and Weakly Supervised Rare Disease Identification from Clinical Notes
Rare diseases are challenging to identify because few cases are available for machine learning and data annotation requires domain experts.
We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directional Transformers (e.g., BERT).
The weakly supervised approach is used to learn a confirmation phenotype model that improves Text-to-UMLS linking without annotated data from domain experts.
Date: 2022-05-11T17:38:24Z