Accelerating Causal Network Discovery of Alzheimer Disease Biomarkers via Scientific Literature-based Retrieval Augmented Generation
- URL: http://arxiv.org/abs/2504.08768v1
- Date: Tue, 01 Apr 2025 22:15:17 GMT
- Title: Accelerating Causal Network Discovery of Alzheimer Disease Biomarkers via Scientific Literature-based Retrieval Augmented Generation
- Authors: Xiaofan Zhou, Liangjie Huang, Pinyang Cheng, Wenpen Yin, Rui Zhang, Wenrui Hao, Lu Cheng
- Abstract summary: Alzheimer's disease (AD) diagnosis enables early detection, precise disease staging, targeted treatments, and improved monitoring of disease progression. Understanding these causal relationships is complex and requires extensive research. Can advanced large language models (LLMs), such as those utilizing retrieval-augmented generation (RAG), assist in building causal networks of biomarkers for further medical analysis? We collected 200 AD-related research papers published over the past 25 years and then integrated scientific literature with RAG to extract AD biomarkers and generate causal relations among them.
- Score: 6.772825080162557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The causal relationships between biomarkers are essential for disease diagnosis and medical treatment planning. One notable application is Alzheimer's disease (AD) diagnosis, where certain biomarkers may influence the presence of others, enabling early detection, precise disease staging, targeted treatments, and improved monitoring of disease progression. However, understanding these causal relationships is complex and requires extensive research. Constructing a comprehensive causal network of biomarkers demands significant effort from human experts, who must analyze a vast number of research papers and who have biases in their understanding of disease biomarkers and their relations. This raises an important question: Can advanced large language models (LLMs), such as those utilizing retrieval-augmented generation (RAG), assist in building causal networks of biomarkers for further medical analysis? To explore this, we collected 200 AD-related research papers published over the past 25 years and then integrated scientific literature with RAG to extract AD biomarkers and generate causal relations among them. Given the high-risk nature of medical diagnosis, we applied uncertainty estimation to assess the reliability of the generated causal edges and examined the faithfulness and scientificness of LLM reasoning using both automatic and human evaluation. We find that RAG enhances the ability of LLMs to generate more accurate causal networks from scientific papers. However, the overall performance of LLMs in identifying causal relations among AD biomarkers is still limited. We hope this study will inspire further foundational research on AI-driven causal network discovery for AD biomarkers.
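The abstract mentions applying uncertainty estimation to generated causal edges but does not say which estimator is used. One common, generic option is to sample the LLM several times for the same biomarker pair and measure how much the answers disagree. The sketch below assumes that approach; the label set, the function name `edge_uncertainty`, and the sample answers are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log

# Hypothetical label set for a biomarker pair (A, B); an assumption, not the paper's.
LABELS = ("A->B", "B->A", "none")


def edge_uncertainty(samples):
    """Return (majority_label, normalized_entropy) for sampled LLM answers.

    normalized_entropy is 0.0 when every sample agrees and 1.0 when the
    answers are spread uniformly over LABELS.
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(samples)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(samples)
    # Shannon entropy of the empirical answer distribution.
    entropy = sum((c / total) * log(total / c) for c in counts.values())
    # Majority vote; ties resolve to the earliest label in LABELS.
    majority = max(LABELS, key=lambda label: counts[label])
    return majority, entropy / log(len(LABELS))


# Illustrative answers only: four of five samples agree on the direction.
label, h = edge_uncertainty(["A->B", "A->B", "A->B", "A->B", "none"])
print(label, round(h, 3))
```

An edge with a low normalized entropy could be kept with more confidence, while a high value flags the pair for human review. Where to set that threshold is a design choice the abstract does not address.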
Related papers
- Survey and Improvement Strategies for Gene Prioritization with Large Language Models [61.24568051916653]
Large language models (LLMs) have performed well in medical exams, but their effectiveness in diagnosing rare genetic diseases has not been assessed. We used multi-agent and Human Phenotype Ontology (HPO) classification to categorize patients based on phenotypes and solvability levels. At baseline, GPT-4 outperformed other LLMs, achieving nearly 30% accuracy in ranking causal genes correctly.
arXiv Detail & Related papers (2025-01-30T23:03:03Z)
- Graph-Based Biomarker Discovery and Interpretation for Alzheimer's Disease [1.859931123372708]
Early diagnosis and discovery of therapeutic drug targets are crucial objectives for the effective management of Alzheimer's Disease (AD). Recent blood tests have shown promise in diagnosing AD and highlighting possible biomarkers that can be used as drug targets for AD management. Here, we introduce BRAIN, a novel machine learning framework to jointly optimize the diagnostic accuracy and biomarker discovery processes.
arXiv Detail & Related papers (2024-11-27T22:45:19Z)
- Leveraging Social Determinants of Health in Alzheimer's Research Using LLM-Augmented Literature Mining and Knowledge Graphs [33.755845172595365]
Growing evidence suggests that social determinants of health (SDoH) affect individuals' risks of developing Alzheimer's disease (AD) and related dementias. This study presents a novel, automated framework to mine SDoH knowledge from extensive literature and integrate it with AD-related biological entities. Our framework shows promise for enhancing knowledge discovery in AD and can be generalized to other SDoH-related research areas.
arXiv Detail & Related papers (2024-10-04T21:39:30Z)
- Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models [46.05020842978823]
Large Language Models (LLMs) have emerged as powerful tools to navigate this complex data landscape.
RAGGED is a comprehensive workflow designed to support investigators with knowledge integration and hypothesis generation.
arXiv Detail & Related papers (2024-07-17T07:44:18Z)
- A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis [51.07114445705692]
Neurodegenerative diseases (NDs) traditionally require extensive healthcare resources and human effort for medical diagnosis and monitoring. As a crucial disease-related motor symptom, human gait can be exploited to characterize different NDs. The current advances in artificial intelligence (AI) models enable automatic gait analysis for ND identification and classification.
arXiv Detail & Related papers (2024-05-21T06:44:40Z)
- Discovering robust biomarkers of psychiatric disorders from resting-state functional MRI via graph neural networks: A systematic review [4.799269666410891]
We review how GNN and model explainability techniques have been applied to fMRI datasets for disorder prediction tasks.
We identify 65 studies using GNNs that reported potential fMRI biomarkers for psychiatric disorders.
We suggest establishing new standards that are based on objective evaluation metrics to determine the robustness of potential biomarkers.
arXiv Detail & Related papers (2024-05-01T15:29:55Z)
- An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
- Progress and Opportunities of Foundation Models in Bioinformatics [77.74411726471439]
Foundation models (FMs) have ushered in a new era in computational biology, especially in the realm of deep learning.
Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs.
The review analyzes challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases.
arXiv Detail & Related papers (2024-02-06T02:29:17Z)
- A Review on Knowledge Graphs for Healthcare: Resources, Applications, and Promises [59.4999994297993]
This comprehensive review aims to provide an overview of the current state of Healthcare Knowledge Graphs (HKGs). We thoroughly analyzed existing literature on HKGs, covering their construction methodologies, utilization techniques, and applications. The review highlights the potential of HKGs to significantly impact biomedical research and clinical practice.
arXiv Detail & Related papers (2023-06-07T21:51:56Z)
- Understanding Breast Cancer Survival: Using Causality and Language Models on Multi-omics Data [23.850817918011863]
We exploit causal discovery algorithms to investigate how perturbations in the genome can affect the survival of patients diagnosed with breast cancer.
Our findings reveal important factors related to the vital status of patients using causal discovery algorithms.
Results are validated through language models trained on biomedical literature.
arXiv Detail & Related papers (2023-05-28T17:07:46Z)
- Improving generalization of machine learning-identified biomarkers with causal modeling: an investigation into immune receptor diagnostics [2.40246230430283]
We focus on a specific, recently established high-dimensional biomarker: adaptive immune receptor repertoires (AIRRs).
We argue that causal modeling improves machine learning-based biomarker robustness by identifying stable relations between variables.
arXiv Detail & Related papers (2022-04-20T08:15:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.