Medical Test-free Disease Detection Based on Big Data
- URL: http://arxiv.org/abs/2512.07856v1
- Date: Wed, 26 Nov 2025 08:25:47 GMT
- Title: Medical Test-free Disease Detection Based on Big Data
- Authors: Haokun Zhao, Yingzhe Bai, Qingyang Xu, Lixin Zhou, Jianxin Chen, Jicong Fan,
- Abstract summary: We propose Collaborative Learning for Disease Detection (CLDD), a novel graph-based deep learning model.<n> CLDD integrates patient-disease interactions and demographic features from electronic health records to detect hundreds or thousands of diseases for every patient.<n>Experiments on a processed version of the MIMIC-IV dataset comprising 61,191 patients and 2,000 diseases demonstrate that CLDD consistently outperforms representative baselines.
- Score: 18.234796014316718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate disease detection is of paramount importance for effective medical treatment and patient care. However, the process of disease detection is often associated with extensive medical testing and considerable costs, making it impractical to perform all possible medical tests on a patient to diagnose or predict hundreds or thousands of diseases. In this work, we propose Collaborative Learning for Disease Detection (CLDD), a novel graph-based deep learning model that formulates disease detection as a collaborative learning task by exploiting associations among diseases and similarities among patients adaptively. CLDD integrates patient-disease interactions and demographic features from electronic health records to detect hundreds or thousands of diseases for every patient, with little to no reliance on the corresponding medical tests. Extensive experiments on a processed version of the MIMIC-IV dataset comprising 61,191 patients and 2,000 diseases demonstrate that CLDD consistently outperforms representative baselines across multiple metrics, achieving a 6.33\% improvement in recall and 7.63\% improvement in precision. Furthermore, case studies on individual patients illustrate that CLDD can successfully recover masked diseases within its top-ranked predictions, demonstrating both interpretability and reliability in disease prediction. By reducing diagnostic costs and improving accessibility, CLDD holds promise for large-scale disease screening and social health security.
Related papers
- Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and Prototypes [55.310195121276074]
We propose a Knowledge graph-enhanced, Prototype-aware, and Interpretable (KPI) framework to predict diseases.<n>It integrates structured and trusted medical knowledge into a unified disease knowledge graph, constructs clinically meaningful disease prototypes, and employs contrastive learning to enhance predictive accuracy.<n>It provides clinically valid explanations that closely align with patient narratives, highlighting its practical value for patient-centered healthcare delivery.
arXiv Detail & Related papers (2025-12-09T05:37:54Z) - Evolving Diagnostic Agents in a Virtual Clinical Environment [75.59389103511559]
We present a framework for training large language models (LLMs) as diagnostic agents with reinforcement learning.<n>Our method acquires diagnostic strategies through interactive exploration and outcome-based feedback.<n>DiagAgent significantly outperforms 10 state-of-the-art LLMs, including DeepSeek-v3 and GPT-4o.
arXiv Detail & Related papers (2025-10-28T17:19:47Z) - An Explainable Hybrid AI Framework for Enhanced Tuberculosis and Symptom Detection [55.35661671061754]
Tuberculosis remains a critical global health issue, particularly in resource-limited and remote areas.<n>We propose a framework which enhances disease and symptom detection on chest X-rays by integrating two supervised heads and a self-supervised head.<n>Our model achieves an accuracy of 98.85% for distinguishing between COVID-19, tuberculosis, and normal cases, and a macro-F1 score of 90.09% for multilabel symptom detection.
arXiv Detail & Related papers (2025-10-21T17:18:55Z) - An Agentic System for Rare Disease Diagnosis with Traceable Reasoning [69.46279475491164]
We introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM)<n>DeepRare generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning.<n>The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1013 diseases.
arXiv Detail & Related papers (2025-06-25T13:42:26Z) - Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experiment results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv Detail & Related papers (2024-08-15T21:09:09Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - Large Language Multimodal Models for 5-Year Chronic Disease Cohort Prediction Using EHR Data [15.474201222908107]
Chronic diseases such as diabetes are the leading causes of morbidity and mortality worldwide.
We proposed a novel Large Language Multimodal Models (LLMMs) framework incorporating multimodal data for the prediction of chronic disease risk.
Our method combined a text embedding encoder and multi-head attention layer to learn laboratory test values, utilizing a deep neural network (DNN) module to merge blood features with chronic disease semantics into a latent space.
arXiv Detail & Related papers (2024-03-02T22:33:17Z) - Dental Severity Assessment through Few-shot Learning and SBERT Fine-tuning [0.356008609689971]
The integration of automated systems in oral healthcare has become increasingly crucial.
Machine learning approaches offer a viable solution to address challenges such as diagnostic difficulties, inefficiencies, and errors in oral disease diagnosis.
In this study, thirteen different machine learning, deep learning, and large language models were employed to determine the severity level of oral health issues.
arXiv Detail & Related papers (2024-02-24T08:02:19Z) - Feature-context driven Federated Meta-Learning for Rare Disease
Prediction [5.841823822793997]
We propose a novel approach for rare disease prediction based on federated meta-learning.
We show that our approach out-performs the original federated meta-learning algorithm in accuracy and speed.
arXiv Detail & Related papers (2021-12-29T02:18:43Z) - Application of Machine Learning to Predict the Risk of Alzheimer's
Disease: An Accurate and Practical Solution for Early Diagnostics [1.1470070927586016]
Alzheimer's Disease (AD) ravages the cognitive ability of more than 5 million Americans and creates an enormous strain on the health care system.
This paper proposes a machine learning predictive model for AD development without medical imaging and with fewer clinical visits and tests.
Our model is trained and validated using demographic, biomarker and cognitive test data from two prominent research studies.
arXiv Detail & Related papers (2020-06-02T14:52:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.