Related papers: CKD-EHR:Clinical Knowledge Distillation for Electronic Health Records

URL: http://arxiv.org/abs/2506.15118v1
Date: Wed, 18 Jun 2025 03:35:24 GMT
Title: CKD-EHR:Clinical Knowledge Distillation for Electronic Health Records
Authors: Junke Wang, Hongshun Ling, Li Zhang, Longqian Zhang, Fang Wang, Yuan Gao, Zhi Li,
Abstract summary: Existing large language models face two major challenges: insufficient representation of medical knowledge and low efficiency in clinical deployment. This study proposes the CKD-EHR framework, which achieves efficient and accurate disease risk prediction through knowledge distillation techniques.
Score: 16.68137505931177
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Electronic Health Records (EHR)-based disease prediction models have demonstrated significant clinical value in promoting precision medicine and enabling early intervention. However, existing large language models face two major challenges: insufficient representation of medical knowledge and low efficiency in clinical deployment. To address these challenges, this study proposes the CKD-EHR (Clinical Knowledge Distillation for EHR) framework, which achieves efficient and accurate disease risk prediction through knowledge distillation techniques. Specifically, the large language model Qwen2.5-7B is first fine-tuned on medical knowledge-enhanced data to serve as the teacher model. It then generates interpretable soft labels through a multi-granularity attention distillation mechanism. Finally, the distilled knowledge is transferred to a lightweight BERT student model. Experimental results show that on the MIMIC-III dataset, CKD-EHR significantly outperforms the baseline model: diagnostic accuracy is increased by 9%, F1-score is improved by 27%, and a 22.2 times inference speedup is achieved. This innovative solution not only greatly improves resource utilization efficiency but also significantly enhances the accuracy and timeliness of diagnosis, providing a practical technical approach for resource optimization in clinical settings. The code and data for this research are available at https://github.com/209506702/CKD_EHR.
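The abstract describes a teacher (a fine-tuned Qwen2.5-7B) producing soft labels that a smaller BERT student learns from. As a rough illustration of what "learning from soft labels" means, here is a minimal sketch of the generic Hinton-style distillation objective: a temperature-scaled KL divergence between teacher and student distributions, mixed with ordinary cross-entropy on the ground-truth label. This is an assumption-laden stand-in, not the paper's multi-granularity attention distillation mechanism; the function names and the `temperature` and `alpha` parameters are hypothetical. The authors' actual implementation is in the linked repository.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a larger temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms where p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Hypothetical KD loss: alpha * hard-label CE + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Soft labels: the teacher's temperature-softened class distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # The T^2 factor keeps soft-term gradients on a comparable scale across temperatures.
    soft_term = kl_divergence(p_teacher, p_student_soft) * temperature ** 2
    # Standard cross-entropy against the ground-truth class at temperature 1.
    p_student = softmax(student_logits, 1.0)
    hard_term = -math.log(p_student[label])
    return alpha * hard_term + (1 - alpha) * soft_term

# Example: a three-class risk prediction where class 0 is the true label.
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.8]
print(distillation_loss(student, teacher, label=0))
```

Raising `temperature` flattens the teacher distribution, which exposes more of its information about how similar the non-target classes are; `alpha` trades that signal off against the hard labels.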

Related papers

An Explainable AI-Enhanced Machine Learning Approach for Cardiovascular Disease Detection and Risk Assessment [0.0]
Heart disease remains a major global health concern. Traditional diagnostic methods fail to accurately identify and manage heart disease risks. Machine learning has the potential to significantly enhance the accuracy, efficiency, and speed of heart disease diagnosis.
arXiv Detail & Related papers (2025-07-15T10:38:38Z)
Revolutionizing Radiology Workflow with Factual and Efficient CXR Report Generation [0.0]
This paper introduces CXR-PathFinder, a novel Large Language Model (LLM)-centric foundation model specifically engineered for automated chest X-ray (CXR) report generation. We propose a unique training paradigm, Clinician-Guided Adversarial Fine-Tuning (CGAFT), which meticulously integrates expert clinical feedback into an adversarial learning framework. Our experiments demonstrate that CXR-PathFinder significantly outperforms existing state-of-the-art medical vision-language models across various quantitative metrics.
arXiv Detail & Related papers (2025-06-01T18:47:49Z)
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language model (LLM) reasoning. Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z)
A Knowledge-Enhanced Disease Diagnosis Method Based on Prompt Learning and BERT Integration [0.0]
This paper proposes a knowledge-enhanced disease diagnosis method based on a prompt learning framework. The method retrieves structured knowledge from external knowledge graphs related to clinical cases, encodes it, and injects it into the prompt templates to enhance the language model's understanding and reasoning capabilities for the task.
arXiv Detail & Related papers (2024-09-16T15:34:58Z)
MedDet: Generative Adversarial Distillation for Efficient Cervical Disc Herniation Detection [24.833129797776422]
Cervical disc herniation (CDH) is a prevalent musculoskeletal disorder that requires labor-intensive analysis from experts. Despite advancements in automated detection of medical imaging, two significant challenges hinder the real-time application of these methods. We introduce MedDet, which leverages the multi-teacher single-student knowledge distillation for model compression and efficiency. Lastly, we conduct comprehensive experiments on the CDH-1848 dataset, achieving up to a 5% improvement in mAP compared to previous methods.
arXiv Detail & Related papers (2024-08-30T18:38:19Z)
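MedDet is summarized as using multi-teacher, single-student knowledge distillation. One simple way to turn several teachers into a single training target is to average their temperature-softened class distributions and train the student against that ensemble. The sketch below illustrates only that generic idea on classification logits; MedDet itself is a detection model with a generative adversarial distillation scheme, so its actual loss differs. The function names, `weights`, and `temperature` are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with max-subtraction for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_targets(teacher_logits_list, temperature=2.0, weights=None):
    """Hypothetical helper: weighted average of each teacher's softened distribution."""
    n = len(teacher_logits_list)
    weights = weights or [1.0] * n
    total_w = sum(weights)
    weights = [w / total_w for w in weights]  # normalize so the result is a distribution
    dists = [softmax(logits, temperature) for logits in teacher_logits_list]
    num_classes = len(dists[0])
    return [sum(w * d[c] for w, d in zip(weights, dists)) for c in range(num_classes)]

def soft_cross_entropy(targets, student_logits, temperature=2.0):
    """Cross-entropy between the ensemble soft targets and the softened student distribution."""
    q = softmax(student_logits, temperature)
    return -sum(t * math.log(qc) for t, qc in zip(targets, q) if t > 0)

# Two teachers that disagree; the student is trained toward their blended opinion.
targets = ensemble_soft_targets([[2.0, 0.0, -1.0], [0.5, 1.5, -1.0]])
print(targets, soft_cross_entropy(targets, [1.0, 0.8, -1.2]))
```

Uniform averaging is the simplest choice; per-teacher weights could instead reflect each teacher's validation accuracy.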
MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
Automatic diagnosis of knee osteoarthritis severity using Swin transformer [55.01037422579516]
Knee osteoarthritis (KOA) is a widespread condition that can cause chronic pain and stiffness in the knee joint. We propose an automated approach that employs the Swin Transformer to predict the severity of KOA.
arXiv Detail & Related papers (2023-07-10T09:49:30Z)
FineEHR: Refine Clinical Note Representations to Improve Mortality Prediction [3.9026461169566673]
Large-scale electronic health records provide machine learning models with an abundance of clinical text and vital sign data. Despite the emergence of advanced Natural Language Processing (NLP) algorithms for clinical note analysis, the complex textual structure and noise present in raw clinical data have posed significant challenges. We propose FINEEHR, a system that utilizes two representation learning techniques, namely metric learning and fine-tuning, to refine clinical note embeddings.
arXiv Detail & Related papers (2023-04-24T02:42:52Z)
SSD-KD: A Self-supervised Diverse Knowledge Distillation Method for Lightweight Skin Lesion Classification Using Dermoscopic Images [62.60956024215873]
Skin cancer is one of the most common types of malignancy, affecting a large population and causing a heavy economic burden worldwide. Most studies in skin cancer detection keep pursuing high prediction accuracies without considering the limitation of computing resources on portable devices. This study specifically proposes a novel method, termed SSD-KD, that unifies diverse knowledge into a generic KD framework for skin diseases classification.
arXiv Detail & Related papers (2022-03-22T06:54:29Z)
Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose variational knowledge distillation (VKD), a new probabilistic inference framework for disease classification based on X-rays. We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.