Related papers: CLIN-LLM: A Safety-Constrained Hybrid Framework for Clinical Diagnosis and Treatment Generation

CLIN-LLM: A Safety-Constrained Hybrid Framework for Clinical Diagnosis and Treatment Generation

URL: http://arxiv.org/abs/2510.22609v1
Date: Sun, 26 Oct 2025 10:11:53 GMT
Title: CLIN-LLM: A Safety-Constrained Hybrid Framework for Clinical Diagnosis and Treatment Generation
Authors: Md. Mehedi Hasan, Rafid Mostafiz, Md. Abir Hossain, Bikash Kumar Paul,
Abstract summary: Large language model (LLM)-based systems often lack medical grounding and fail to quantify uncertainty.<n>We propose CLIN-LLM, a safety-constrained hybrid pipeline that integrates multimodal patient encoding, uncertainty-calibrated disease classification, and retrieval-augmented treatment generation.
Score: 0.31984926651189866
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate symptom-to-disease classification and clinically grounded treatment recommendations remain challenging, particularly in heterogeneous patient settings with high diagnostic risk. Existing large language model (LLM)-based systems often lack medical grounding and fail to quantify uncertainty, resulting in unsafe outputs. We propose CLIN-LLM, a safety-constrained hybrid pipeline that integrates multimodal patient encoding, uncertainty-calibrated disease classification, and retrieval-augmented treatment generation. The framework fine-tunes BioBERT on 1,200 clinical cases from the Symptom2Disease dataset and incorporates Focal Loss with Monte Carlo Dropout to enable confidence-aware predictions from free-text symptoms and structured vitals. Low-certainty cases (18%) are automatically flagged for expert review, ensuring human oversight. For treatment generation, CLIN-LLM employs Biomedical Sentence-BERT to retrieve top-k relevant dialogues from the 260,000-sample MedDialog corpus. The retrieved evidence and patient context are fed into a fine-tuned FLAN-T5 model for personalized treatment generation, followed by post-processing with RxNorm for antibiotic stewardship and drug-drug interaction (DDI) screening. CLIN-LLM achieves 98% accuracy and F1 score, outperforming ClinicalBERT by 7.1% (p < 0.001), with 78% top-5 retrieval precision and a clinician-rated validity of 4.2 out of 5. Unsafe antibiotic suggestions are reduced by 67% compared to GPT-5. These results demonstrate CLIN-LLM's robustness, interpretability, and clinical safety alignment. The proposed system provides a deployable, human-in-the-loop decision support framework for resource-limited healthcare environments. Future work includes integrating imaging and lab data, multilingual extensions, and clinical trial validation.

Related papers

A Multi-Agent Framework for Medical AI: Leveraging Fine-Tuned GPT, LLaMA, and DeepSeek R1 for Evidence-Based and Bias-Aware Clinical Query Processing [0.4349324020366305]
Large language models (LLMs) show promise for healthcare question answering, but clinical use is limited by weak verification, insufficient evidence grounding, and unreliable confidence signalling.<n>We propose a multi-agent medical QA framework that combines complementary LLMs with evidence retrieval, uncertainty estimation, and bias checks to improve answer reliability.
arXiv Detail & Related papers (2026-02-15T14:17:27Z)
A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis.<n>Most medical LLMs are trained on data from a single institution, which faces limitations in generalizability and safety in heterogeneous systems.<n>We introduce the model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z)
A Sparse-Attention Deep Learning Model Integrating Heterogeneous Multimodal Features for Parkinson's Disease Severity Profiling [4.813020904720317]
Class-Weighted Sparse-Attention Fusion Network (SAFN) is an interpretable deep learning framework for robust multimodal profiling.<n>SAFN integrates MRI cortical thickness, MRI volumetric measures, clinical assessments, and demographic variables.<n>It achieves an accuracy of 0.98 plus or minus 0.02 and a PR-AUC of 1.00 plus or minus 0.00, outperforming established machine learning and deep learning baselines.
arXiv Detail & Related papers (2026-01-02T00:51:21Z)
One-shot synthesis of rare gastrointestinal lesions improves diagnostic accuracy and clinical training [45.49415063761575]
EndoRare is a one-shot, retraining-free generative framework that synthesizes diverse, high-fidelity lesion exemplars from a single reference image.<n>We validated the framework across four rare pathologies.<n>These results establish a practical, data-efficient pathway to bridge the rare-disease gap in both computer-aided diagnostics and clinical education.
arXiv Detail & Related papers (2025-12-30T15:07:09Z)
A DeepSeek-Powered AI System for Automated Chest Radiograph Interpretation in Clinical Practice [83.11942224668127]
Janus-Pro-CXR (1B) is a chest X-ray interpretation system based on DeepSeek Janus-Pro model.<n>Our system outperforms state-of-the-art X-ray report generation models in automated report generation.
arXiv Detail & Related papers (2025-12-23T13:26:13Z)
Knowledge-Guided Large Language Model for Automatic Pediatric Dental Record Understanding and Safe Antibiotic Recommendation [0.4779196219827507]
This study proposes a Knowledge-Guided Large Language Model (KG-LLM)<n>It integrates a pediatric dental knowledge graph, retrieval-augmented generation (RAG), and a multi-stage safety validation pipeline for evidence-grounded antibiotic recommendation.<n> Experiments on 32,000 de-identified pediatric dental visit records demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2025-12-09T21:11:55Z)
Timely Clinical Diagnosis through Active Test Selection [49.091903570068155]
We propose ACTMED (Adaptive Clinical Test selection via Model-based Experimental Design) to better emulate real-world diagnostic reasoning.<n>LLMs act as flexible simulators, generating plausible patient state distributions and supporting belief updates without requiring structured, task-specific training data.<n>We evaluate ACTMED on real-world datasets and show it can optimize test selection to improve diagnostic accuracy, interpretability, and resource use.
arXiv Detail & Related papers (2025-10-21T18:10:45Z)
A Fully Automatic Framework for Intracranial Pressure Grading: Integrating Keyframe Identification, ONSD Measurement and Clinical Data [3.6652537579778106]
Intracranial pressure (ICP) elevation poses severe threats to cerebral function, thus necessitating monitoring for timely intervention.<n>We introduce a fully automatic two-stage framework for ICP grading, integrating ONSD measurement and clinical data.<n>Our method achieves a validation accuracy of $0.845 pm 0.071$ and an independent test accuracy of 0.786, significantly outperforming conventional threshold-based method.
arXiv Detail & Related papers (2025-09-11T11:37:48Z)
A Narrative-Driven Computational Framework for Clinician Burnout Surveillance [0.5281694565226512]
Clinician burnout poses a substantial threat to patient safety, particularly in high-acuity intensive care units (ICUs)<n>In this study, we analyze 10,000 ICU discharge summaries from MIMIC-IV, a publicly available database derived from the electronic health records of Beth Israel Deaconess Medical Center.<n>The dataset encompasses diverse patient data, including vital signs, medical orders, diagnoses, procedures, treatments, and deidentified free-text clinical notes.
arXiv Detail & Related papers (2025-09-01T19:05:26Z)
Organ-Agents: Virtual Human Physiology Simulator via LLMs [66.40796430669158]
Organ-Agents is a multi-agent framework that simulates human physiology via LLM-driven agents.<n>We curated data from 7,134 sepsis patients and 7,895 controls, generating high-resolution trajectories across 9 systems and 125 variables.<n>Organ-Agents achieved high simulation accuracy on 4,509 held-out patients, with per-system MSEs 0.16 and robustness across SOFA-based severity strata.
arXiv Detail & Related papers (2025-08-20T01:58:45Z)
An Agentic System for Rare Disease Diagnosis with Traceable Reasoning [69.46279475491164]
We introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM)<n>DeepRare generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning.<n>The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1013 diseases.
arXiv Detail & Related papers (2025-06-25T13:42:26Z)
Simulated patient systems are intelligent when powered by large language model-based AI agents [32.73072809937573]
We developed AIPatient, an intelligent simulated patient system powered by large language model-based AI agents.<n>The system incorporates the Retrieval Augmented Generation framework, powered by six task-specific LLM-based AI agents for complex reasoning.<n>For simulation reality, the system is also powered by the AIPatient KG (Knowledge Graph), built with de-identified real patient data.
arXiv Detail & Related papers (2024-09-27T17:17:15Z)
Towards Reliable Medical Image Segmentation by Modeling Evidential Calibrated Uncertainty [57.023423137202485]
Concerns regarding the reliability of medical image segmentation persist among clinicians.<n>We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.<n>By leveraging subjective logic theory, we explicitly model probability and uncertainty for medical image segmentation.
arXiv Detail & Related papers (2023-01-01T05:02:46Z)
Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks. Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets. We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.