Expert-guided Clinical Text Augmentation via Query-Based Model Collaboration
- URL: http://arxiv.org/abs/2509.21530v1
- Date: Thu, 25 Sep 2025 20:18:39 GMT
- Title: Expert-guided Clinical Text Augmentation via Query-Based Model Collaboration
- Authors: Dongkyu Cho, Miao Zhang, Rumi Chunara
- Abstract summary: Large language models (LLMs) have demonstrated strong generative capabilities for data augmentation. Their applications in high-stakes domains like healthcare present unique challenges due to the risk of generating clinically incorrect or misleading information. We propose a novel query-based model collaboration framework that integrates expert-level domain knowledge to guide the augmentation process.
- Score: 13.279553235224988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data augmentation is a widely used strategy to improve model robustness and generalization by enriching training datasets with synthetic examples. While large language models (LLMs) have demonstrated strong generative capabilities for this purpose, their applications in high-stakes domains like healthcare present unique challenges due to the risk of generating clinically incorrect or misleading information. In this work, we propose a novel query-based model collaboration framework that integrates expert-level domain knowledge to guide the augmentation process to preserve critical medical information. Experiments on clinical prediction tasks demonstrate that our lightweight collaboration-based approach consistently outperforms existing LLM augmentation methods while improving safety through reduced factual errors. This framework addresses the gap between LLM augmentation potential and the safety requirements of specialized domains.
Related papers
- A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis. Most medical LLMs are trained on data from a single institution, which limits their generalizability and safety in heterogeneous systems. We introduce a model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
(2026-01-29T18:48:21Z)
- Forging a Dynamic Memory: Retrieval-Guided Continual Learning for Generalist Medical Foundation Models [45.285970665585914]
We propose a comprehensive framework for continual learning. We employ a multi-modal, multi-layer RAG system that provides real-time guidance for model fine-tuning. We introduce a dynamic knowledge distillation framework.
(2025-12-15T08:09:40Z)
- Adaptation of Foundation Models for Medical Image Analysis: Strategies, Challenges, and Future Directions [4.332241609032423]
Foundation models (FMs) have emerged as a transformative paradigm in medical image analysis. This review presents a comprehensive assessment of strategies for adapting FMs to the specific demands of medical imaging.
(2025-11-03T06:57:42Z)
- Integrating Genomics into Multimodal EHR Foundation Models [56.31910745104141]
This paper introduces an innovative EHR foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality. The framework aims to learn complex relationships between clinical data and genetic predispositions. This approach is pivotal for unlocking new insights into disease prediction, proactive health management, risk stratification, and personalized treatment strategies.
(2025-10-24T15:56:40Z)
- How many patients could we save with LLM priors? [1.8421433205488897]
We present a novel framework for hierarchical Bayesian modeling of adverse events in multi-center clinical trials. Our methodology obtains priors directly from a pre-trained large language model (LLM). This methodology paves the way for more efficient and expert-informed clinical trial design.
(2025-09-04T14:23:35Z)
- NEARL-CLIP: Interacted Query Adaptation with Orthogonal Regularization for Medical Vision-Language Understanding [51.63264715941068]
NEARL-CLIP (iNteracted quEry Adaptation with oRthogonaL Regularization) is a novel cross-modality interaction VLM-based framework.
(2025-08-06T05:44:01Z)
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
(2025-03-05T05:24:55Z)
- Masked Clinical Modelling: A Framework for Synthetic and Augmented Survival Data Generation [1.7769033811751995]
We present Masked Clinical Modelling (MCM), a framework inspired by masked language modelling.
MCM is designed for both data synthesis and conditional data augmentation.
We evaluate this prototype on the WHAS500 dataset using Cox Proportional Hazards models.
(2024-10-22T08:38:46Z)
- Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language model (LLM) reasoning. Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
(2024-10-06T18:46:28Z)
- Stronger Baseline Models -- A Key Requirement for Aligning Machine Learning Research with Clinical Utility [0.0]
Well-known barriers exist when attempting to deploy Machine Learning models in high-stakes, clinical settings.
We show empirically that including stronger baseline models in evaluations has important downstream effects.
We propose some best practices that will enable practitioners to more effectively study and deploy ML models in clinical settings.
(2024-09-18T16:38:37Z)
- Beyond Self-Consistency: Ensemble Reasoning Boosts Consistency and Accuracy of LLMs in Cancer Staging [0.33554367023486936]
Cancer staging status is available in clinical reports, but it requires natural language processing to extract it.
With advances in clinically oriented large language models, extracting such status without extensive effort in training the algorithms has become a promising prospect.
In this study, we propose an ensemble reasoning approach with the aim of improving the consistency of the model generations.
(2024-04-19T19:34:35Z)
- End-to-End Breast Cancer Radiotherapy Planning via LMMs with Consistency Embedding [47.360760580820966]
We present RO-LMM, a comprehensive large multimodal model (LMM) tailored for the field of radiation oncology. This model effectively manages a series of tasks within the clinical workflow, including clinical context summarization, radiation treatment plan suggestion, and plan-guided target volume segmentation. We present a novel Consistency Embedding Fine-Tuning (CEFTune) technique, which boosts the LMM's robustness to noisy inputs while preserving consistency when handling clean inputs.
(2023-11-27T14:49:06Z)
- LLM-driven Multimodal Target Volume Contouring in Radiation Oncology [46.23891509553877]
Large language models (LLMs) can facilitate the integration of textual information and images.
We present a novel LLM-driven multimodal AI, namely LLMSeg, that is applicable to the challenging task of target volume contouring for radiation therapy.
We demonstrate that the proposed model exhibits markedly improved performance compared to conventional unimodal AI models.
(2023-11-03T13:38:42Z)
- Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method and a 12.12% improvement in generalizability to new data.
(2023-03-24T03:14:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.