MedReadCtrl: Personalizing medical text generation with readability-controlled instruction learning
- URL: http://arxiv.org/abs/2507.07419v1
- Date: Thu, 10 Jul 2025 04:22:36 GMT
- Title: MedReadCtrl: Personalizing medical text generation with readability-controlled instruction learning
- Authors: Hieu Tran, Zonghai Yao, Won Seok Jang, Sharmin Sultana, Allen Chang, Yuan Zhang, Hong Yu
- Abstract summary: We introduce MedReadCtrl, a readability-controlled instruction tuning framework. MedReadCtrl achieves significantly lower readability instruction-following errors than GPT-4. Experts consistently preferred MedReadCtrl (71.7% vs. 23.3%), especially at low literacy levels.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative AI has demonstrated strong potential in healthcare, from clinical decision support to patient-facing chatbots that improve outcomes. A critical challenge for deployment is effective human-AI communication, where content must be both personalized and understandable. We introduce MedReadCtrl, a readability-controlled instruction tuning framework that enables LLMs to adjust output complexity without compromising meaning. Evaluations on nine datasets and three tasks spanning medical and general domains show that MedReadCtrl achieves significantly lower readability instruction-following errors than GPT-4 (e.g., 1.39 vs. 1.59 on ReadMe, p<0.001) and delivers substantial gains on unseen clinical tasks (e.g., +14.7 ROUGE-L, +6.18 SARI on MTSamples). Experts consistently preferred MedReadCtrl (71.7% vs. 23.3%), especially at low literacy levels. These gains reflect MedReadCtrl's ability to restructure clinical content into accessible, readability-aligned language while preserving medical intent, offering a scalable solution to support patient education and expand equitable access to AI-enabled care.
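To make the framing concrete, below is a minimal sketch of readability-controlled instruction tuning data and the instruction-following error it is scored on. It assumes a Flesch-Kincaid-style grade metric via the `textstat` package; the prompt wording, function names, and metric choice are illustrative assumptions, not MedReadCtrl's exact templates or evaluation scales.

```python
# Minimal sketch: instruction-tuning examples that name a readability target,
# plus the instruction-following error |target grade - achieved grade|.
# Assumes the `textstat` package; prompts are illustrative, not the paper's.
import textstat

def build_example(source_text: str, reference: str, target_grade: int) -> dict:
    """Pack one supervised example whose instruction specifies a reading level."""
    instruction = (
        f"Rewrite the following medical text at a grade-{target_grade} "
        f"reading level, preserving its clinical meaning:\n\n{source_text}"
    )
    return {"instruction": instruction, "output": reference, "target": target_grade}

def readability_error(generated: str, target_grade: float) -> float:
    """Readability instruction-following error: distance from the target grade."""
    return abs(textstat.flesch_kincaid_grade(generated) - target_grade)

# Example: score a model output against a grade-6 target.
print(readability_error("Take one pill each morning with food.", target_grade=6))
```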
Related papers
- Expert-level validation of AI-generated medical text with scalable language models
There is an immediate need to evaluate the accuracy and safety of LM-generated medical text. Currently, such evaluation relies solely on manual physician review. We propose MedVAL, a self-supervised framework that leverages synthetic data to train evaluator LMs.
arXiv Detail & Related papers (2025-07-03T20:19:18Z)
- Medalyze: Lightweight Medical Report Summarization Application Using FLAN-T5-Large
Medalyze is an AI-powered application designed to enhance the comprehension of medical texts. It is deployed across web and mobile platforms with real-time inference, leveraging a scalable API and YugabyteDB.
arXiv Detail & Related papers (2025-05-17T07:16:58Z)
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
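A minimal sketch of what structured outputs for open-ended medical QA can look like: the prompt pins a JSON schema and the response is type-checked before use. The schema and field names here are illustrative assumptions, not the paper's exact format.

```python
# Minimal sketch: constrain the model to a fixed JSON schema and validate the
# reply before trusting it. Field names are hypothetical, not the paper's.
import json

SCHEMA_FIELDS = {"findings": list, "reasoning": list, "answer": str}

def build_prompt(question: str) -> str:
    return (
        "Answer the medical question below. Respond ONLY with JSON of the form "
        '{"findings": [...], "reasoning": [...], "answer": "..."}.\n\n'
        f"Question: {question}"
    )

def parse_structured(raw: str) -> dict:
    """Parse and type-check the model's JSON; raise if it deviates from the schema."""
    obj = json.loads(raw)
    for field, ftype in SCHEMA_FIELDS.items():
        if not isinstance(obj.get(field), ftype):
            raise ValueError(f"missing or mistyped field: {field}")
    return obj
```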
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
- Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini
Sporo Health's AI scribe was evaluated against OpenAI's GPT-4o Mini.
Results show that Sporo AI consistently outperformed GPT-4o Mini, achieving higher recall, precision, and overall F1 scores.
arXiv Detail & Related papers (2024-10-20T22:48:40Z)
- Towards Evaluating and Building Versatile Large Language Models for Medicine
We present MedS-Bench, a benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts.
MedS-Bench spans 11 high-level clinical tasks, including clinical report summarization, treatment recommendations, diagnosis, named entity recognition, and medical concept explanation.
A companion instruction-tuning dataset, MedS-Ins, comprises 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks.
arXiv Detail & Related papers (2024-08-22T17:01:34Z)
- Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation
This study presents a comprehensive domain- and task-specific adaptation process for the open-source LLaMA-2 13-billion-parameter model. Our process incorporates continued pretraining, supervised fine-tuning, and reinforcement learning from both AI and human feedback. Our resulting model, LLaMA-Clinic, can generate clinical notes comparable in quality to those authored by physicians.
arXiv Detail & Related papers (2024-04-25T15:34:53Z)
- Augmenting Black-box LLMs with Medical Textbooks for Biomedical Question Answering
We present a system called LLMs Augmented with Medical Textbooks (LLM-AMT). LLM-AMT integrates authoritative medical textbooks into the LLMs' framework using plug-and-play modules. We find that medical textbooks serve as a more effective retrieval corpus than Wikipedia in the medical domain.
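A minimal sketch of such a plug-and-play retrieval module: rank textbook passages against the question and prepend the top hits to the prompt. TF-IDF stands in for the paper's retriever here, and `TextbookRetriever` and `augment_prompt` are hypothetical names.

```python
# Minimal sketch: retrieve top-k textbook passages for a question and prepend
# them to the prompt. TF-IDF is used for brevity; LLM-AMT's actual retriever
# may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class TextbookRetriever:
    def __init__(self, passages: list[str]):
        self.passages = passages
        self.vectorizer = TfidfVectorizer().fit(passages)
        self.matrix = self.vectorizer.transform(passages)

    def top_k(self, question: str, k: int = 3) -> list[str]:
        # Cosine similarity between the question and every passage.
        scores = cosine_similarity(self.vectorizer.transform([question]), self.matrix)[0]
        best = scores.argsort()[::-1][:k]
        return [self.passages[i] for i in best]

def augment_prompt(question: str, retriever: TextbookRetriever) -> str:
    context = "\n".join(retriever.top_k(question))
    return f"Use the textbook excerpts below to answer.\n{context}\n\nQuestion: {question}"
```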
arXiv Detail & Related papers (2023-09-05T13:39:38Z)
- MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Large language models (LLMs) can follow natural language instructions with human-level fluency.
However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging.
We introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data.
arXiv Detail & Related papers (2023-08-27T12:24:39Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
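A minimal sketch of that idea: a second pass asks the model to ground each extracted item in a verbatim span of the source note, and ungrounded items are dropped. The `llm` callable and the prompts are hypothetical stand-ins, not the paper's implementation.

```python
# Minimal sketch: self-verification for clinical information extraction.
# `llm` is a hypothetical text-completion callable (prompt in, text out).
from typing import Callable

def extract_and_verify(note: str, llm: Callable[[str], str]) -> list[str]:
    items = llm(f"List the medications mentioned in this note, one per line:\n{note}")
    verified = []
    for item in items.splitlines():
        item = item.strip()
        if not item:
            continue
        evidence = llm(
            f"Quote the exact sentence from the note that mentions '{item}', "
            f"or reply NONE.\nNote:\n{note}"
        )
        # Keep only items the model can ground in a verbatim source span.
        if evidence.strip() != "NONE" and evidence.strip() in note:
            verified.append(item)
    return verified
```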
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
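A minimal sketch of the soft-prompt mechanism: a small block of trainable embeddings is prepended to the frozen model's input embeddings, and only those embeddings are optimized. Shapes and names are illustrative, not SPeC's implementation.

```python
# Minimal sketch: trainable soft-prompt embeddings prepended to a frozen LM's
# input embeddings. Dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_tokens: int, embed_dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Usage: freeze the LM, then optimize only soft_prompt.parameters().
soft_prompt = SoftPrompt(n_tokens=20, embed_dim=768)
x = torch.randn(2, 50, 768)   # stand-in for token embeddings
print(soft_prompt(x).shape)   # torch.Size([2, 70, 768])
```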
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
- Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)