Related papers: Enabling Doctor-Centric Medical AI with LLMs through Workflow-Aligned Tasks and Benchmarks

Enabling Doctor-Centric Medical AI with LLMs through Workflow-Aligned Tasks and Benchmarks

URL: http://arxiv.org/abs/2510.11040v1
Date: Mon, 13 Oct 2025 06:18:27 GMT
Title: Enabling Doctor-Centric Medical AI with LLMs through Workflow-Aligned Tasks and Benchmarks
Authors: Wenya Xie, Qingying Xiao, Yu Zheng, Xidong Wang, Junying Chen, Ke Ji, Anningzhe Gao, Prayag Tiwari, Xiang Wan, Feng Jiang, Benyou Wang,
Abstract summary: The rise of large language models (LLMs) has transformed healthcare by offering clinical guidance, yet their direct deployment to patients poses safety risks.<n>We propose repositioning LLMs as clinical assistants that collaborate with experienced physicians rather than interacting with patients directly.<n>We construct DoctorFLAN, a large-scale Chinese medical dataset comprising 92,000 Q&A instances across 22 clinical tasks and 27 specialties.
Score: 72.89088985703748
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rise of large language models (LLMs) has transformed healthcare by offering clinical guidance, yet their direct deployment to patients poses safety risks due to limited domain expertise. To mitigate this, we propose repositioning LLMs as clinical assistants that collaborate with experienced physicians rather than interacting with patients directly. We conduct a two-stage inspiration-feedback survey to identify real-world needs in clinical workflows. Guided by this, we construct DoctorFLAN, a large-scale Chinese medical dataset comprising 92,000 Q&A instances across 22 clinical tasks and 27 specialties. To evaluate model performance in doctor-facing applications, we introduce DoctorFLAN-test (550 single-turn Q&A items) and DotaBench (74 multi-turn conversations). Experimental results with over ten popular LLMs demonstrate that DoctorFLAN notably improves the performance of open-source LLMs in medical contexts, facilitating their alignment with physician workflows and complementing existing patient-oriented models. This work contributes a valuable resource and framework for advancing doctor-centered medical LLM development

Related papers

A Survey of LLM-based Agents in Medicine: How far are we from Baymax? [44.448366897456886]
Large Language Models (LLMs) are transforming healthcare through the development of LLM-based agents.<n>This survey provides a comprehensive review of LLM-based agents in medicine.<n>We analyze the key components of medical agent systems, including system profiles, clinical planning mechanisms, medical reasoning frameworks, and external capacity enhancement.
arXiv Detail & Related papers (2025-02-16T17:21:05Z)
Demystifying Large Language Models for Medicine: A Primer [50.83806796466396]
Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare. This tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice.
arXiv Detail & Related papers (2024-10-24T15:41:56Z)
CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios [50.032101237019205]
CliMedBench is a comprehensive benchmark with 14 expert-guided core clinical scenarios. The reliability of this benchmark has been confirmed in several ways.
arXiv Detail & Related papers (2024-10-04T15:15:36Z)
RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment [54.91736546490813]
We introduce the RuleAlign framework, designed to align Large Language Models with specific diagnostic rules. We develop a medical dialogue dataset comprising rule-based communications between patients and physicians. Experimental results demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-08-22T17:44:40Z)
LLMs for Doctors: Leveraging Medical LLMs to Assist Doctors, Not Replace Them [41.65016162783525]
We focus on tuning the Large Language Models to be medical assistants who collaborate with more experienced doctors. We construct a Chinese medical dataset called DoctorFLAN to support the entire workflow of doctors. We evaluate LLMs in doctor-oriented scenarios by constructing the DoctorFLAN-textittest containing 550 single-turn Q&A and DotaBench containing 74 multi-turn conversations.
arXiv Detail & Related papers (2024-06-26T03:08:24Z)
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce textbfAI Hospital, a framework simulating dynamic medical interactions between emphDoctor as player and NPCs. This setup allows for realistic assessments of LLMs in clinical scenarios. We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization [8.456700096020601]
Large language models (LLMs) have shown promise in natural language processing (NLP), but their effectiveness on a diverse range of clinical summarization tasks remains unproven. In this study, we apply adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks. A clinical reader study with ten physicians evaluates summary, completeness, correctness, and conciseness; in a majority of cases, summaries from our best adapted LLMs are either equivalent (45%) or superior (36%) compared to summaries from medical experts.
arXiv Detail & Related papers (2023-09-14T05:15:01Z)
An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models [22.409334091186995]
Large language models (LLMs) often suffer from hallucinations, leading to overly confident but incorrect judgments. This paper introduces an automated evaluation framework that assesses the practical capabilities of LLMs as virtual doctors during multi-turn consultations.
arXiv Detail & Related papers (2023-09-05T09:24:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.