Human-in-the-Loop Interactive Report Generation for Chronic Disease Adherence
- URL: http://arxiv.org/abs/2601.06364v1
- Date: Sat, 10 Jan 2026 00:19:33 GMT
- Title: Human-in-the-Loop Interactive Report Generation for Chronic Disease Adherence
- Authors: Xiaotian Zhang, Jinhong Yu, Pengwei Yan, Le Jiang, Xingyi Shen, Mumo Cheng, Xiaozhong Liu
- Abstract summary: Chronic disease management requires regular adherence feedback to prevent avoidable hospitalizations. Manual authoring preserves clinical accuracy but does not scale; AI generation scales but can undermine trust in patient-facing contexts. We present a clinician-in-the-loop interface that constrains AI to data organization and preserves physician oversight through recognition-based review.
- Score: 17.904419827298074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chronic disease management requires regular adherence feedback to prevent avoidable hospitalizations, yet clinicians lack time to produce personalized patient communications. Manual authoring preserves clinical accuracy but does not scale; AI generation scales but can undermine trust in patient-facing contexts. We present a clinician-in-the-loop interface that constrains AI to data organization and preserves physician oversight through recognition-based review. A single-page editor pairs AI-generated section drafts with time-aligned visualizations, enabling inline editing with visual evidence for each claim. This division of labor (AI organizes, clinician decides) targets both efficiency and accountability. In a pilot with three physicians reviewing 24 cases, AI successfully generated clinically personalized drafts matching physicians' manual authoring practice (overall mean 4.86/10 vs. 5.0/10 baseline), requiring minimal physician editing (mean 8.3% content modification) with zero safety-critical issues, demonstrating effective automation of content generation. However, review time remained comparable to manual practice, revealing an accountability paradox: in high-stakes clinical contexts, professional responsibility requires complete verification regardless of AI accuracy. We contribute three interaction patterns for clinical AI collaboration: bounded generation with recognition-based review via chart-text pairing, automated urgency flagging that analyzes vital trends and adherence patterns with fail-safe escalation for missed critical monitoring tasks, and progressive disclosure controls that reduce cognitive load while maintaining oversight. These patterns indicate that clinical AI efficiency requires not only accurate models, but also mechanisms for selective verification that preserve accountability.
Related papers
- Augmenting Clinical Decision-Making with an Interactive and Interpretable AI Copilot: A Real-World User Study with Clinicians in Nephrology and Obstetrics [36.981753143345664]
We present AICare, an interactive and interpretable AI copilot for collaborative clinical decision-making. By analyzing longitudinal electronic health records, AICare grounds dynamic risk predictions in scrutable visualizations.
arXiv Detail & Related papers (2026-01-31T13:41:32Z)
- ART: Action-based Reasoning Task Benchmarking for Medical AI Agents [0.0]
We introduce the Action-based Reasoning clinical Task (ART) benchmark for medical AI agents. We identify three dominant error categories: retrieval failures, aggregation errors, and conditional logic misjudgments. Our four-stage pipeline produces diverse, clinically validated tasks grounded in real patient data.
arXiv Detail & Related papers (2026-01-13T21:26:11Z)
- Benchmarking Egocentric Clinical Intent Understanding Capability for Medical Multimodal Large Language Models [48.95516224614331]
We introduce MedGaze-Bench, the first benchmark leveraging clinician gaze as a Cognitive Cursor to assess intent understanding across surgery, emergency simulation, and diagnostic interpretation. Our benchmark addresses three fundamental challenges: visual homogeneity of anatomical structures, strict temporal-causal dependencies in clinical, and implicit adherence to safety protocols.
arXiv Detail & Related papers (2026-01-11T02:20:40Z)
- Beyond Static Scoring: Enhancing Assessment Validity via AI-Generated Interactive Verification [0.4260312058817663]
Large Language Models (LLMs) challenge the validity of traditional open-ended assessments by blurring the lines of authorship. This paper introduces a novel Human-AI Collaboration framework that enhances assessment integrity by combining rubric-based automated scoring with AI-generated, targeted follow-up questions.
arXiv Detail & Related papers (2025-12-14T08:13:53Z)
- MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging [67.74482877175797]
MIRNet is a novel framework that integrates self-supervised pre-training with constrained graph-based reasoning. We introduce TongueAtlas-4K, a benchmark comprising 4,000 images annotated with 22 diagnostic labels.
arXiv Detail & Related papers (2025-11-13T06:30:41Z)
- Motion2Meaning: A Clinician-Centered Framework for Contestable LLM in Parkinson's Disease Gait Interpretation [0.8230528541914085]
Motion2Meaning is a clinician-centered framework that advances Contestable AI. The system comprises three key components: a Gait Data Visualization Interface (GDVI), a one-dimensional Convolutional Neural Network (1D-CNN) that predicts Hoehn & Yahr severity stages, and a Contestable Interface (CII). XMED successfully identifies model unreliability by detecting a five-fold increase in explanation discrepancies in incorrect predictions. Our LLM-powered interface enables clinicians to validate correct predictions and successfully contest a portion of the model's errors.
arXiv Detail & Related papers (2025-10-21T12:04:58Z)
- How to Evaluate Medical AI [4.23552814358972]
We introduce Relative Precision and Recall of Algorithmic Diagnostics (RPAD and RRAD). RPAD and RRAD compare AI outputs against multiple expert opinions rather than a single reference. A large-scale study shows that top-performing models, such as DeepSeek-V3, achieve consistency on par with or exceeding expert consensus.
arXiv Detail & Related papers (2025-09-15T14:01:22Z)
- Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models [87.66870367661342]
Large language models (LLMs) are used in AI applications in healthcare. A red-teaming framework that continuously stress-tests LLMs can reveal significant weaknesses in four safety-critical domains. A suite of adversarial agents is applied to autonomously mutate test cases, identify/evolve unsafe-triggering strategies, and evaluate responses. Our framework delivers an evolvable, scalable, and reliable safeguard for the next generation of medical AI.
arXiv Detail & Related papers (2025-07-30T08:44:22Z)
- Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision Language Model (MedVLM) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning. We propose an expert-in-the-loop framework named Expert-Controlled-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z)
- Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering [51.26412822853409]
We present a novel personalized federated learning (pFL) method for medical visual question answering (VQA) models.
Our method introduces learnable prompts into a Transformer architecture to efficiently train it on diverse medical datasets without massive computational costs.
arXiv Detail & Related papers (2024-10-23T00:31:17Z)
- Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini [0.0]
Sporo Health's AI scribe was evaluated against OpenAI's GPT-4o Mini.
Results show that Sporo AI consistently outperformed GPT-4o Mini, achieving higher recall, precision, and overall F1 scores.
arXiv Detail & Related papers (2024-10-20T22:48:40Z)
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re³Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z)