Assessing the Real-World Utility of Explainable AI for Arousal Diagnostics: An Application-Grounded User Study
- URL: http://arxiv.org/abs/2510.21389v1
- Date: Fri, 24 Oct 2025 12:23:02 GMT
- Title: Assessing the Real-World Utility of Explainable AI for Arousal Diagnostics: An Application-Grounded User Study
- Authors: Stefan Kraft, Andreas Theissler, Vera Wienhausen-Wilke, Gjergji Kasneci, Hendrik Lensch
- Abstract summary: This work presents an application-grounded user study with eight sleep medicine practitioners. We evaluate how the type and timing of assistance influence event-level and clinically most relevant count-based performance, time requirements, and user experience.
- Score: 10.778389510933401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence (AI) systems increasingly match or surpass human experts in biomedical signal interpretation. However, their effective integration into clinical practice requires more than high predictive accuracy. Clinicians must discern when and why to trust algorithmic recommendations. This work presents an application-grounded user study with eight professional sleep medicine practitioners, who score nocturnal arousal events in polysomnographic data under three conditions: (i) manual scoring, (ii) black-box (BB) AI assistance, and (iii) transparent white-box (WB) AI assistance. Assistance is provided either from the start of scoring or as a post-hoc quality-control (QC) review. We systematically evaluate how the type and timing of assistance influence event-level and clinically most relevant count-based performance, time requirements, and user experience. When evaluated against the clinical standard used to train the AI, both AI and human-AI teams significantly outperform unaided experts, with collaboration also reducing inter-rater variability. Notably, transparent AI assistance applied as a targeted QC step yields median event-level performance improvements of approximately 30% over black-box assistance, and QC timing further enhances count-based outcomes. While WB and QC approaches increase the time required for scoring, start-time assistance is faster and preferred by most participants. Participants overwhelmingly favor transparency, with seven out of eight expressing willingness to adopt the system with minor or no modifications. In summary, strategically timed transparent AI assistance effectively balances accuracy and clinical efficiency, providing a promising pathway toward trustworthy AI integration and user acceptance in clinical workflows.
Related papers
- AI-assisted Protocol Information Extraction For Improved Accuracy and Efficiency in Clinical Trial Workflows [0.0]
Structuring protocol content into standard formats has the potential to improve efficiency, support documentation quality, and strengthen compliance.
We evaluate an Artificial Intelligence (AI) system using generative LLMs with Retrieval-Augmented Generation (RAG) for automated clinical trial protocol information extraction.
arXiv Detail & Related papers (2026-01-19T18:38:36Z)
- SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent for translating high-level research objectives into standardized experimental configurations.
An Experiment Manager orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback.
Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z)
- Trainee Action Recognition through Interaction Analysis in CCATT Mixed-Reality Training [1.5641818606249476]
Critical Care Air Transport Team members must stabilize severely injured soldiers by managing ventilators, IV pumps, and suction devices during flight.
Recent advances in simulation and multimodal data analytics enable more objective and comprehensive performance evaluation.
This study examines how CCATT members are trained using mixed-reality simulations that replicate the high-pressure conditions of aeromedical evacuation.
arXiv Detail & Related papers (2025-09-22T15:19:45Z)
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning.
This paper provides the first systematic review of this emerging field.
We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
arXiv Detail & Related papers (2025-08-01T14:41:31Z)
- Beyond Black-Box AI: Interpretable Hybrid Systems for Dementia Care [2.4339626079536925]
The recent boom of large language models (LLMs) has re-ignited the hope that artificial intelligence (AI) systems could aid medical diagnosis.
Despite dazzling benchmark scores, LLM assistants have yet to deliver measurable improvements at the bedside.
This scoping review aims to highlight the areas where AI is limited to make practical contributions in the clinical setting.
arXiv Detail & Related papers (2025-07-02T01:43:06Z)
- The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority.
We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
- Completing A Systematic Review in Hours instead of Months with Interactive AI Agents [21.934330935124866]
We introduce InsightAgent, a human-centered interactive AI agent powered by large language models.
InsightAgent partitions a large literature corpus based on semantics and employs a multi-agent design for more focused processing.
Our user studies with 9 medical professionals demonstrate that the visualization and interaction mechanisms can effectively improve the quality of synthesized SRs.
arXiv Detail & Related papers (2025-04-21T02:57:23Z)
- Over-Relying on Reliance: Towards Realistic Evaluations of AI-Based Clinical Decision Support [12.247046469627554]
We advocate for moving beyond evaluation metrics like Trust, Reliance, Acceptance, and Performance on the AI's task.
We call on the community to prioritize ecologically valid, domain-appropriate study setups that measure the emergent forms of value that AI can bring to healthcare professionals.
arXiv Detail & Related papers (2025-04-10T03:28:56Z)
- TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [54.98321887435557]
This paper presents a suite of 23 meticulously curated AI-ready datasets covering multi-modal input features and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z)
- RAISE -- Radiology AI Safety, an End-to-end lifecycle approach [5.829180249228172]
The integration of AI into radiology introduces opportunities for improved clinical care provision and efficiency.
The focus should be on ensuring models meet the highest standards of safety, effectiveness and efficacy.
The roadmap presented herein aims to expedite the achievement of deployable, reliable, and safe AI in radiology.
arXiv Detail & Related papers (2023-11-24T15:59:14Z)
- Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
"Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions.
We present a comprehensive summary and analyze each contribution, highlight the strengths of the best-performing methods, and discuss the possibility of translating such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)
- Robust and Efficient Medical Imaging with Self-Supervision [80.62711706785834]
We present REMEDIS, a unified representation learning strategy to improve robustness and data-efficiency of medical imaging AI.
We study a diverse range of medical imaging tasks and simulate three realistic application scenarios using retrospective data.
arXiv Detail & Related papers (2022-05-19T17:34:18Z)
- Contextual Constrained Learning for Dose-Finding Clinical Trials [102.8283665750281]
C3T-Budget is a contextual constrained clinical trial algorithm for dose-finding under both budget and safety constraints.
It recruits patients with consideration of the remaining budget, the remaining time, and the characteristics of each group.
arXiv Detail & Related papers (2020-01-08T11:46:48Z)