Enabling Responsible, Secure and Sustainable Healthcare AI - A Strategic Framework for Clinical and Operational Impact
- URL: http://arxiv.org/abs/2510.15943v1
- Date: Thu, 09 Oct 2025 12:40:59 GMT
- Title: Enabling Responsible, Secure and Sustainable Healthcare AI - A Strategic Framework for Clinical and Operational Impact
- Authors: Jimmy Joseph
- Abstract summary: We offer a pragmatic model to operationalize responsible, secure, and sustainable healthcare AI. This framework includes five key pillars - Leadership & Strategy, MLOps & Technical Infrastructure, Governance & Ethics, Education & Workforce Development, and Change Management & Adoption. We demonstrate its utility through two deployments.
- Score: 0.5076419064097734
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We offer a pragmatic model to operationalize responsible, secure, and sustainable healthcare AI, aligning world-class technical excellence with organizational readiness. The framework includes five key pillars - Leadership & Strategy, MLOps & Technical Infrastructure, Governance & Ethics, Education & Workforce Development, and Change Management & Adoption - and is intended to operationalize 'compliance-by-design' while delivering measurable impact. We demonstrate its utility through two deployments. (A) An inpatient length of stay (LOS) prediction service had R^2=0.41-0.58 with validation cohorts in an observational pilot (n = 3,184 encounters, 4 units, June-August 2025). Adoption was 78 percent by week 6, and target units saw 5-10 percent relative declines in mean LOS for complex cases vs. pre-pilot baselines. (B) An AI-augmented radiology second-reader for lung nodules (PACS-integrated with thresholding and explanation overlays) achieved high sensitivity (95 percent) and provided a +8.0 percentage-point lift in detection of sub-centimeter actionable findings, without slowing workflow (median report TAT 23 min, p = 0.64). Both services executed in monitored, auditable pipelines with well-defined rollback, bias checks, and no evidence of security incidents. These findings indicate that by combining strong MLOps and AI security with governance, education, and human-centric change, we can accelerate adoption of AI while improving security and outcomes. We end with limitations, generalization considerations, and a roadmap for scaling across varied clinical and operational use cases.
Related papers
- Human-Guided Agentic AI for Multimodal Clinical Prediction: Lessons from the AgentDS Healthcare Benchmark [0.5066646435185324]
We investigate how human guidance of agentic AI can improve multimodal clinical prediction. We present our approach to three benchmark challenges: 30-day hospital prediction, emergency department cost forecasting, and discharge readiness assessment. Our approach ranked 5th overall in the healthcare domain, with a 3rd-place finish on the discharge readiness task.
arXiv Detail & Related papers (2026-02-23T04:37:45Z) - Improving the Safety and Trustworthiness of Medical AI via Multi-Agent Evaluation Loops [1.412167203558403]
Large Language Models (LLMs) are increasingly applied in healthcare, yet ensuring their ethical integrity and safety compliance remains a major barrier to clinical deployment. This work introduces a multi-agent refinement framework designed to enhance the safety and reliability of medical LLMs through structured, iterative alignment.
arXiv Detail & Related papers (2026-01-19T18:10:34Z) - Towards a Science of Scaling Agent Systems [79.64446272302287]
We formalize a definition for agent evaluation and characterize scaling laws as the interplay between agent quantity, coordination structure, model capability, and task properties. We derive a predictive model using coordination metrics, that cross-validated R2=0, enabling prediction on unseen task domains. We identify three effects: (1) a tool-coordination trade-off: under fixed computational budgets, tool-heavy tasks suffer disproportionately from multi-agent overhead, and (2) a capability saturation: coordination yields diminishing or negative returns once single-agent baselines exceed 45%.
arXiv Detail & Related papers (2025-12-09T06:52:21Z) - DispatchMAS: Fusing taxonomy and artificial intelligence agents for emergency medical services [49.70819009392778]
Large Language Models (LLMs) and Multi-Agent Systems (MAS) offer opportunities to augment dispatchers. This study aimed to develop and evaluate a taxonomy-grounded, multi-agent system for simulating realistic scenarios.
arXiv Detail & Related papers (2025-10-24T08:01:21Z) - Enhancing reliability in AI inference services: An empirical study on real production incidents [6.549475714716768]
We present one of the first provider-internal, practice-based analyses of large language model (LLM) inference incidents. We developed a taxonomy and methodology grounded in a year of operational experience, validating it on 156 high-severity incidents. This study demonstrates how systematic, empirically grounded analysis of inference operations can drive more reliable and cost-efficient LLM serving at scale.
arXiv Detail & Related papers (2025-10-17T23:16:29Z) - Agentic UAVs: LLM-Driven Autonomy with Integrated Tool-Calling and Cognitive Reasoning [3.4643961367503575]
Existing UAV frameworks lack context-aware reasoning, autonomous decision-making, and ecosystem-level integration. This paper introduces the Agentic UAVs framework, a five-layer architecture (Perception, Reasoning, Action, Integration, Learning). A ROS2 and Gazebo-based prototype integrates YOLOv11 object detection with GPT-4 reasoning and local Gemma-3 deployment.
arXiv Detail & Related papers (2025-09-14T08:46:40Z) - Using AI to Optimize Patient Transfer and Resource Utilization During Mass-Casualty Incidents: A Simulation Platform [0.014285185279360277]
Mass casualty incidents (MCIs) overwhelm healthcare systems and demand rapid patient-hospital allocation decisions. We developed and validated a deep reinforcement learning-based decision-support AI agent to optimize patient transfer decisions. MasTER is a web-accessible command dashboard for MCI management simulations.
arXiv Detail & Related papers (2025-09-10T16:46:54Z) - OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks [52.87238755666243]
We present OmniEAR, a framework for evaluating how language models reason about physical interactions, tool usage, and multi-agent coordination in embodied tasks. We model continuous physical properties and complex spatial relationships across 1,500 scenarios spanning household and industrial domains. Our systematic evaluation reveals severe performance degradation when models must reason from constraints.
arXiv Detail & Related papers (2025-08-07T17:54:15Z) - Leveraging AI to Accelerate Medical Data Cleaning: A Comparative Study of AI-Assisted vs. Traditional Methods [3.2666593942117688]
Octozi is an artificial intelligence-assisted platform that combines large language models with domain-specifics to transform medical data review. Economic analysis of a representative Phase III oncology trial reveals potential cost savings of $5.1 million.
arXiv Detail & Related papers (2025-08-07T15:49:32Z) - Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models [87.66870367661342]
Large language models (LLMs) are used in AI applications in healthcare. A red-teaming framework that continuously stress-tests LLMs can reveal significant weaknesses in four safety-critical domains. A suite of adversarial agents is applied to autonomously mutate test cases, identify/evolve unsafe-triggering strategies, and evaluate responses. Our framework delivers an evolvable, scalable, and reliable safeguard for the next generation of medical AI.
arXiv Detail & Related papers (2025-07-30T08:44:22Z) - Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios. We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models. Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z) - Tiered Agentic Oversight: A Hierarchical Multi-Agent System for Healthcare Safety [39.9193491638205]
Tiered Agentic Oversight (TAO) is a hierarchical multi-agent system that enhances AI safety through layered, automated supervision. Inspired by clinical hierarchies (e.g., nurse-physician-specialist) in hospitals, TAO routes tasks to specialized agents based on complexity. Experiments reveal TAO outperforms single-agent and other multi-agent systems on 4 out of 5 healthcare safety benchmarks, with up to an 8.2% improvement.
arXiv Detail & Related papers (2025-06-14T12:46:10Z) - AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage. We show that scaling the agentic reasoning system at test time substantially enhances robustness without compromising model utility. Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z) - How Well Can Modern LLMs Act as Agent Cores in Radiology Environments? [54.36730060680139]
RadA-BenchPlat is an evaluation platform that benchmarks the performance of large language models (LLMs) in radiology environments. The platform also defines ten categories of tools for agent-driven task solving and evaluates seven leading LLMs.
arXiv Detail & Related papers (2024-12-12T18:20:16Z) - RAISE -- Radiology AI Safety, an End-to-end lifecycle approach [5.829180249228172]
The integration of AI into radiology introduces opportunities for improved clinical care provision and efficiency.
The focus should be on ensuring models meet the highest standards of safety, effectiveness and efficacy.
The roadmap presented herein aims to expedite the achievement of deployable, reliable, and safe AI in radiology.
arXiv Detail & Related papers (2023-11-24T15:59:14Z) - Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analysis (UEBA) for cyber-security.
In this paper, we present a solution to effectively mitigate this attack by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.