Randomized Controlled Trials for Conditional Access Optimization Agent
- URL: http://arxiv.org/abs/2511.13865v1
- Date: Mon, 17 Nov 2025 19:33:03 GMT
- Title: Randomized Controlled Trials for Conditional Access Optimization Agent
- Authors: James Bono, Beibei Cheng, Joaquin Lozano
- Abstract summary: We report results from the first randomized controlled trial (RCT) evaluating an AI agent for Conditional Access (CA) policy management in Microsoft Entra. Agent access produced substantial gains: accuracy improved by 48% and task completion time decreased by 43% while holding accuracy constant. These findings demonstrate that purpose-built AI agents can significantly enhance both speed and accuracy in identity administration.
- Score: 0.9558392439655014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI agents are increasingly deployed to automate complex enterprise workflows, yet evidence of their effectiveness in identity governance is limited. We report results from the first randomized controlled trial (RCT) evaluating an AI agent for Conditional Access (CA) policy management in Microsoft Entra. The agent assists with four high-value tasks: policy merging, Zero-Trust baseline gap detection, phased rollout planning, and user-policy alignment. In a production-grade environment, 162 identity administrators were randomly assigned to a control group (no agent) or treatment group (agent-assisted) and asked to perform these tasks. Agent access produced substantial gains: accuracy improved by 48% and task completion time decreased by 43% while holding accuracy constant. The largest benefits emerged on cognitively demanding tasks such as baseline gap detection. These findings demonstrate that purpose-built AI agents can significantly enhance both speed and accuracy in identity administration.
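The trial design described in the abstract (random assignment to a control or treatment arm, then a comparison of group outcomes) can be sketched as follows. This is a minimal illustration and not the authors' procedure: the equal split, the fixed seed, the helper names `assign_groups` and `relative_change`, and the reading of the reported percentages as relative changes are all assumptions, and the numbers in the usage lines are placeholders rather than study data.

```python
import random


def assign_groups(participant_ids, seed=0):
    """Randomly split participants into two arms of (near) equal size.

    Assumption: a simple 50/50 randomization; the abstract does not
    state the actual allocation ratio or procedure.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


def relative_change(control_mean, treatment_mean):
    """Relative change of the treatment mean versus the control mean.

    Assumption: percentages such as "improved by 48%" are relative to
    the control group; the abstract does not define the formula.
    """
    return (treatment_mean - control_mean) / control_mean


# 162 participants, as stated in the abstract; IDs are hypothetical.
groups = assign_groups(range(162))
print(len(groups["control"]), len(groups["treatment"]))

# Placeholder means (NOT from the paper): 10 vs 8 minutes per task.
print(f"{relative_change(10.0, 8.0):+.0%}")
```

A fixed seed keeps the assignment reproducible for auditing; a real trial would also typically check that the two arms are balanced on covariates such as experience level.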
Related papers
- AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios [49.90735676070039]
The capacity of AI agents to effectively handle tasks of increasing duration and complexity continues to grow. We argue that current evaluations prioritize increasing task difficulty without sufficiently addressing the diversity of agentic tasks. We propose AgentIF-OneDay, aimed at determining whether general users can utilize natural language instructions and AI agents to complete a diverse array of daily tasks.
Date: 2026-01-28T13:49:18Z
- Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification [71.98473277917962]
Recent advances in Deep Research Agents (DRAs) are transforming automated knowledge discovery and problem-solving. We propose an alternative paradigm: self-evolving the agent's ability by iteratively verifying the policy model's outputs, guided by meticulously crafted rubrics. We present DeepVerifier, a rubrics-based outcome reward verifier that leverages the asymmetry of verification.
Date: 2026-01-22T09:47:31Z
- SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents [45.71333459905404]
SmartSnap is a paradigm shift from passive, post-hoc verification to proactive, in-situ self-verification by the agent itself. We introduce the Self-Verifying Agent, a new type of agent designed with dual missions: to complete a task and to prove its accomplishment with curated evidence. Experiments on mobile tasks across model families and scales demonstrate that our SmartSnap paradigm allows training LLM-driven agents in a scalable manner.
Date: 2025-12-26T14:51:39Z
- Randomized Controlled Trials for Phishing Triage Agent [1.2691047660244335]
This paper presents the first randomized controlled trial (RCT) evaluating the impact of a domain-specific AI agent on analyst productivity and accuracy. Agent-augmented analysts achieved up to 6.5 times as many true positives per analyst minute and a 77% improvement in verdict accuracy compared to a control group.
Date: 2025-11-17T19:23:08Z
- AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress [71.02263260394261]
Large language models (LLMs) still encounter challenges in multi-turn decision-making tasks. We build process reward models (PRMs) to evaluate each decision and guide the agent's decision-making process. AgentPRM captures both the interdependence between sequential decisions and their contribution to the final goal.
Date: 2025-11-11T14:57:54Z
- Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems [11.42175340352007]
We introduce SupervisorAgent, a lightweight and modular framework for runtime, adaptive supervision. SupervisorAgent intervenes at critical junctures to proactively correct errors, guide inefficient behaviors, and purify observations. On the challenging GAIA benchmark, SupervisorAgent reduces the token consumption of the Smolagent framework by an average of 29.45% without compromising its success rate.
Date: 2025-10-30T15:12:59Z
- Alita-G: Self-Evolving Generative Agent for Agent Generation [54.49365835457433]
We present ALITA-G, a framework that transforms a general-purpose agent into a domain expert. In this framework, a generalist agent executes a curated suite of target-domain tasks. It attains strong gains while reducing computation costs.
Date: 2025-10-27T17:59:14Z
- What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment [3.5583478152586756]
Agent GPA is an evaluation paradigm based on an agent's operational loop of setting goals, devising plans, and executing actions. The framework includes five evaluation metrics: Goal Fulfillment, Logical Consistency, Execution Efficiency, Plan Quality, and Plan Adherence.
Date: 2025-10-09T22:40:19Z
- AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning [78.5751183537704]
AdvEvo-MARL is a co-evolutionary multi-agent reinforcement learning framework that internalizes safety into task agents. Rather than relying on external guards, AdvEvo-MARL jointly optimizes attackers and defenders.
Date: 2025-10-02T02:06:30Z
- Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy. Results on SuperGPQA and TRQA confirm robustness across domains.
Date: 2025-09-25T14:05:55Z
- VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection [55.957275374847484]
VulAgent is a multi-agent vulnerability detection framework based on hypothesis validation. It implements a semantics-sensitive, multi-view detection pipeline, each aligned to a specific analysis perspective. On average, VulAgent improves overall accuracy by 6.6%, increases the correct identification rate of vulnerable-fixed code pairs by up to 450%, and reduces the false positive rate by about 36%.
Date: 2025-09-15T02:25:38Z
- Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios. We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models. Our findings highlight critical and persistent vulnerabilities in today's AI agents.
Date: 2025-07-28T05:13:04Z
- Enhancing Clinical Decision-Making: Integrating Multi-Agent Systems with Ethical AI Governance [1.0195618602298682]
We compare novel agent system designs that use modular agents to analyze laboratory results, vital signs, and clinical context. We implement our agent system with the eICU database, including running lab analysis, vitals-only interpreters, and contextual reasoner agents.
Date: 2025-03-25T05:32:43Z