STRIDE: A Systematic Framework for Selecting AI Modalities - Agentic AI, AI Assistants, or LLM Calls
- URL: http://arxiv.org/abs/2512.02228v1
- Date: Mon, 01 Dec 2025 21:54:07 GMT
- Title: STRIDE: A Systematic Framework for Selecting AI Modalities - Agentic AI, AI Assistants, or LLM Calls
- Authors: Shubhi Asthana, Bing Zhang, Chad DeLuca, Ruchi Mahindru, Hima Patel,
- Abstract summary: We present STRIDE (Systematic Task Reasoning Intelligence Deployment Evaluator), a framework that provides principled recommendations for selecting between three modalities. STRIDE integrates structured task decomposition, dynamism attribution, and self-reflection requirement analysis to produce an Agentic Suitability Score. It achieved 92% accuracy in modality selection, reduced unnecessary agent deployments by 45%, and cut resource costs by 37%.
- Score: 6.5640770609606385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid shift from stateless large language models (LLMs) to autonomous, goal-driven agents raises a central question: When is agentic AI truly necessary? While agents enable multi-step reasoning, persistent memory, and tool orchestration, deploying them indiscriminately leads to higher cost, complexity, and risk. We present STRIDE (Systematic Task Reasoning Intelligence Deployment Evaluator), a framework that provides principled recommendations for selecting between three modalities: (i) direct LLM calls, (ii) guided AI assistants, and (iii) fully autonomous agentic AI. STRIDE integrates structured task decomposition, dynamism attribution, and self-reflection requirement analysis to produce an Agentic Suitability Score, ensuring that full agentic autonomy is reserved for tasks with inherent dynamism or evolving context. Evaluated across 30 real-world tasks spanning SRE, compliance, and enterprise automation, STRIDE achieved 92% accuracy in modality selection, reduced unnecessary agent deployments by 45%, and cut resource costs by 37%. Expert validation over six months in SRE and compliance domains confirmed its practical utility, with domain specialists agreeing that STRIDE effectively distinguishes between tasks requiring simple LLM calls, guided assistants, or full agentic autonomy. This work reframes agent adoption as a necessity-driven design decision, ensuring autonomy is applied only when its benefits justify the costs.
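The abstract does not give the scoring formula, so the following is a minimal illustrative sketch of how an Agentic Suitability Score could combine the three analyses the paper names (structured task decomposition, dynamism attribution, and self-reflection requirement analysis) and map the result to one of the three modalities. The signal scales, weights, thresholds, and all identifiers below are assumptions for illustration, not STRIDE's actual method.

```python
# Illustrative sketch only: STRIDE's real scoring rule, weights, and
# thresholds are not stated in the abstract; everything here is assumed.
from dataclasses import dataclass


@dataclass
class TaskProfile:
    decomposition_depth: float   # 0..1, from structured task decomposition (assumed scale)
    dynamism: float              # 0..1, from dynamism attribution (assumed scale)
    self_reflection_need: float  # 0..1, from self-reflection requirement analysis (assumed scale)


def agentic_suitability_score(task: TaskProfile,
                              weights: tuple = (0.3, 0.4, 0.3)) -> float:
    """Combine the three signals into one score (hypothetical weighting)."""
    w_decomp, w_dyn, w_reflect = weights
    return (w_decomp * task.decomposition_depth
            + w_dyn * task.dynamism
            + w_reflect * task.self_reflection_need)


def recommend_modality(score: float) -> str:
    """Map the score to one of the three modalities (thresholds are illustrative)."""
    if score < 0.35:
        return "direct LLM call"
    if score < 0.70:
        return "guided AI assistant"
    return "fully autonomous agentic AI"


if __name__ == "__main__":
    task = TaskProfile(decomposition_depth=0.8, dynamism=0.9, self_reflection_need=0.7)
    score = agentic_suitability_score(task)
    print(f"Agentic Suitability Score: {score:.2f} -> {recommend_modality(score)}")
```

Under this sketch, a highly dynamic, multi-step task with evolving context scores near the top of the range and is routed to full agentic autonomy, while a static, single-shot task stays with a direct LLM call, mirroring the paper's necessity-driven framing.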
Related papers
- Secure and Energy-Efficient Wireless Agentic AI Networks [12.588984049305866]
A secure wireless agentic AI network comprises one supervisor AI agent and multiple other AI agents. Agents dynamically assign other AI agents to participate in cooperative reasoning. Unselected AI agents act as friendly jammers to degrade the eavesdropper's interception performance.
arXiv Detail & Related papers (2026-02-16T21:42:33Z) - AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios [49.90735676070039]
The capacity of AI agents to effectively handle tasks of increasing duration and complexity continues to grow. We argue that current evaluations prioritize increasing task difficulty without sufficiently addressing the diversity of agentic tasks. We propose AgentIF-OneDay, aimed at determining whether general users can utilize natural language instructions and AI agents to complete a diverse array of daily tasks.
arXiv Detail & Related papers (2026-01-28T13:49:18Z) - Towards 6G Native-AI Edge Networks: A Semantic-Aware and Agentic Intelligence Paradigm [85.7583231789615]
6G positions intelligence as a native network capability, transforming the design of radio access networks (RANs). Within this vision, semantic-native communication and agentic intelligence are expected to play central roles. Agentic intelligence endows distributed RAN entities with goal-driven autonomy, reasoning, planning, and multi-agent collaboration.
arXiv Detail & Related papers (2025-12-04T03:09:33Z) - AgentEvolver: Towards Efficient Self-Evolving Agent System [51.54882384204726]
We present AgentEvolver, a self-evolving agent system that drives autonomous agent learning. AgentEvolver introduces three synergistic mechanisms: self-questioning, self-navigating, and self-attributing. Preliminary experiments indicate that AgentEvolver achieves more efficient exploration, better sample utilization, and faster adaptation compared to traditional RL-based baselines.
arXiv Detail & Related papers (2025-11-13T15:14:47Z) - Towards Outcome-Oriented, Task-Agnostic Evaluation of AI Agents [1.0305173936249623]
This white paper proposes a novel framework of eleven outcome-based, task-agnostic performance metrics for AI agents. We introduce metrics such as Goal Completion Rate (GCR), Autonomy Index (AIx), Multi-Step Task Resilience (MTR), and Business Impact Efficiency (BIE). Our results reveal significant performance trade-offs between different agent designs, highlighting the Hybrid Agent as the most consistently high-performing model.
arXiv Detail & Related papers (2025-11-11T13:40:46Z) - AURA: An Agent Autonomy Risk Assessment Framework [0.0]
AURA (Agent aUtonomy Risk Assessment) is a unified framework designed to detect, quantify, and mitigate risks arising from agentic AI. AURA provides an interactive process to score, evaluate, and mitigate the risks of running one or multiple AI agents, synchronously or asynchronously. AURA supports responsible and transparent adoption of agentic AI and provides robust risk detection and mitigation while balancing computational resources.
arXiv Detail & Related papers (2025-10-17T15:30:29Z) - LIMI: Less is More for Agency [49.63355240818081]
LIMI (Less Is More for Intelligent Agency) demonstrates that agency follows radically different development principles. We show that sophisticated agentic intelligence can emerge from minimal but strategically curated demonstrations of autonomous behavior. Our findings establish the Agency Efficiency Principle: machine autonomy emerges not from data abundance but from strategic curation of high-quality agentic demonstrations.
arXiv Detail & Related papers (2025-09-22T10:59:32Z) - Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks [8.218266805768687]
We present a benchmark of 34 representative programmable tasks designed to rigorously assess autonomous agents. We evaluate three popular open-source agent frameworks combined with two LLM backbones, observing a task completion rate of approximately 50%. We develop a three-tier taxonomy of failure causes aligned with task phases, highlighting planning errors, task execution issues, and incorrect response generation.
arXiv Detail & Related papers (2025-08-18T17:55:22Z) - Taming Uncertainty via Automation: Observing, Analyzing, and Optimizing Agentic AI Systems [1.9751175705897066]
Large Language Models (LLMs) are increasingly deployed within agentic systems: collections of interacting, LLM-powered agents that execute complex, adaptive workflows using memory, tools, and dynamic planning. Traditional software observability and operations practices fall short in addressing these challenges. This paper introduces AgentOps, a comprehensive framework for observing, analyzing, optimizing, and automating the operation of agentic AI systems.
arXiv Detail & Related papers (2025-07-15T12:54:43Z) - Measuring AI agent autonomy: Towards a scalable approach with code inspection [8.344207672507334]
We introduce a code-based assessment of autonomy that eliminates the need to run an AI agent to perform specific tasks. We demonstrate this approach with the AutoGen framework and select applications.
arXiv Detail & Related papers (2025-02-21T04:58:40Z) - Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z) - The Foundations of Computational Management: A Systematic Approach to Task Automation for the Integration of Artificial Intelligence into Existing Workflows [55.2480439325792]
This article introduces Computational Management, a systematic approach to task automation.
The article offers three easy step-by-step procedures to begin the process of implementing AI within a workflow.
arXiv Detail & Related papers (2024-02-07T01:45:14Z)