Related papers: Validity Is What You Need

URL: http://arxiv.org/abs/2510.27628v1
Date: Fri, 31 Oct 2025 17:00:04 GMT
Title: Validity Is What You Need
Authors: Sebastian Benthall, Andrew Clark
Abstract summary: We consider other definitions of Agentic AI and propose a new realist definition. We note, however, that Agentic AI systems are primarily applications, not foundations.
Score: 3.0111718611142684
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While AI agents have long been discussed and studied in computer science, today's Agentic AI systems are something new. We consider other definitions of Agentic AI and propose a new realist definition. Agentic AI is a software delivery mechanism, comparable to software as a service (SaaS), which puts an application to work autonomously in a complex enterprise setting. Recent advances in large language models (LLMs) as foundation models have driven excitement in Agentic AI. We note, however, that Agentic AI systems are primarily applications, not foundations, and so their success depends on validation by end users and principal stakeholders. The tools and techniques needed by the principal users to validate their applications are quite different from the tools and techniques used to evaluate foundation models. Ironically, with good validation measures in place, in many cases the foundation models can be replaced with much simpler, faster, and more interpretable models that handle core logic. When it comes to Agentic AI, validity is what you need. LLMs are one option that might achieve it.

Related papers

Mining Type Constructs Using Patterns in AI-Generated Code [1.2107297090229683]
It remains unstudied whether AI essentially outperforms humans in type-related programming tasks. We present the first empirical analysis to answer these questions in the domain of TypeScript projects. Surprisingly, even with all these issues, Agentic pull requests have 1.8x higher acceptance rates compared to humans for TypeScript.
arXiv Detail & Related papers (2026-02-20T03:17:42Z)
Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents [14.448267395835721]
We propose a unified taxonomy that breaks agents into Perception, Brain, Planning, Action, Tool Use, and Collaboration. We also group the environments in which these agents operate, including digital operating systems, embodied robotics, and other specialized domains.
arXiv Detail & Related papers (2026-01-18T19:51:16Z)
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning [84.70211451226835]
Large Language Model (LLM) Agents are constrained by a dependency on human-curated data. We introduce Agent0, a fully autonomous framework that evolves high-performing agents without external data. Agent0 substantially boosts reasoning capabilities, improving the Qwen3-8B-Base model by 18% on mathematical reasoning and 24% on general reasoning benchmarks.
arXiv Detail & Related papers (2025-11-20T05:01:57Z)
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents [93.26456498576181]
This paper focuses on the development of native Autonomous Single-Agent models for Deep Research. Our best variant SFR-DR-20B achieves up to 28.7% on Humanity's Last Exam benchmark.
arXiv Detail & Related papers (2025-09-08T02:07:09Z)
Small Language Models are the Future of Agentic AI [42.62162575221445]
We lay out the position that small language models (SLMs) are sufficiently powerful, inherently more suitable, and necessarily more economical for many invocations in agentic systems. We discuss the potential barriers for the adoption of SLMs in agentic systems and outline a general LLM-to-SLM agent conversion algorithm.
arXiv Detail & Related papers (2025-06-02T18:35:16Z)
Agentic AI and Multiagentic: Are We Reinventing the Wheel? [0.0]
The term AI Agentic is often used as a buzzword for what are essentially AI agents, and AI Multiagentic for what are multi-agent systems. This confusion overlooks decades of research in the field of autonomous agents and multi-agent systems. The article advocates for scientific and technological rigour and the use of established terminology from the state of the art in AI.
arXiv Detail & Related papers (2025-06-02T09:19:11Z)
Fundamental Risks in the Current Deployment of General-Purpose AI Models: What Have We (Not) Learnt From Cybersecurity? [60.629883024152576]
Large Language Models (LLMs) have seen rapid deployment in a wide range of use cases. OpenAI's Altera are just a few examples of increased autonomy, data access, and execution capabilities. These methods come with a range of cybersecurity challenges.
arXiv Detail & Related papers (2024-12-19T14:44:41Z)
Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems. This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process. We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z)
CACA Agent: Capability Collaboration based AI Agent [18.84686313298908]
We propose CACA Agent (Capability Collaboration based AI Agent) using an open architecture inspired by service computing. CACA Agent integrates a set of collaborative capabilities to implement AI Agents, reducing the dependence on a single LLM. We present a demo to illustrate the operation and the application scenario extension of CACA Agent.
arXiv Detail & Related papers (2024-03-22T11:42:47Z)
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI). We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.