Related papers: Psychometric Tests for AI Agents and Their Moduli Space

Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems [9.388162021920206]
This survey presents the first comprehensive review of agentic AI in remote sensing. We introduce a unified taxonomy distinguishing between single-agent copilots and multi-agent systems. We review emerging benchmarks that move the evaluation from pixel-level accuracy to trajectory-aware reasoning correctness.
arXiv Detail & Related papers (2026-01-05T08:34:17Z)
Mathematics and Coding are Universal AI Benchmarks [0.0]
We study the special role of mathematics and coding inside the moduli space of psychometric batteries for AI agents. We show that when paired with formal proof kernels (e.g. Lean, Coq), GVU flows on this fiber admit spectrally stable self-improvement regimes.
arXiv Detail & Related papers (2025-12-15T14:36:29Z)
The Geometry of Benchmarks: A New Path Toward AGI [0.0]
We introduce a geometric framework in which all psychometric batteries for AI agents are treated as points in a structured moduli space. First, we define an Autonomous AI (AAI) Scale, a Kardashev-style hierarchy of autonomy grounded in measurable performance. Second, we construct a moduli space of batteries, identifying equivalence classes of benchmarks that are indistinguishable at the level of agent orderings and capability inferences. Third, we introduce a general Generator-Verifier-Updater (GVU) operator that subsumes reinforcement learning, self-play, debate and verifier-based fine-tuning.
arXiv Detail & Related papers (2025-12-03T21:34:09Z)
An Operational Kardashev-Style Scale for Autonomous AI - Towards AGI and Superintelligence [0.0]
We propose a Kardashev-inspired yet operational Autonomous AI (AAI) Scale. It measures the progression from fixed robotic process automation (AAI-0) to full artificial general intelligence (AAI-4) and beyond. We define ten capability axes (Autonomy, Generality, Planning, Memory/Persistence, Tool Economy, Self-Revision, Sociality/Coordination, Embodiment, World-Model Fidelity, Economic Throughput) aggregated by a composite AAI-Index.
arXiv Detail & Related papers (2025-11-17T14:24:27Z)
AgenticIQA: An Agentic Framework for Adaptive and Interpretable Image Quality Assessment [69.06977852423564]
Image quality assessment (IQA) reflects both the quantification and interpretation of perceptual quality rooted in the human visual system. AgenticIQA decomposes IQA into four subtasks -- distortion detection, distortion analysis, tool selection, and tool execution. To support training and evaluation, we introduce AgenticIQA-200K, a large-scale instruction dataset tailored for IQA agents, and AgenticIQA-Eval, the first benchmark for assessing the planning, execution, and summarization capabilities of VLM-based IQA agents.
arXiv Detail & Related papers (2025-09-30T09:37:01Z)
STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. We develop anchored reinforcement training - a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z)
The next question after Turing's question: Introducing the Grow-AI test [51.56484100374058]
This study aims to extend the framework for assessing artificial intelligence, called GROW-AI. GROW-AI is designed to answer the question "Can machines grow up?" -- a natural successor to the Turing Test. The originality of the work lies in the conceptual transposition of the process of "growing" from the human world to that of artificial intelligence.
arXiv Detail & Related papers (2025-08-22T10:19:42Z)
Fusion Intelligence for Digital Twinning AI Data Centers: A Synergistic GenAI-PhyAI Approach [17.699432259756456]
Fusion Intelligence is a novel framework synergizing GenAI's automation with PhyAI's domain grounding. Case studies demonstrate the advantages of our framework in automating the creation and validation of AIDC digital twins.
arXiv Detail & Related papers (2025-05-26T01:58:34Z)
Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations [2.547250631115307]
Aitomia is a platform powered by AI to assist in performing AI-driven atomistic and quantum chemical (QC) simulations. It is equipped with chatbots and AI agents to help experts and guide non-experts in setting up and running atomistic simulations. Aitomia is expected to lower the barrier to performing atomistic simulations, thereby democratizing simulations and accelerating research and development in relevant fields.
arXiv Detail & Related papers (2025-05-13T03:11:41Z)
AGITB: A Signal-Level Benchmark for Evaluating Artificial General Intelligence [0.0]
The Artificial General Intelligence Testbed (AGITB) introduces a novel benchmarking suite comprising fourteen elementary tests. AGITB evaluates models on their ability to forecast the next input in a temporal sequence, step by step, without pretraining. The human cortex satisfies all tests, whereas no current AI system meets the full AGITB criteria.
arXiv Detail & Related papers (2025-04-06T10:01:15Z)
General Scales Unlock AI Evaluation with Explanatory and Predictive Power [57.7995945974989]
Benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems. We introduce general scales for AI evaluation that can explain what common AI benchmarks really measure. Our fully-automated methodology builds on 18 newly-crafted rubrics that place instance demands on general scales that do not saturate.
arXiv Detail & Related papers (2025-03-09T01:13:56Z)
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
Levels of AGI for Operationalizing Progress on the Path to AGI [53.28828093836034]
We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy, providing a common language to compare models, assess risks, and measure progress along the path to AGI.
arXiv Detail & Related papers (2023-11-04T17:44:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.