Psychometric Tests for AI Agents and Their Moduli Space
- URL: http://arxiv.org/abs/2511.19262v1
- Date: Mon, 24 Nov 2025 16:15:08 GMT
- Title: Psychometric Tests for AI Agents and Their Moduli Space
- Authors: Przemyslaw Chojecki
- Abstract summary: We make precise the notion of an AAI functional on a battery and set out axioms that any reasonable autonomy/general intelligence score should satisfy. We show that the composite index ('AAI-Index') defined previously is a special case of our AAI functional.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a moduli-theoretic view of psychometric test batteries for AI agents and connect it explicitly to the AAI score developed previously. First, we make precise the notion of an AAI functional on a battery and set out axioms that any reasonable autonomy/general intelligence score should satisfy. Second, we show that the composite index ('AAI-Index') defined previously is a special case of our AAI functional. Third, we introduce the notion of a cognitive core of an agent relative to a battery and define the associated AAI$_{\textrm{core}}$ score as the restriction of an AAI functional to that core. Finally, we use these notions to describe invariants of batteries under evaluation-preserving symmetries and outline how moduli of equivalent batteries are organized.
Related papers
- Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems [9.388162021920206]
This survey presents the first comprehensive review of agentic AI in remote sensing. We introduce a unified taxonomy distinguishing between single-agent copilots and multi-agent systems. We review emerging benchmarks that move the evaluation from pixel-level accuracy to trajectory-aware reasoning correctness.
arXiv Detail & Related papers (2026-01-05T08:34:17Z)
- Mathematics and Coding are Universal AI Benchmarks [0.0]
We study the special role of mathematics and coding inside the moduli space of psychometric batteries for AI agents. We show that when paired with formal proof kernels (e.g. Lean, Coq), GVU flows on this fiber admit spectrally stable self-improvement regimes.
arXiv Detail & Related papers (2025-12-15T14:36:29Z)
- The Geometry of Benchmarks: A New Path Toward AGI [0.0]
We introduce a geometric framework in which all psychometric batteries for AI agents are treated as points in a structured moduli space. First, we define an Autonomous AI (AAI) Scale, a Kardashev-style hierarchy of autonomy grounded in measurable performance. Second, we construct a moduli space of batteries, identifying equivalence classes of benchmarks that are indistinguishable at the level of agent orderings and capability inferences. Third, we introduce a general Generator-Verifier-Updater (GVU) operator that subsumes reinforcement learning, self-play, debate and verifier-based fine-tuning.
arXiv Detail & Related papers (2025-12-03T21:34:09Z)
- An Operational Kardashev-Style Scale for Autonomous AI - Towards AGI and Superintelligence [0.0]
We propose a Kardashev-inspired yet operational Autonomous AI (AAI) Scale. It measures the progression from fixed robotic process automation (AAI-0) to full artificial general intelligence (AAI-4) and beyond. We define ten capability axes (Autonomy, Generality, Planning, Memory/Persistence, Tool Economy, Self-Revision, Sociality/Coordination, Embodiment, World-Model Fidelity, Economic Throughput) aggregated by a composite AAI-Index.
arXiv Detail & Related papers (2025-11-17T14:24:27Z)
- AgenticIQA: An Agentic Framework for Adaptive and Interpretable Image Quality Assessment [69.06977852423564]
Image quality assessment (IQA) reflects both the quantification and interpretation of perceptual quality rooted in the human visual system. AgenticIQA decomposes IQA into four subtasks -- distortion detection, distortion analysis, tool selection, and tool execution. To support training and evaluation, we introduce AgenticIQA-200K, a large-scale instruction dataset tailored for IQA agents, and AgenticIQA-Eval, the first benchmark for assessing the planning, execution, and summarization capabilities of VLM-based IQA agents.
arXiv Detail & Related papers (2025-09-30T09:37:01Z)
- STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. We develop anchored reinforcement training - a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z)
- The next question after Turing's question: Introducing the Grow-AI test [51.56484100374058]
This study aims to extend the framework for assessing artificial intelligence, called GROW-AI. GROW-AI is designed to answer the question "Can machines grow up?" -- a natural successor to the Turing Test. The originality of the work lies in the conceptual transposition of the process of "growing" from the human world to that of artificial intelligence.
arXiv Detail & Related papers (2025-08-22T10:19:42Z)
- Fusion Intelligence for Digital Twinning AI Data Centers: A Synergistic GenAI-PhyAI Approach [17.699432259756456]
Fusion Intelligence is a novel framework synergizing GenAI's automation with PhyAI's domain grounding. Case studies demonstrate the advantages of our framework in automating the creation and validation of AIDC digital twins.
arXiv Detail & Related papers (2025-05-26T01:58:34Z)
- Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations [2.547250631115307]
Aitomia is a platform powered by AI to assist in performing AI-driven atomistic and quantum chemical (QC) simulations. It is equipped with chatbots and AI agents to help experts and guide non-experts in setting up and running atomistic simulations. Aitomia is expected to lower the barrier to performing atomistic simulations, thereby democratizing simulations and accelerating research and development in relevant fields.
arXiv Detail & Related papers (2025-05-13T03:11:41Z)
- AGITB: A Signal-Level Benchmark for Evaluating Artificial General Intelligence [0.0]
The Artificial General Intelligence Testbed (AGITB) introduces a novel benchmarking suite comprising fourteen elementary tests. AGITB evaluates models on their ability to forecast the next input in a temporal sequence, step by step, without pretraining. The human cortex satisfies all tests, whereas no current AI system meets the full AGITB criteria.
arXiv Detail & Related papers (2025-04-06T10:01:15Z)
- General Scales Unlock AI Evaluation with Explanatory and Predictive Power [57.7995945974989]
Benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems. We introduce general scales for AI evaluation that can explain what common AI benchmarks really measure. Our fully-automated methodology builds on 18 newly-crafted rubrics that place instance demands on general scales that do not saturate.
arXiv Detail & Related papers (2025-03-09T01:13:56Z)
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- Levels of AGI for Operationalizing Progress on the Path to AGI [53.28828093836034]
We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy, providing a common language to compare models, assess risks, and measure progress along the path to AGI.
arXiv Detail & Related papers (2023-11-04T17:44:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.