AGITB: A Signal-Level Benchmark for Evaluating Artificial General Intelligence
- URL: http://arxiv.org/abs/2504.04430v5
- Date: Wed, 30 Jul 2025 11:42:12 GMT
- Title: AGITB: A Signal-Level Benchmark for Evaluating Artificial General Intelligence
- Authors: Matej Šprogar
- Abstract summary: Existing evaluation frameworks fail to capture generality at its core and offer no guidance. The artificial general intelligence testbed (AGITB) is a novel and freely available benchmarking suite comprising twelve fully automatable tests. AGITB requires models to forecast temporal sequences without pretraining, symbolic manipulation, or semantic grounding.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite major advances in machine learning, current artificial intelligence systems continue to fall short of human-like general intelligence. While large language and reasoning models can generate fluent and coherent outputs, they lack the deep understanding and adaptive reasoning that characterize truly general intelligence. Existing evaluation frameworks, which are centered on broad language or perception tasks, fail to capture generality at its core and offer no guidance. The artificial general intelligence testbed (AGITB) is a novel and freely available benchmarking suite comprising twelve fully automatable tests designed to evaluate low-level cognitive precursors through binary signal prediction. AGITB requires models to forecast temporal sequences without pretraining, symbolic manipulation, or semantic grounding. The framework isolates core computational invariants - such as determinism, sensitivity, and generalization - that align with principles of biological information processing. Engineered to resist brute-force and memorization-based approaches, AGITB presumes no prior knowledge and demands learning from first principles. While humans pass all tests, no current AI system has met the full AGITB criteria, underscoring its potential as a rigorous, interpretable, and actionable benchmark for guiding and evaluating progress toward artificial general intelligence. A reference implementation of AGITB is available on GitHub.
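To make the signal-level setup concrete, the sketch below illustrates in Python the kind of checks the abstract describes: a candidate model that must forecast the next binary input at every tick, a determinism check (identical instances fed identical signals must behave identically), and a toy adaptation check on a repeating pattern. The class and function names, the predict-per-tick interface, and the pass criteria are illustrative assumptions, not the AGITB API; the reference implementation on GitHub defines the actual twelve tests.
```python
# Minimal sketch of AGITB-style signal-level checks (hypothetical interface;
# the reference implementation on GitHub defines the actual API and tests).
import random
from typing import Callable, List

Signal = List[int]  # one binary sample per time step


class EchoPredictor:
    """Toy candidate model: forecasts that the next input repeats the current one."""

    def step(self, observation: Signal) -> Signal:
        # Return a forecast of the next tick's observation.
        return list(observation)


def determinism_check(make_model: Callable[[], EchoPredictor],
                      inputs: List[Signal]) -> bool:
    """Two fresh instances fed the same signal must produce identical forecasts."""
    a, b = make_model(), make_model()
    return all(a.step(x) == b.step(x) for x in inputs)


def adaptation_check(make_model: Callable[[], EchoPredictor],
                     period: int = 4, bits: int = 8, ticks: int = 64) -> bool:
    """After exposure to a repeating pattern, forecasts should match the next input."""
    pattern = [[random.randint(0, 1) for _ in range(bits)] for _ in range(period)]
    model = make_model()
    forecast: Signal = []
    hits = 0
    for t in range(ticks):
        observation = pattern[t % period]
        if t >= ticks // 2 and forecast == observation:
            hits += 1  # the forecast made at t-1 matched the input at t
        forecast = model.step(observation)
    return hits == ticks - ticks // 2  # demand perfect prediction in the second half


if __name__ == "__main__":
    random.seed(0)
    stream = [[random.randint(0, 1) for _ in range(8)] for _ in range(32)]
    print("deterministic:", determinism_check(EchoPredictor, stream))
    print("adapts to periodic signal:", adaptation_check(EchoPredictor))
```
With this toy echo model, the determinism check passes while the adaptation check fails, which is the kind of separation such tests are meant to expose; an actual AGITB run applies stricter, fully specified criteria.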
Related papers
- Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence [59.07578850674114]
Sound deductive reasoning is an indisputably desirable aspect of general intelligence. It is well documented that even the most advanced frontier systems regularly and consistently falter on easily solvable reasoning tasks. We argue that their unsound behavior is a consequence of the statistical learning approach powering their development.
arXiv Detail & Related papers (2025-06-30T14:37:50Z) - Bhatt Conjectures: On Necessary-But-Not-Sufficient Benchmark Tautology for Human Like Reasoning [0.0]
The Bhatt Conjectures framework introduces rigorous, hierarchical benchmarks for evaluating AI reasoning and understanding. The agentreasoning-sdk demonstrates a practical implementation, revealing that current AI models struggle with complex reasoning tasks.
arXiv Detail & Related papers (2025-06-13T02:41:18Z) - Generalising from Self-Produced Data: Model Training Beyond Human Constraints [0.0]
This paper introduces a novel framework in which AI models autonomously generate and validate new knowledge. Central to this approach is an unbounded, ungamable numeric reward that guides learning without requiring human benchmarks.
arXiv Detail & Related papers (2025-04-07T03:48:02Z) - Beyond Detection: Designing AI-Resilient Assessments with Automated Feedback Tool to Foster Critical Thinking [0.0]
This research proposes a proactive, AI-resilient solution based on assessment design rather than detection. It introduces a web-based Python tool that integrates Bloom's taxonomy with advanced natural language processing techniques. It helps educators determine whether a task targets lower-order thinking such as recall and summarization or higher-order skills such as analysis, evaluation, and creation.
arXiv Detail & Related papers (2025-03-30T23:13:00Z) - General Scales Unlock AI Evaluation with Explanatory and Predictive Power [57.7995945974989]
Benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems. We introduce general scales for AI evaluation that can explain what common AI benchmarks really measure. Our fully automated methodology builds on 18 newly crafted rubrics that place instance demands on general scales that do not saturate.
arXiv Detail & Related papers (2025-03-09T01:13:56Z) - Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents [55.63497537202751]
This article explores the convergence of connectionist and symbolic artificial intelligence (AI).
Traditionally, connectionist AI focuses on neural networks, while symbolic AI emphasizes symbolic representation and logic.
Recent advancements in large language models (LLMs) highlight the potential of connectionist architectures in handling human language as a form of symbols.
arXiv Detail & Related papers (2024-07-11T14:00:53Z) - Integration of cognitive tasks into artificial general intelligence test for large models [54.72053150920186]
We advocate for a comprehensive framework of cognitive science-inspired artificial general intelligence (AGI) tests.
The cognitive science-inspired AGI tests encompass the full spectrum of intelligence facets, including crystallized intelligence, fluid intelligence, social intelligence, and embodied intelligence.
arXiv Detail & Related papers (2024-02-04T15:50:42Z) - Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z) - Evaluating General-Purpose AI with Psychometrics [43.85432514910491]
We discuss the need for a comprehensive and accurate evaluation of general-purpose AI systems such as large language models.
Current evaluation methodology, mostly based on benchmarks of specific tasks, falls short of adequately assessing these versatile AI systems.
To tackle these challenges, we suggest transitioning from task-oriented evaluation to construct-oriented evaluation.
arXiv Detail & Related papers (2023-10-25T05:38:38Z) - Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
While this generative AI approach has produced impressive results, it heavily leans on human supervision.
This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation.
We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z) - Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models [83.63242931107638]
We propose four characteristics of generally intelligent agents.
We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations.
We conclude by outlining promising future research directions in the field of artificial general intelligence.
arXiv Detail & Related papers (2023-07-07T13:58:16Z) - Beyond Interpretable Benchmarks: Contextual Learning through Cognitive and Multimodal Perception [0.0]
This study contends that the Turing Test is misinterpreted as an attempt to anthropomorphize computer systems.
It emphasizes tacit learning as a cornerstone of general-purpose intelligence, despite its lack of overt interpretability.
arXiv Detail & Related papers (2022-12-04T08:30:04Z) - Certifiable Artificial Intelligence Through Data Fusion [7.103626867766158]
This paper reviews and raises concerns about adopting, fielding, and maintaining artificial intelligence (AI) systems.
A notional use case of image data fusion is presented to support the certifiability of AI object recognition, considering precision versus distance.
arXiv Detail & Related papers (2021-11-03T03:34:19Z) - The Why, What and How of Artificial General Intelligence Chip Development [0.0]
Intelligent sensing, automation, and edge computing applications have been the market drivers for AI chips.
The generalisation, performance, robustness, and scalability of the AI chip solutions are compared with human-like intelligence abilities.
arXiv Detail & Related papers (2020-12-08T02:36:04Z) - Estimating the Brittleness of AI: Safety Integrity Levels and the Need for Testing Out-Of-Distribution Performance [0.0]
Test, Evaluation, Verification, and Validation for Artificial Intelligence (AI) is a challenge that threatens to limit the economic and societal rewards that AI researchers have devoted themselves to producing.
This paper argues that neither of those criteria is assured for Deep Neural Networks.
arXiv Detail & Related papers (2020-09-02T03:33:40Z) - Neuro-symbolic Architectures for Context Understanding [59.899606495602406]
We propose the use of hybrid AI methodology as a framework for combining the strengths of data-driven and knowledge-driven approaches.
Specifically, we inherit the concept of neuro-symbolism as a way of using knowledge bases to guide the learning progress of deep neural networks.
arXiv Detail & Related papers (2020-03-09T15:04:07Z)