Evaluating Intelligence via Trial and Error
- URL: http://arxiv.org/abs/2502.18858v2
- Date: Mon, 03 Mar 2025 13:38:50 GMT
- Title: Evaluating Intelligence via Trial and Error
- Authors: Jingtao Zhan, Jiahao Zhao, Jiayu Li, Yiqun Liu, Bo Zhang, Qingyao Ai, Jiaxin Mao, Hongning Wang, Min Zhang, Shaoping Ma,
- Abstract summary: We introduce Survival Game as a framework to evaluate intelligence based on the number of failed attempts in a trial-and-error process.<n>When the expectation and variance of failure counts are both finite, it signals the ability to consistently find solutions to new challenges.<n>Our results show that while AI systems achieve the Autonomous Level in simple tasks, they are still far from it in more complex tasks.
- Score: 59.80426744891971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligence is a crucial trait for species to find solutions within a limited number of trial-and-error attempts. Building on this idea, we introduce Survival Game as a framework to evaluate intelligence based on the number of failed attempts in a trial-and-error process. Fewer failures indicate higher intelligence. When the expectation and variance of failure counts are both finite, it signals the ability to consistently find solutions to new challenges, which we define as the Autonomous Level of intelligence. Using Survival Game, we comprehensively evaluate existing AI systems. Our results show that while AI systems achieve the Autonomous Level in simple tasks, they are still far from it in more complex tasks, such as vision, search, recommendation, and language. While scaling current AI technologies might help, this would come at an astronomical cost. Projections suggest that achieving the Autonomous Level for general tasks would require $10^{26}$ parameters. To put this into perspective, loading such a massive model requires so many H100 GPUs that their total value is $10^{7}$ times that of Apple Inc.'s market value. Even with Moore's Law, supporting such a parameter scale would take $70$ years. This staggering cost highlights the complexity of human tasks and the inadequacies of current AI technologies. To further investigate this phenomenon, we conduct a theoretical analysis of Survival Game and its experimental results. Our findings suggest that human tasks possess a criticality property. As a result, Autonomous Level requires a deep understanding of the task's underlying mechanisms. Current AI systems, however, do not fully grasp these mechanisms and instead rely on superficial mimicry, making it difficult for them to reach an autonomous level. We believe Survival Game can not only guide the future development of AI but also offer profound insights into human intelligence.
Related papers
- General Scales Unlock AI Evaluation with Explanatory and Predictive Power [57.7995945974989]
benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems.
We introduce general scales for AI evaluation that can explain what common AI benchmarks really measure.
Our fully-automated methodology builds on 18 newly-crafted rubrics that place instance demands on general scales that do not saturate.
arXiv Detail & Related papers (2025-03-09T01:13:56Z) - Imagining and building wise machines: The centrality of AI metacognition [78.76893632793497]
We argue that shortcomings stem from one overarching failure: AI systems lack wisdom.
While AI research has focused on task-level strategies, metacognition is underdeveloped in AI systems.
We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety.
arXiv Detail & Related papers (2024-11-04T18:10:10Z) - AI-as-exploration: Navigating intelligence space [0.05657375260432172]
I articulate the contours of a rather neglected but central scientific role that AI has to play.
The basic thrust of AI-as-exploration is that of creating and studying systems that can reveal candidate building blocks of intelligence.
arXiv Detail & Related papers (2024-01-15T21:06:20Z) - On a Functional Definition of Intelligence [0.0]
Without an agreed-upon definition of intelligence, asking "is this system intelligent?"" is an untestable question.
Most work on precisely capturing what we mean by "intelligence" has come from the fields of philosophy, psychology, and cognitive science.
We present an argument for a purely functional, black-box definition of intelligence, distinct from how that intelligence is actually achieved.
arXiv Detail & Related papers (2023-12-15T05:46:49Z) - The Generative AI Paradox: "What It Can Create, It May Not Understand" [81.89252713236746]
Recent wave of generative AI has sparked excitement and concern over potentially superhuman levels of artificial intelligence.
At the same time, models still show basic errors in understanding that would not be expected even in non-expert humans.
This presents us with an apparent paradox: how do we reconcile seemingly superhuman capabilities with the persistence of errors that few humans would make?
arXiv Detail & Related papers (2023-10-31T18:07:07Z) - AI for Mathematics: A Cognitive Science Perspective [86.02346372284292]
Mathematics is one of the most powerful conceptual systems developed and used by the human species.
Rapid progress in AI, particularly propelled by advances in large language models (LLMs), has sparked renewed, widespread interest in building such systems.
arXiv Detail & Related papers (2023-10-19T02:00:31Z) - Suffering Toasters -- A New Self-Awareness Test for AI [0.0]
We argue that all current intelligence tests are insufficient to point to the existence or lack of intelligence.
We propose a new approach to test for artificial self-awareness and outline a possible implementation.
arXiv Detail & Related papers (2023-06-29T18:58:01Z) - AI Maintenance: A Robustness Perspective [91.28724422822003]
We introduce highlighted robustness challenges in the AI lifecycle and motivate AI maintenance by making analogies to car maintenance.
We propose an AI model inspection framework to detect and mitigate robustness risks.
Our proposal for AI maintenance facilitates robustness assessment, status tracking, risk scanning, model hardening, and regulation throughout the AI lifecycle.
arXiv Detail & Related papers (2023-01-08T15:02:38Z) - Challenges of Artificial Intelligence -- From Machine Learning and
Computer Vision to Emotional Intelligence [0.0]
We believe that AI is a helper, not a ruler of humans.
Computer vision has been central to the development of AI.
Emotions are central to human intelligence, but little use has been made in AI.
arXiv Detail & Related papers (2022-01-05T06:00:22Z) - Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z) - The MineRL BASALT Competition on Learning from Human Feedback [58.17897225617566]
The MineRL BASALT competition aims to spur forward research on this important class of techniques.
We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions.
We provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline.
arXiv Detail & Related papers (2021-07-05T12:18:17Z) - Hybrid Intelligence [4.508830262248694]
We argue that the most likely paradigm for the division of labor between humans and machines in the next decades is Hybrid Intelligence.
This concept aims at using the complementary strengths of human intelligence and AI, so that they can perform better than each of the two could separately.
arXiv Detail & Related papers (2021-05-03T08:56:09Z) - Inductive Biases for Deep Learning of Higher-Level Cognition [108.89281493851358]
A fascinating hypothesis is that human and animal intelligence could be explained by a few principles.
This work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing.
The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans' abilities.
arXiv Detail & Related papers (2020-11-30T18:29:25Z) - Future Trends for Human-AI Collaboration: A Comprehensive Taxonomy of
AI/AGI Using Multiple Intelligences and Learning Styles [95.58955174499371]
We describe various aspects of multiple human intelligences and learning styles, which may impact on a variety of AI problem domains.
Future AI systems will be able not only to communicate with human users and each other, but also to efficiently exchange knowledge and wisdom.
arXiv Detail & Related papers (2020-08-07T21:00:13Z) - Dynamic Cognition Applied to Value Learning in Artificial Intelligence [0.0]
Several researchers in the area are trying to develop a robust, beneficial, and safe concept of artificial intelligence.
It is of utmost importance that artificial intelligent agents have their values aligned with human values.
A possible approach to this problem would be to use theoretical models such as SED.
arXiv Detail & Related papers (2020-05-12T03:58:52Z) - Is Intelligence Artificial? [0.0]
This paper attempts to give a unifying definition that can be applied to the natural world in general and then Artificial Intelligence.
A metric that is grounded in Kolmogorov's Complexity Theory is suggested, which leads to a measurement about entropy.
A version of an accepted AI test is then put forward as the 'acid test' and might be what a free-thinking program would try to achieve.
arXiv Detail & Related papers (2014-03-05T11:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.