Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI
- URL: http://arxiv.org/abs/2501.07458v1
- Date: Mon, 13 Jan 2025 16:28:01 GMT
- Title: Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI
- Authors: Rolf Pfister, Hansueli Jud,
- Abstract summary: OpenAI's o3 achieves a high score of 87.5 % on ARC-AGI, a benchmark proposed to measure intelligence.
This raises the question whether systems based on Large Language Models (LLMs), particularly o3, demonstrate intelligence and progress towards artificial general intelligence (AGI)
- Score: 0.0
- License:
- Abstract: OpenAI's o3 achieves a high score of 87.5 % on ARC-AGI, a benchmark proposed to measure intelligence. This raises the question whether systems based on Large Language Models (LLMs), particularly o3, demonstrate intelligence and progress towards artificial general intelligence (AGI). Building on the distinction between skills and intelligence made by Fran\c{c}ois Chollet, the creator of ARC-AGI, a new understanding of intelligence is introduced: an agent is the more intelligent, the more efficiently it can achieve the more diverse goals in the more diverse worlds with the less knowledge. An analysis of the ARC-AGI benchmark shows that its tasks represent a very specific type of problem that can be solved by massive trialling of combinations of predefined operations. This method is also applied by o3, achieving its high score through the extensive use of computing power. However, for most problems in the physical world and in the human domain, solutions cannot be tested in advance and predefined operations are not available. Consequently, massive trialling of predefined operations, as o3 does, cannot be a basis for AGI - instead, new approaches are required that can reliably solve a wide variety of problems without existing skills. To support this development, a new benchmark for intelligence is outlined that covers a much higher diversity of unknown tasks to be solved, thus enabling a comprehensive assessment of intelligence and of progress towards AGI.
Related papers
- Artificial Expert Intelligence through PAC-reasoning [21.91294369791479]
Artificial Expert Intelligence (AEI) seeks to transcend the limitations of both Artificial General Intelligence (AGI) and narrow AI.
AEI seeks to integrate domain-specific expertise with critical, precise reasoning capabilities akin to those of top human experts.
arXiv Detail & Related papers (2024-12-03T13:25:18Z) - Imagining and building wise machines: The centrality of AI metacognition [78.76893632793497]
We argue that shortcomings stem from one overarching failure: AI systems lack wisdom.
While AI research has focused on task-level strategies, metacognition is underdeveloped in AI systems.
We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety.
arXiv Detail & Related papers (2024-11-04T18:10:10Z) - Evaluation of OpenAI o1: Opportunities and Challenges of AGI [112.0812059747033]
o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance.
The model excelled in tasks requiring intricate reasoning and knowledge integration across various fields.
Overall results indicate significant progress towards artificial general intelligence.
arXiv Detail & Related papers (2024-09-27T06:57:00Z) - OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI [73.75520820608232]
We introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities.
These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage.
Our evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy, illustrating current AI limitations in complex reasoning and multimodal integration.
arXiv Detail & Related papers (2024-06-18T16:20:53Z) - General Purpose Artificial Intelligence Systems (GPAIS): Properties,
Definition, Taxonomy, Societal Implications and Responsible Governance [16.030931070783637]
General-Purpose Artificial Intelligence Systems (GPAIS) has been defined to refer to these AI systems.
To date, the possibility of an Artificial General Intelligence, powerful enough to perform any intellectual task as if it were human, or even improve it, has remained an aspiration, fiction, and considered a risk for our society.
This work discusses existing definitions for GPAIS and proposes a new definition that allows for a gradual differentiation among types of GPAIS according to their properties and limitations.
arXiv Detail & Related papers (2023-07-26T16:35:48Z) - OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks.
This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents.
We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z) - A User-Centred Framework for Explainable Artificial Intelligence in
Human-Robot Interaction [70.11080854486953]
We propose a user-centred framework for XAI that focuses on its social-interactive aspect.
The framework aims to provide a structure for interactive XAI solutions thought for non-expert users.
arXiv Detail & Related papers (2021-09-27T09:56:23Z) - Hybrid Intelligence [4.508830262248694]
We argue that the most likely paradigm for the division of labor between humans and machines in the next decades is Hybrid Intelligence.
This concept aims at using the complementary strengths of human intelligence and AI, so that they can perform better than each of the two could separately.
arXiv Detail & Related papers (2021-05-03T08:56:09Z) - Explainable Artificial Intelligence Approaches: A Survey [0.22940141855172028]
Lack of explainability of a decision from an Artificial Intelligence based "black box" system/model is a key stumbling block for adopting AI in high stakes applications.
We demonstrate popular Explainable Artificial Intelligence (XAI) methods with a mutual case study/task.
We analyze for competitive advantages from multiple perspectives.
We recommend paths towards responsible or human-centered AI using XAI as a medium.
arXiv Detail & Related papers (2021-01-23T06:15:34Z) - Future Trends for Human-AI Collaboration: A Comprehensive Taxonomy of
AI/AGI Using Multiple Intelligences and Learning Styles [95.58955174499371]
We describe various aspects of multiple human intelligences and learning styles, which may impact on a variety of AI problem domains.
Future AI systems will be able not only to communicate with human users and each other, but also to efficiently exchange knowledge and wisdom.
arXiv Detail & Related papers (2020-08-07T21:00:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.