SuperARC: An Agnostic Test for Narrow, General, and Super Intelligence Based On the Principles of Recursive Compression and Algorithmic Probability
- URL: http://arxiv.org/abs/2503.16743v3
- Date: Tue, 22 Apr 2025 22:30:20 GMT
- Title: SuperARC: An Agnostic Test for Narrow, General, and Super Intelligence Based On the Principles of Recursive Compression and Algorithmic Probability
- Authors: Alberto Hernández-Espinosa, Luan Ozelim, Felipe S. Abrahão, Hector Zenil
- Abstract summary: We introduce an open-ended test grounded in algorithmic probability. It can avoid benchmark contamination in the quantitative evaluation of frontier models. We prove that compression is equivalent and directly proportional to a system's predictive power.
- Score: 0.14061979259370275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce an open-ended test grounded in algorithmic probability that can avoid benchmark contamination in the quantitative evaluation of frontier models in the context of their Artificial General Intelligence (AGI) and Superintelligence (ASI) claims. Unlike other tests, this test does not rely on statistical compression methods (such as GZIP or LZW), which are more closely related to Shannon entropy than to Kolmogorov complexity and are not able to test beyond simple pattern matching. The test challenges aspects of AI, in particular LLMs, related to fundamental features of intelligence such as synthesis and model creation in the context of inverse problems (generating new knowledge from observation). We argue that metrics based on model abstraction and abduction (optimal Bayesian 'inference') for predictive 'planning' can provide a robust framework for testing intelligence, including natural intelligence (human and animal), narrow AI, AGI, and ASI. We found that LLM model versions tend to be fragile and incremental, a result of memorisation alone, with progress likely driven by the size of the training data. The results were compared with a hybrid neurosymbolic approach that theoretically guarantees universal intelligence based on the principles of algorithmic probability and Kolmogorov complexity. The method outperforms LLMs in a proof-of-concept on short binary sequences. We prove that compression is equivalent to, and directly proportional to, a system's predictive power, and vice versa. That is, if a system can better predict, it can better compress, and if it can better compress, then it can better predict. Our findings strengthen the suspicion regarding the fundamental limitations of LLMs, exposing them as systems optimised for the perception of mastery over human language.
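The proven compression-prediction equivalence can be made concrete: an arithmetic coder driven by a sequential predictor emits exactly sum_i -log2 P(x_i | x_1..x_{i-1}) bits, so lower predictive log-loss is, by construction, a shorter code. The sketch below is a minimal toy illustration of that identity, not the paper's hybrid neurosymbolic method; the Krichevsky-Trofimov estimator and the alternating test sequence are illustrative assumptions.

```python
import math

def predictive_code_length(bits, order):
    """Total code length in bits when an arithmetic coder is driven by a
    Krichevsky-Trofimov (add-1/2) estimator conditioned on the previous
    `order` bits. The code length is the cumulative -log2 of each predicted
    probability, so better prediction means a shorter code, and vice versa."""
    counts = {}  # context -> (count of 0s, count of 1s)
    total = 0.0
    for i, b in enumerate(bits):
        ctx = tuple(bits[max(0, i - order):i])
        c0, c1 = counts.get(ctx, (0, 0))
        p1 = (c1 + 0.5) / (c0 + c1 + 1.0)
        total += -math.log2(p1 if b == 1 else 1.0 - p1)
        counts[ctx] = (c0 + (b == 0), c1 + (b == 1))
    return total

seq = [1, 0] * 64  # a trivially regular binary sequence: 1010... (128 bits)
for order in (0, 1):
    print(f"order-{order} predictor: {predictive_code_length(seq, order):6.1f} bits")
# The order-1 predictor learns the alternation, so its code length collapses
# to roughly a dozen bits; the order-0 predictor cannot exploit the pattern
# and stays near 1 bit per symbol (~130 bits for 128 symbols).
```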
Related papers
- Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models [76.6028674686018]
We introduce thought-tracing, an inference-time reasoning algorithm to trace the mental states of agents. Our algorithm is modeled after the Bayesian theory-of-mind framework. We evaluate thought-tracing on diverse theory-of-mind benchmarks, demonstrating significant performance improvements.
arXiv Detail & Related papers (2025-02-17T15:08:50Z) - Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation [24.081573908824353]
First-order logic (FOL) reasoning is pivotal for intelligent systems. Existing benchmarks often rely on extensive human annotation or handcrafted templates. We propose a novel framework called ProverGen that synergizes the generative strengths of Large Language Models with the rigor and precision of symbolic provers.
arXiv Detail & Related papers (2025-02-10T15:31:54Z) - BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model. We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z) - Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models [70.07661254213181]
We propose two principled algorithms for the test-time compute of large language models.
We prove theoretically that the failure probability of one algorithm decays to zero exponentially as its test-time compute grows.
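A hedged sketch of one standard mechanism behind such exponential decay (a generic repeated-sampling argument under assumed independence, not necessarily the paper's two algorithms): if each sampled answer is accepted by a reliable verifier with per-attempt success probability q, the chance that all N attempts fail is (1 - q)^N, which decays exponentially in the test-time compute N.

```python
import math

# Assumed per-attempt success probability for a sample-then-verify loop;
# the value 0.2 is an illustrative assumption, not a measured quantity.
q = 0.2

for n in (1, 4, 16, 64):
    p_fail = (1.0 - q) ** n    # all N independent attempts rejected
    rate = -math.log(1.0 - q)  # exponential decay rate per attempt
    print(f"N={n:3d} attempts: P(failure) = {p_fail:.2e} = exp(-{rate * n:.2f})")
```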
arXiv Detail & Related papers (2024-11-29T05:29:47Z) - A Comparative Study on Reasoning Patterns of OpenAI's o1 Model [69.08287909042421]
We show that OpenAI's o1 model has achieved the best performance on most datasets.
We also provide a detailed analysis on several reasoning benchmarks.
arXiv Detail & Related papers (2024-10-17T15:09:03Z) - Over the Edge of Chaos? Excess Complexity as a Roadblock to Artificial General Intelligence [4.901955678857442]
We posited the existence of critical points, akin to phase transitions in complex systems, where AI performance might plateau or regress into instability upon exceeding a critical complexity threshold.
Our simulations demonstrated how increasing the complexity of an AI system can push it past an upper criticality threshold, leading to unpredictable performance behaviours.
arXiv Detail & Related papers (2024-07-04T05:46:39Z) - SynthTree: Co-supervised Local Model Synthesis for Explainable Prediction [15.832975722301011]
We propose a novel method to enhance explainability with minimal accuracy loss.
We develop methods for estimating nodes by leveraging AI techniques.
Our findings highlight the critical role that statistical methodologies can play in advancing explainable AI.
arXiv Detail & Related papers (2024-06-16T14:43:01Z) - Precise Error Rates for Computationally Efficient Testing [67.30044609837749]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z) - T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering [59.63860993280275]
Large Language Models (LLMs) have demonstrated exceptional performance in various Natural Language Processing (NLP) tasks.
We propose a novel method termed T-SciQ that aims at teaching science question answering with LLM signals.
Our approach achieves a new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18%.
arXiv Detail & Related papers (2023-05-05T11:56:30Z) - Learnability, Sample Complexity, and Hypothesis Class Complexity for Regression Models [10.66048003460524]
This work is inspired by the foundation of PAC and is motivated by the existing regression learning issues.
The proposed approach, denoted epsilon-Confidence Approximately Correct (epsilon CoAC), utilizes the Kullback-Leibler divergence (relative entropy).
It enables the learner to compare hypothesis classes of different complexity orders and to choose among them the one with the minimum epsilon.
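As a toy illustration of comparing hypotheses by relative entropy (a sketch in the spirit of the summary above, not the epsilon CoAC procedure itself; the coin distributions are assumed for illustration), one can rank candidate models by the Kullback-Leibler divergence between the empirical data distribution and each candidate, and choose the minimiser:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits for two discrete distributions on the same support."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Empirical distribution of a biased coin estimated from observed data.
empirical = [0.7, 0.3]

# Candidate hypotheses of increasing complexity: a fixed fair coin versus a
# one-parameter Bernoulli model fitted to the data.
candidates = {
    "fair coin (0 parameters)": [0.5, 0.5],
    "fitted Bernoulli (1 parameter)": [0.7, 0.3],
}
best = min(candidates, key=lambda name: kl_divergence(empirical, candidates[name]))
for name, q in candidates.items():
    print(f"{name}: D_KL = {kl_divergence(empirical, q):.4f} bits")
print("chosen:", best)
```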
arXiv Detail & Related papers (2023-03-28T15:59:12Z) - Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation.
We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem.
Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z) - Great Truths are Always Simple: A Rather Simple Knowledge Encoder for Enhancing the Commonsense Reasoning Capacity of Pre-Trained Models [89.98762327725112]
Commonsense reasoning in natural language is a desired ability of artificial intelligent systems.
For solving complex commonsense reasoning tasks, a typical solution is to enhance pre-trained language models (PTMs) with a knowledge-aware graph neural network (GNN) encoder.
Despite their effectiveness, these approaches are built on heavy architectures and cannot clearly explain how external knowledge resources improve the reasoning capacity of PTMs.
arXiv Detail & Related papers (2022-05-04T01:27:36Z) - Parsimonious Inference [0.0]
Parsimonious inference is an information-theoretic formulation of inference over arbitrary architectures.
Our approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation.
arXiv Detail & Related papers (2021-03-03T04:13:14Z) - Learning Reasoning Strategies in End-to-End Differentiable Proving [50.9791149533921]
Conditional Theorem Provers learn an optimal rule-selection strategy via gradient-based optimisation.
We show that Conditional Theorem Provers are scalable and yield state-of-the-art results on the CLUTRR dataset.
arXiv Detail & Related papers (2020-07-13T16:22:14Z) - Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)