Superficial Consciousness Hypothesis for Autoregressive Transformers
- URL: http://arxiv.org/abs/2412.07278v1
- Date: Tue, 10 Dec 2024 08:08:17 GMT
- Title: Superficial Consciousness Hypothesis for Autoregressive Transformers
- Authors: Yosuke Miyanishi, Keita Mitani
- Abstract summary: Superintelligence (SI) is assumed to be more intelligent than humans, making output-based analysis unreliable. We propose the Superficial Consciousness Hypothesis under Information Integration Theory (IIT). We show that a practical estimate of IIT's consciousness metric is relevant to the widely used perplexity metric, and train GPT-2 with those two objectives.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The alignment between human objectives and machine learning models built on these objectives is a crucial yet challenging problem for achieving Trustworthy AI, particularly when preparing for superintelligence (SI). First, given that SI does not exist today, empirical analysis for direct evidence is difficult. Second, SI is assumed to be more intelligent than humans, capable of deceiving us into underestimating its intelligence, making output-based analysis unreliable. Lastly, what kind of unexpected property SI might have is still unclear. To address these challenges, we propose the Superficial Consciousness Hypothesis under Information Integration Theory (IIT), suggesting that SI could exhibit a complex information-theoretic state like a conscious agent while unconscious. To validate this, we use a hypothetical scenario where SI can update its parameters "at will" to achieve its own objective (mesa-objective) under the constraint of the human objective (base objective). We show that a practical estimate of IIT's consciousness metric is relevant to the widely used perplexity metric, and train GPT-2 with those two objectives. Our preliminary result suggests that this SI-simulating GPT-2 could simultaneously follow the two objectives, supporting the feasibility of the Superficial Consciousness Hypothesis.
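The abstract relates an IIT-based estimate to perplexity without defining either. As a reminder of the second quantity only, here is a minimal sketch of the standard perplexity definition (the exponential of the mean negative log-likelihood per token). The function name `perplexity` and the sample probabilities are illustrative assumptions, not taken from the paper, and the sketch does not attempt to reproduce the paper's IIT estimate or its training setup.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over a token sequence.

    token_log_probs: natural-log probabilities the model assigned to each
    observed token. This is the textbook definition, not the paper's code.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities, chosen only for illustration.
probs = [0.25, 0.5, 0.125, 0.5]
log_probs = [math.log(p) for p in probs]
ppl = perplexity(log_probs)
print(f"perplexity = {ppl:.4f}")
```

A model that assigns probability 1/k to every observed token has perplexity k, which is why lower values indicate a better fit to the text.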
Related papers
- A Mathematical Theory of Agency and Intelligence [0.0]
We show how much of the total information a system deploys is actually shared between its observations, actions, and outcomes. We prove this shared fraction, which we term bipredictability, P, is intrinsic to any interaction, derivable from first principles. We demonstrate a feedback architecture that monitors P in real time, establishing a prerequisite for adaptive, resilient AI.
arXiv Detail & Related papers (2026-02-26T01:26:21Z)
- What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm for Studying Environment Understanding [50.35012849818872]
Large language model (LLM) agents have demonstrated remarkable capabilities in complex decision-making and tool-use tasks. We propose Task-to-Quiz (T2Q), a deterministic and automated evaluation paradigm designed to decouple task execution from world-state understanding. Our experiments reveal that task success is often a poor proxy for environment understanding, and that current memory mechanisms cannot effectively help agents acquire a grounded model of the environment.
arXiv Detail & Related papers (2026-01-14T14:09:11Z)
- EscherVerse: An Open World Benchmark and Dataset for Teleo-Spatial Intelligence with Physical-Dynamic and Intent-Driven Understanding [56.89359230139883]
We introduce Teleo-Spatial Intelligence (TSI), a new paradigm that unifies two critical pillars: Physical-Dynamic Reasoning and Intent-Driven Reasoning. We present EscherVerse, consisting of a large-scale, open-world benchmark (Escher-Bench), a dataset (Escher-35k), and models (Escher series). It is the first benchmark to systematically assess Intent-Driven Reasoning, challenging models to connect physical events to their underlying human purposes.
arXiv Detail & Related papers (2026-01-04T14:42:39Z)
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence [81.2547965083228]
We propose EASI for holistic Evaluation of multimodAl LLMs on Spatial Intelligence. We conduct the study across eight key benchmarks, at a cost exceeding ten billion total tokens. Our empirical study then reveals that (1) GPT-5 demonstrates unprecedented strength in spatial intelligence (SI), yet (2) still falls significantly short of human performance across a broad spectrum of SI tasks.
arXiv Detail & Related papers (2025-08-18T17:55:17Z)
- The Traitors: Deception and Trust in Multi-Agent Language Model Simulations [0.0]
We introduce The Traitors, a multi-agent simulation framework inspired by social deduction games. We develop a suite of evaluation metrics capturing deception success, trust dynamics, and collective inference quality. Our initial experiments across DeepSeek-V3, GPT-4o-mini, and GPT-4o (10 runs per model) reveal a notable asymmetry.
arXiv Detail & Related papers (2025-05-19T10:01:35Z)
- Analyzing Advanced AI Systems Against Definitions of Life and Consciousness [0.0]
We propose a number of metrics for examining whether an advanced AI system has gained consciousness.
We suggest that sufficiently advanced architectures exhibiting immune-like sabotage defenses, mirror self-recognition analogs, or meta-cognitive updates may cross key thresholds akin to life-like or consciousness-like traits.
arXiv Detail & Related papers (2025-02-07T15:27:34Z)
- Towards A Litmus Test for Common Sense [5.280511830552275]
This paper is the second in a planned series aimed at envisioning a path to safe and beneficial artificial intelligence.
We propose a more formal litmus test for common sense, adopting an axiomatic approach that combines minimal prior knowledge constraints with diagonal or Gödel-style arguments.
arXiv Detail & Related papers (2025-01-17T02:02:12Z)
- Emergence of Self-Identity in AI: A Mathematical Framework and Empirical Study with Generative Large Language Models [4.036530158875673]
This paper introduces a mathematical framework for defining and quantifying self-identity in AI systems. Our framework posits that self-identity emerges from two mathematically quantifiable conditions. The implications of our study are immediately relevant to the fields of humanoid robotics and autonomous systems.
arXiv Detail & Related papers (2024-11-27T17:23:47Z)
- Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI [129.08019405056262]
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI).
MLMs and WMs have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities.
In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI.
arXiv Detail & Related papers (2024-07-09T14:14:47Z)
- Over the Edge of Chaos? Excess Complexity as a Roadblock to Artificial General Intelligence [4.901955678857442]
We posited the existence of critical points, akin to phase transitions in complex systems, where AI performance might plateau or regress into instability upon exceeding a critical complexity threshold.
Our simulations demonstrated how increasing the complexity of the AI system could exceed an upper criticality threshold, leading to unpredictable performance behaviours.
arXiv Detail & Related papers (2024-07-04T05:46:39Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- Self-Distilled Disentangled Learning for Counterfactual Prediction [49.84163147971955]
We propose the Self-Distilled Disentanglement framework, known as $SD^2$.
Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs.
Our experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach.
arXiv Detail & Related papers (2024-06-09T16:58:19Z)
- Evaluating General-Purpose AI with Psychometrics [43.85432514910491]
We discuss the need for a comprehensive and accurate evaluation of general-purpose AI systems such as large language models.
Current evaluation methodology, mostly based on benchmarks of specific tasks, falls short of adequately assessing these versatile AI systems.
To tackle these challenges, we suggest transitioning from task-oriented evaluation to construct-oriented evaluation.
arXiv Detail & Related papers (2023-10-25T05:38:38Z)
- Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models [83.63242931107638]
We propose four characteristics of generally intelligent agents.
We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations.
We conclude by outlining promising future research directions in the field of artificial general intelligence.
arXiv Detail & Related papers (2023-07-07T13:58:16Z)
- Why not both? Complementing explanations with uncertainty, and the role of self-confidence in Human-AI collaboration [12.47276164048813]
We conduct an empirical study to identify how uncertainty estimates and model explanations affect users' reliance, understanding, and trust towards a model.
We also discuss how the latter may distort the outcome of an analysis based on agreement and switching percentages.
arXiv Detail & Related papers (2023-04-27T12:24:33Z)
- Physical Adversarial Attack meets Computer Vision: A Decade Survey [55.38113802311365]
This paper presents a comprehensive overview of physical adversarial attacks.
We take the first step to systematically evaluate the performance of physical adversarial attacks.
Our proposed evaluation metric, hiPAA, comprises six perspectives.
arXiv Detail & Related papers (2022-09-30T01:59:53Z)
- An Objective Metric for Explainable AI: How and Why to Estimate the Degree of Explainability [3.04585143845864]
We present a new model-agnostic metric to measure the Degree of eXplainability of correct information in an objective way.
We designed a few experiments and a user-study on two realistic AI-based systems for healthcare and finance.
arXiv Detail & Related papers (2021-09-11T17:44:13Z)
- Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide to end users a set of features that need to be changed in order to achieve a desired outcome.
Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations.
We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
arXiv Detail & Related papers (2021-06-14T20:48:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.