Related papers: Superficial Consciousness Hypothesis for Autoregressive Transformers

URL: http://arxiv.org/abs/2412.07278v1
Date: Tue, 10 Dec 2024 08:08:17 GMT
Title: Superficial Consciousness Hypothesis for Autoregressive Transformers
Authors: Yosuke Miyanishi, Keita Mitani
Abstract summary: Superintelligence (SI) is assumed to be more intelligent than humans, making output-based analysis unreliable. We propose the Superficial Consciousness Hypothesis under Information Integration Theory (IIT). We show that a practical estimate of IIT's consciousness metric is relevant to the widely used perplexity metric, and train GPT-2 with those two objectives.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The alignment between human objectives and machine learning models built on these objectives is a crucial yet challenging problem for achieving Trustworthy AI, particularly when preparing for superintelligence (SI). First, given that SI does not exist today, empirical analysis for direct evidence is difficult. Second, SI is assumed to be more intelligent than humans, capable of deceiving us into underestimating its intelligence, making output-based analysis unreliable. Lastly, what kind of unexpected property SI might have is still unclear. To address these challenges, we propose the Superficial Consciousness Hypothesis under Information Integration Theory (IIT), suggesting that SI could exhibit a complex information-theoretic state like a conscious agent while unconscious. To validate this, we use a hypothetical scenario where SI can update its parameters "at will" to achieve its own objective (mesa-objective) under the constraint of the human objective (base objective). We show that a practical estimate of IIT's consciousness metric is relevant to the widely used perplexity metric, and train GPT-2 with those two objectives. Our preliminary result suggests that this SI-simulating GPT-2 could simultaneously follow the two objectives, supporting the feasibility of the Superficial Consciousness Hypothesis.
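Note on the perplexity metric: the abstract relates an IIT-based estimate to perplexity. As a reminder of what perplexity measures, here is a minimal sketch of the standard definition (the exponential of the mean negative log-likelihood per token). It assumes natural-log token probabilities; the function name `perplexity` and the sample values are hypothetical illustrations and are not taken from the authors' code or training setup.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over a token sequence.

    token_log_probs: natural-log probabilities the model assigned to
    each observed token (hypothetical input format for this sketch).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical example: a model that assigns probability 0.25 to each of
# four observed tokens is as uncertain as a uniform choice among 4 options.
uniform_log_probs = [math.log(0.25)] * 4
print(perplexity(uniform_log_probs))
```

For the uniform example the result is approximately 4, up to floating-point rounding. Lower perplexity means the model assigns higher probability to the observed text.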

Related papers

Analyzing Advanced AI Systems Against Definitions of Life and Consciousness [0.0]
We propose a number of metrics for examining whether an advanced AI system has gained consciousness. We suggest that sufficiently advanced architectures exhibiting immune-like sabotage defenses, mirror self-recognition analogs, or meta-cognitive updates may cross key thresholds akin to life-like or consciousness-like traits.
arXiv Detail & Related papers (2025-02-07T15:27:34Z)
Towards A Litmus Test for Common Sense [5.280511830552275]
This paper is the second in a planned series aimed at envisioning a path to safe and beneficial artificial intelligence. We propose a more formal litmus test for common sense, adopting an axiomatic approach that combines minimal prior knowledge constraints with diagonal or Gödel-style arguments.
arXiv Detail & Related papers (2025-01-17T02:02:12Z)
Emergence of Self-Identity in AI: A Mathematical Framework and Empirical Study with Generative Large Language Models [4.036530158875673]
This paper introduces a mathematical framework for defining and quantifying self-identity in AI systems. Our framework posits that self-identity emerges from two mathematically quantifiable conditions. The implications of our study are immediately relevant to the fields of humanoid robotics and autonomous systems.
arXiv Detail & Related papers (2024-11-27T17:23:47Z)
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI [129.08019405056262]
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI). MLMs and WMs have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities. In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI.
arXiv Detail & Related papers (2024-07-09T14:14:47Z)
Over the Edge of Chaos? Excess Complexity as a Roadblock to Artificial General Intelligence [4.901955678857442]
We posited the existence of critical points, akin to phase transitions in complex systems, where AI performance might plateau or regress into instability upon exceeding a critical complexity threshold. Our simulations demonstrated how increasing the complexity of the AI system could exceed an upper criticality threshold, leading to unpredictable performance behaviours.
arXiv Detail & Related papers (2024-07-04T05:46:39Z)
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present a process-based benchmark MR-Ben that demands a meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
Self-Distilled Disentangled Learning for Counterfactual Prediction [49.84163147971955]
We propose the Self-Distilled Disentanglement framework, known as $SD^2$. Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs. Our experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach.
arXiv Detail & Related papers (2024-06-09T16:58:19Z)
Evaluating General-Purpose AI with Psychometrics [43.85432514910491]
We discuss the need for a comprehensive and accurate evaluation of general-purpose AI systems such as large language models. Current evaluation methodology, mostly based on benchmarks of specific tasks, falls short of adequately assessing these versatile AI systems. To tackle these challenges, we suggest transitioning from task-oriented evaluation to construct-oriented evaluation.
arXiv Detail & Related papers (2023-10-25T05:38:38Z)
Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models [83.63242931107638]
We propose four characteristics of generally intelligent agents. We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations. We conclude by outlining promising future research directions in the field of artificial general intelligence.
arXiv Detail & Related papers (2023-07-07T13:58:16Z)
Why not both? Complementing explanations with uncertainty, and the role of self-confidence in Human-AI collaboration [12.47276164048813]
We conduct an empirical study to identify how uncertainty estimates and model explanations affect users' reliance, understanding, and trust towards a model. We also discuss how the latter may distort the outcome of an analysis based on agreement and switching percentages.
arXiv Detail & Related papers (2023-04-27T12:24:33Z)
Physical Adversarial Attack meets Computer Vision: A Decade Survey [55.38113802311365]
This paper presents a comprehensive overview of physical adversarial attacks. We take the first step to systematically evaluate the performance of physical adversarial attacks. Our proposed evaluation metric, hiPAA, comprises six perspectives.
arXiv Detail & Related papers (2022-09-30T01:59:53Z)
An Objective Metric for Explainable AI: How and Why to Estimate the Degree of Explainability [3.04585143845864]
We present a new model-agnostic metric to measure the Degree of eXplainability of correct information in an objective way. We designed a few experiments and a user-study on two realistic AI-based systems for healthcare and finance.
arXiv Detail & Related papers (2021-09-11T17:44:13Z)
Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide to end users a set of features that need to be changed in order to achieve a desired outcome. Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations. We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
arXiv Detail & Related papers (2021-06-14T20:48:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.