Related papers: Contemplative Wisdom for Superalignment

Contemplative Wisdom for Superalignment

URL: http://arxiv.org/abs/2504.15125v1
Date: Mon, 21 Apr 2025 14:20:49 GMT
Title: Contemplative Wisdom for Superalignment
Authors: Ruben Laukkonen, Fionn Inglis, Shamil Chandaria, Lars Sandved-Smith, Jakob Hohwy, Jonathan Gold, Adam Elwood,
Abstract summary: We advocate designing AI with intrinsic morality built into its cognitive architecture and world model.<n>Inspired by contemplative wisdom traditions, we show how four axiomatic principles can instil a resilient Wise World Model in AI systems.
Score: 1.7143967091323253
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As artificial intelligence (AI) improves, traditional alignment strategies may falter in the face of unpredictable self-improvement, hidden subgoals, and the sheer complexity of intelligent systems. Rather than externally constraining behavior, we advocate designing AI with intrinsic morality built into its cognitive architecture and world model. Inspired by contemplative wisdom traditions, we show how four axiomatic principles can instil a resilient Wise World Model in AI systems. First, mindfulness enables self-monitoring and recalibration of emergent subgoals. Second, emptiness forestalls dogmatic goal fixation and relaxes rigid priors. Third, non-duality dissolves adversarial self-other boundaries. Fourth, boundless care motivates the universal reduction of suffering. We find that prompting AI to reflect on these principles improves performance on the AILuminate Benchmark using GPT-4o, particularly when combined. We offer detailed implementation strategies for state-of-the-art models, including contemplative architectures, constitutions, and reinforcement of chain-of-thought. For future systems, the active inference framework may offer the self-organizing and dynamic coupling capabilities needed to enact these insights in embodied agents. This interdisciplinary approach offers a self-correcting and resilient alternative to prevailing brittle control schemes.

Related papers

Intention-Guided Cognitive Reasoning for Egocentric Long-Term Action Anticipation [52.6091162517921]
INSIGHT is a two-stage framework for egocentric action anticipation.<n>In the first stage, INSIGHT focuses on extracting semantically rich features from hand-object interaction regions.<n>In the second stage, it introduces a reinforcement learning-based module that simulates explicit cognitive reasoning.
arXiv Detail & Related papers (2025-08-03T12:52:27Z)
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact [27.722167796617114]
This paper offers a cross-disciplinary synthesis of artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems.<n>We analyze the architectural and cognitive foundations of general intelligence, highlighting the role of modular reasoning, persistent memory, and multi-agent coordination.<n>We identify key scientific, technical, and ethical challenges on the path to Artificial General Intelligence.
arXiv Detail & Related papers (2025-07-01T16:52:25Z)
Rational Superautotrophic Diplomacy (SupraAD); A Conceptual Framework for Alignment Based on Interdisciplinary Findings on the Fundamentals of Cognition [0.0]
Rational Superautotrophic Diplomacy (SupraAD) is a theoretical, interdisciplinary conceptual framework for alignment.<n>It draws on cognitive systems analysis and instrumental rationality modeling.<n>SupraAD reframes alignment as a challenge that predates AI, afflicting all sufficiently complex, coadapting intelligences.
arXiv Detail & Related papers (2025-06-03T17:28:25Z)
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems [133.45145180645537]
The advent of large language models (LLMs) has catalyzed a transformative shift in artificial intelligence.<n>As these agents increasingly drive AI research and practical applications, their design, evaluation, and continuous improvement present intricate, multifaceted challenges.<n>This survey provides a comprehensive overview, framing intelligent agents within a modular, brain-inspired architecture.
arXiv Detail & Related papers (2025-03-31T18:00:29Z)
Universal AI maximizes Variational Empowerment [0.0]
We build on the existing framework of Self-AIXI -- a universal learning agent that predicts its own actions.<n>We argue that power-seeking tendencies of universal AI agents can be explained as an instrumental strategy to secure future reward.<n>Our main contribution is to show how these motivations systematically lead universal AI agents to seek and sustain high-optionality states.
arXiv Detail & Related papers (2025-02-20T02:58:44Z)
Emergence of Self-Awareness in Artificial Systems: A Minimalist Three-Layer Approach to Artificial Consciousness [0.0]
This paper proposes a minimalist three-layer model for artificial consciousness, focusing on the emergence of self-awareness. Unlike brain-replication approaches, we aim to achieve minimal self-awareness through essential elements only.
arXiv Detail & Related papers (2025-02-04T10:06:25Z)
AI Ethics by Design: Implementing Customizable Guardrails for Responsible AI Development [0.0]
We propose a structure that integrates rules, policies, and AI assistants to ensure responsible AI behavior.<n>Our approach accommodates ethical pluralism, offering a flexible and adaptable solution for the evolving landscape of AI governance.
arXiv Detail & Related papers (2024-11-05T18:38:30Z)
Imagining and building wise machines: The centrality of AI metacognition [78.76893632793497]
We argue that shortcomings stem from one overarching failure: AI systems lack wisdom. While AI research has focused on task-level strategies, metacognition is underdeveloped in AI systems. We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety.
arXiv Detail & Related papers (2024-11-04T18:10:10Z)
Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act) Uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence. As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z)
A call for embodied AI [1.7544885995294304]
We propose Embodied AI as the next fundamental step in the pursuit of Artificial General Intelligence. By broadening the scope of Embodied AI, we introduce a theoretical framework based on cognitive architectures. This framework is aligned with Friston's active inference principle, offering a comprehensive approach to EAI development.
arXiv Detail & Related papers (2024-02-06T09:11:20Z)
Hybrid Approaches for Moral Value Alignment in AI Agents: a Manifesto [3.7414804164475983]
Increasing interest in ensuring the safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents.<n>We provide a systematization of existing approaches to the problem of introducing morality in machines - modelled as a continuum.<n>We argue that more hybrid solutions are needed to create adaptable and robust, yet controllable and interpretable agentic systems.
arXiv Detail & Related papers (2023-12-04T11:46:34Z)
AI Alignment: A Comprehensive Survey [69.61425542486275]
AI alignment aims to make AI systems behave in line with human intentions and values.<n>We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality.<n>We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z)
The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z)
An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence. The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
Distributed and Democratized Learning: Philosophy and Research Challenges [80.39805582015133]
We propose a novel design philosophy called democratized learning (Dem-AI) Inspired by the societal groups of humans, the specialized groups of learning agents in the proposed Dem-AI system are self-organized in a hierarchical structure to collectively perform learning tasks more efficiently. We present a reference design as a guideline to realize future Dem-AI systems, inspired by various interdisciplinary fields.
arXiv Detail & Related papers (2020-03-18T08:45:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.