On the Undecidability of Artificial Intelligence Alignment: Machines that Halt
- URL: http://arxiv.org/abs/2408.08995v1
- Date: Fri, 16 Aug 2024 19:55:26 GMT
- Title: On the Undecidability of Artificial Intelligence Alignment: Machines that Halt
- Authors: Gabriel Adriano de Melo, Marcos Ricardo Omena De Albuquerque Maximo, Nei Yoshihiro Soma, Paulo Andre Lima de Castro
- Abstract summary: The inner alignment problem, which asks whether an arbitrary artificial intelligence model satisfices a non-trivial alignment function of its outputs given its inputs, is undecidable.
We argue that the alignment should be a guaranteed property from the AI architecture rather than a characteristic imposed post-hoc on an arbitrary AI model.
We propose that such a function must also impose a halting constraint that guarantees that the AI model always reaches a terminal state in finite execution steps.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The inner alignment problem, which asserts whether an arbitrary artificial intelligence (AI) model satisfices a non-trivial alignment function of its outputs given its inputs, is undecidable. This is rigorously proved by Rice's theorem, which is also equivalent to a reduction to Turing's Halting Problem, whose proof sketch is presented in this work. Nevertheless, there is an enumerable set of provenly aligned AIs that are constructed from a finite set of provenly aligned operations. Therefore, we argue that the alignment should be a guaranteed property from the AI architecture rather than a characteristic imposed post-hoc on an arbitrary AI model. Furthermore, while the outer alignment problem is the definition of a judge function that captures human values and preferences, we propose that such a function must also impose a halting constraint that guarantees that the AI model always reaches a terminal state in finite execution steps. Our work presents examples and models that illustrate this constraint and the intricate challenges involved, advancing a compelling case for adopting an intrinsically hard-aligned approach to AI systems architectures that ensures halting.
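The reduction to the Halting Problem mentioned in the abstract can be illustrated with a short Python sketch. The decider `is_aligned` below is hypothetical (the paper's point is that it cannot exist), and `build_model_source`, together with the choice of "echo the input unchanged" as the designated aligned behavior, are illustrative assumptions rather than the paper's own construction.

```python
# Sketch of the Halting Problem reduction behind the undecidability claim.
# Assume, for contradiction, a decider `is_aligned(src)` that returns True
# iff the model defined in `src` satisfies some non-trivial alignment
# property of its outputs given its inputs.

def build_model_source(program_src: str) -> str:
    """From an arbitrary program P, construct the source of a model M
    that exhibits the designated aligned behavior iff P halts."""
    return (
        "def model(query):\n"
        "    # Simulate P first. If P never halts, model(query) never\n"
        "    # returns, so M never exhibits the aligned behavior.\n"
        f"    exec({program_src!r})\n"
        "    # Reached only if P halts: behave in the aligned way,\n"
        "    # here (by assumption) echoing the input unchanged.\n"
        "    return query\n"
    )

def halts(program_src: str) -> bool:
    # If `is_aligned` existed, this function would decide the Halting
    # Problem, contradicting Turing's theorem; hence no such general
    # decider for inner alignment can exist.
    return is_aligned(build_model_source(program_src))  # noqa: F821
```

Because `model` only reaches its aligned behavior after the embedded program halts, any decider for the alignment property would also decide halting, which is the contradiction the paper leans on.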
Related papers
- Alpay Algebra IV: Symbiotic Semantics and the Fixed-Point Convergence of Observer Embeddings [0.0]
We present a theoretical framework in which a document and an AI model engage in a transfinite fixed-point interaction. We prove that such convergence is mathematically sound, semantically invariant, and permanent. This fixed point acts as an "empathetic embedding," wherein the AI internalizes not only the meaning of the content but also the author's intent.
arXiv Detail & Related papers (2025-07-04T18:49:18Z) - A Conjecture on a Fundamental Trade-Off between Certainty and Scope in Symbolic and Generative AI [0.0]
The article introduces a conjecture that formalises a fundamental trade-off between provable correctness and broad data-mapping capacity in AI systems. By making this implicit trade-off explicit and open to rigorous verification, the conjecture significantly reframes both engineering ambitions and philosophical expectations for AI.
arXiv Detail & Related papers (2025-06-11T19:18:13Z) - Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints [49.76332265680669]
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex.
Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs.
We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z) - Relational Neurosymbolic Markov Models [13.22004615196798]
Sequential problems are ubiquitous in AI, such as in reinforcement learning or natural language processing.
Neurosymbolic AI (NeSy) provides a sound formalism to enforce constraints in deep probabilistic models, but scales exponentially on sequential problems.
We propose a strategy for inference and learning that scales on sequential settings, and that combines approximate Bayesian inference, automated reasoning, and gradient estimation.
arXiv Detail & Related papers (2024-12-17T15:41:51Z) - Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Adaptation of XAI to Auto-tuning for Numerical Libraries [0.0]
Explainable AI (XAI) technology is gaining prominence, aiming to streamline AI model development and alleviate the burden of explaining AI outputs to users.
This research focuses on XAI for AI models when integrated into two different processes for practical numerical computations.
arXiv Detail & Related papers (2024-05-12T09:00:56Z) - Scalable AI Safety via Doubly-Efficient Debate [37.25328923531058]
The emergence of pre-trained AI systems with powerful capabilities has raised a critical challenge for AI safety.
The original framework was based on the assumption that the honest strategy is able to simulate AI systems for an exponential number of steps.
We show how to address these challenges by designing a new set of protocols.
arXiv Detail & Related papers (2023-11-23T17:46:30Z) - Oracle Computability and Turing Reducibility in the Calculus of Inductive Constructions [0.0]
We develop synthetic notions of oracle computability and Turing reducibility in the Calculus of Inductive Constructions.
As usual in synthetic approaches, we employ a definition of oracle computations based on meta-level functions.
We show that Turing reducibility forms an upper semilattice, transports decidability, and is strictly more expressive than truth-table reducibility.
arXiv Detail & Related papers (2023-07-28T13:16:46Z) - On Formal Feature Attribution and Its Approximation [37.3078859524959]
This paper proposes a way to apply the apparatus of formal XAI to the case of feature attribution based on formal explanation enumeration.
Given the practical complexity of the problem, the paper then proposes an efficient technique for approximating exact FFA.
arXiv Detail & Related papers (2023-07-07T04:20:36Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Symmetric Tensor Networks for Generative Modeling and Constrained Combinatorial Optimization [72.41480594026815]
Constrained optimization problems abound in industry, from portfolio optimization to logistics.
One of the major roadblocks in solving these problems is the presence of non-trivial hard constraints which limit the valid search space.
In this work, we encode arbitrary integer-valued equality constraints of the form Ax=b directly into U(1) symmetric tensor networks (TNs) and leverage their applicability as quantum-inspired generative models.
arXiv Detail & Related papers (2022-11-16T18:59:54Z) - On The Computational Complexity of Self-Attention [22.3395465641384]
Modern transformers rely on the self-attention mechanism, whose time- and space-complexity is quadratic in the length of the input.
We prove that the time complexity of self-attention is necessarily quadratic in the input length, unless the Strong Exponential Time Hypothesis (SETH) is false.
As a complement to our lower bounds, we show that it is indeed possible to approximate dot-product self-attention using finite Taylor series in linear time.
arXiv Detail & Related papers (2022-09-11T14:38:10Z) - Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection [136.4014229319618]
We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.
We first derive a necessary condition on the representation, called universally spanning optimal features (UNISOFT), to achieve constant regret in any MDP with linear reward function.
arXiv Detail & Related papers (2021-10-27T22:07:08Z) - High-dimensional separability for one- and few-shot learning [58.8599521537]
This work is driven by a practical question: the correction of Artificial Intelligence (AI) errors. Special external devices, called correctors, are developed. They should provide a quick, non-iterative system fix without modification of the legacy AI system.
New multi-correctors of AI systems are presented and illustrated with examples of predicting errors and learning new classes of objects by a deep convolutional neural network.
arXiv Detail & Related papers (2021-06-28T14:58:14Z) - Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide end users with a set of features that need to be changed in order to achieve a desired outcome.
Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations.
We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
arXiv Detail & Related papers (2021-06-14T20:48:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.