Related papers: Out of Control -- Why Alignment Needs Formal Control Theory (and an Alignment Control Stack)

URL: http://arxiv.org/abs/2506.17846v1
Date: Sat, 21 Jun 2025 22:45:19 GMT
Title: Out of Control -- Why Alignment Needs Formal Control Theory (and an Alignment Control Stack)
Authors: Elija Perrier
Abstract summary: This position paper argues that formal optimal control theory should be central to AI alignment research. It offers a distinct perspective from prevailing AI safety and security approaches.
Score: 0.6526824510982799
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This position paper argues that formal optimal control theory should be central to AI alignment research, offering a distinct perspective from prevailing AI safety and security approaches. While recent work in AI safety and mechanistic interpretability has advanced formal methods for alignment, these methods often fall short of the generalisation required of control frameworks for other technologies. There is also a lack of research into how to render different alignment/control protocols interoperable. We argue that by recasting alignment through principles of formal optimal control, and framing alignment in terms of a hierarchical stack from physical to socio-technical layers according to which controls may be applied, we can develop a better understanding of the potential and limitations for controlling frontier models and agentic AI systems. To this end, we introduce an Alignment Control Stack which sets out a hierarchical layered alignment stack, identifying measurement and control characteristics at each layer and how different layers are formally interoperable. We argue that such analysis is also key to the assurances that will be needed by governments and regulators in order to see AI technologies sustainably benefit the community. Our position is that doing so will bridge the well-established and empirically validated methods of optimal control with practical deployment considerations to create a more comprehensive alignment framework, enhancing how we approach safety and reliability for advanced AI systems.

Related papers

Limits of Safe AI Deployment: Differentiating Oversight and Control [0.0]
Oversight and control (collectively, supervision) are often invoked as key levers for ensuring that AI systems are accountable, reliable, and able to fulfill governance and management requirements. The concepts are frequently conflated or insufficiently distinguished in academic and policy discourse, undermining efforts to design or evaluate systems that should remain under meaningful human supervision. This paper proposes a theoretically-informed yet policy-grounded framework that articulates the conditions under which each mechanism is possible, where they fall short, and what is required to make them meaningful in practice.
arXiv Detail & Related papers (2025-07-04T12:22:35Z)
Rational Superautotrophic Diplomacy (SupraAD); A Conceptual Framework for Alignment Based on Interdisciplinary Findings on the Fundamentals of Cognition [0.0]
Rational Superautotrophic Diplomacy (SupraAD) is a theoretical, interdisciplinary conceptual framework for alignment. It draws on cognitive systems analysis and instrumental rationality modeling. SupraAD reframes alignment as a challenge that predates AI, afflicting all sufficiently complex, coadapting intelligences.
arXiv Detail & Related papers (2025-06-03T17:28:25Z)
Explainable AI Systems Must Be Contestable: Here's How to Make It Happen [2.5875936082584623]
This paper presents the first rigorous formal definition of contestability in explainable AI. We introduce a modular framework of by-design and post-hoc mechanisms spanning human-centered interfaces, technical processes, and organizational architectures. Our work equips practitioners with the tools to embed genuine recourse and accountability into AI systems.
arXiv Detail & Related papers (2025-06-02T13:32:05Z)
Human-AI Governance (HAIG): A Trust-Utility Approach [0.0]
This paper introduces the HAIG framework for analysing trust dynamics across evolving human-AI relationships. Our analysis reveals how technical advances in self-supervision, reasoning authority, and distributed decision-making drive non-uniform trust evolution.
arXiv Detail & Related papers (2025-05-03T01:57:08Z)
Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act) using insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence. As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z)
Meta-Control: Automatic Model-based Control Synthesis for Heterogeneous Robot Skills [10.43221469116584]
We propose Meta-Control, which creates customized state representations and control strategies tailored to specific tasks. Our core insight is that a meta-control system can be built to automate the thought process that human experts use to design control systems.
arXiv Detail & Related papers (2024-05-18T19:58:44Z)
AI Alignment: A Comprehensive Survey [69.61425542486275]
AI alignment aims to make AI systems behave in line with human intentions and values. We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality. We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z)
A General Framework for Verification and Control of Dynamical Models via Certificate Synthesis [54.959571890098786]
We provide a framework to encode system specifications and define corresponding certificates. We present an automated approach to formally synthesise controllers and certificates. Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks.
arXiv Detail & Related papers (2023-09-12T09:37:26Z)
Actor-Critic based Improper Reinforcement Learning [61.430513757337486]
We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process. We propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic scheme and a Natural Actor-Critic scheme.
arXiv Detail & Related papers (2022-07-19T05:55:02Z)
Probabilistic Control and Majorization of Optimal Control [3.2634122554914002]
Probabilistic control design is founded on the principle that a rational agent attempts to match the modelled closed-loop system trajectory density with an arbitrary desired one. In this work we introduce an alternative parametrization of desired closed-loop behaviour and explore alternative proximity measures between densities.
arXiv Detail & Related papers (2022-05-06T15:04:12Z)
Sparsity in Partially Controllable Linear Systems [56.142264865866636]
We study partially controllable linear dynamical systems specified by an underlying sparsity pattern. Our results characterize those state variables which are irrelevant for optimal control.
arXiv Detail & Related papers (2021-10-12T16:41:47Z)
Enforcing robust control guarantees within neural network policies [76.00287474159973]
We propose a generic nonlinear control policy class, parameterized by neural networks, that enforces the same provable robustness criteria as robust control. We demonstrate the power of this approach on several domains, improving in average-case performance over existing robust control methods and in worst-case stability over (non-robust) deep RL methods.
arXiv Detail & Related papers (2020-11-16T17:14:59Z)
Learning Hybrid Control Barrier Functions from Data [66.37785052099423]
Motivated by the lack of systematic tools to obtain safe control laws for hybrid systems, we propose an optimization-based framework for learning certifiably safe control laws from data. In particular, we assume a setting in which the system dynamics are known and in which data exhibiting safe system behavior is available.
arXiv Detail & Related papers (2020-11-08T23:55:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.