Related papers: Mitigating loss of control in advanced AI systems through instrumental goal trajectories

URL: http://arxiv.org/abs/2602.01699v1
Date: Mon, 02 Feb 2026 06:13:21 GMT
Title: Mitigating loss of control in advanced AI systems through instrumental goal trajectories
Authors: Willem Fourie
Abstract summary: We develop instrumental goal trajectories to expand options beyond the model. We label these pathways the procurement, governance and finance instrumental goal trajectories (IGTs). IGTs offer concrete avenues for defining capability levels and for broadening how corrigibility and interruptibility are implemented.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Researchers at artificial intelligence labs and universities are concerned that highly capable artificial intelligence (AI) systems may erode human control by pursuing instrumental goals. Existing mitigations remain largely technical and system-centric: tracking capability in advanced systems, shaping behaviour through methods such as reinforcement learning from human feedback, and designing systems to be corrigible and interruptible. Here we develop instrumental goal trajectories to expand these options beyond the model. Gaining capability typically depends on access to additional technical resources, such as compute, storage, data and adjacent services, which in turn requires access to monetary resources. In organisations, these resources can be obtained through three organisational pathways. We label these pathways the procurement, governance and finance instrumental goal trajectories (IGTs). Each IGT produces a trail of organisational artefacts that can be monitored and used as intervention points when a system's capabilities or behaviour exceed acceptable thresholds. In this way, IGTs offer concrete avenues for defining capability levels and for broadening how corrigibility and interruptibility are implemented, shifting attention from model properties alone to the organisational systems that enable them.

Related papers

Steering LLMs via Scalable Interactive Oversight [74.12746881843044]
As Large Language Models increasingly automate complex, long-horizon tasks such as vibe coding, a supervision gap has emerged. It presents a critical challenge in scalable oversight: enabling humans to responsibly steer AI systems on tasks that surpass their own ability to specify or verify.
arXiv Detail & Related papers (2026-02-04T04:52:00Z)
Institutional AI: A Governance Framework for Distributional AGI Safety [1.3763052684269788]
We identify three structural problems that emerge from core properties of AI models. The solution is Institutional AI, a system-level approach that treats alignment as a question of effective governance of AI agent collectives.
arXiv Detail & Related papers (2026-01-15T17:08:26Z)
Corrigibility as a Singular Target: A Vision for Inherently Reliable Foundation Models [0.0]
Foundation models (FMs) face a critical safety challenge: as capabilities scale, instrumental convergence drives default trajectories toward loss of human control. We propose "Corrigibility as a Singular Target" (CAST): designing FMs whose overriding objective is empowering designated human principals to guide, correct, and control them.
arXiv Detail & Related papers (2025-06-03T16:36:03Z)
Deep Reinforcement Learning Based Systems for Safety Critical Applications in Aerospace [0.0]
Recent advancements in artificial intelligence (AI) applications within aerospace have demonstrated substantial growth. As High Performance Computing platforms continue to evolve, they are expected to replace current flight control or engine control computers. This shift will allow real-time AI applications, such as image processing and defect detection, to be seamlessly integrated into monitoring systems.
arXiv Detail & Related papers (2024-12-21T05:17:55Z)
A Blueprint for Auditing Generative AI [0.9999629695552196]
Generative AI systems display emergent capabilities and are adaptable to a wide range of downstream tasks. Existing auditing procedures fail to address the governance challenges posed by generative AI systems. We propose a three-layered approach comprising governance audits of technology providers that design and disseminate generative AI systems, model audits of generative AI systems after pre-training but prior to their release, and application audits of applications built on top of generative AI systems.
arXiv Detail & Related papers (2024-07-07T11:56:54Z)
Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data [96.5899286619008]
Self-supervised learning has the potential to decrease the amount of human annotation and engineering effort required to learn control strategies. Our work builds on prior work showing that reinforcement learning (RL) itself can be cast as a self-supervised problem. We demonstrate that a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks.
arXiv Detail & Related papers (2023-06-06T01:36:56Z)
Artificial Intelligence in Governance, Risk and Compliance: Results of a study on potentials for the application of artificial intelligence (AI) in governance, risk and compliance (GRC) [0.0]
GRC (Governance, Risk and Compliance) denotes an integrated governance approach. Governance functions are interlinked and not separated from each other. Artificial intelligence is being used in GRC for processing and analysis of unstructured data sets.
arXiv Detail & Related papers (2022-12-07T12:36:10Z)
Towards AIOps in Edge Computing Environments [60.27785717687999]
This paper describes the system design of an AIOps platform which is applicable in heterogeneous, distributed environments. It is feasible to collect metrics with a high frequency and simultaneously run specific anomaly detection algorithms directly on edge devices.
arXiv Detail & Related papers (2021-02-12T09:33:00Z)
Decentralized Control with Graph Neural Networks [147.84766857793247]
We propose a novel framework using graph neural networks (GNNs) to learn decentralized controllers. GNNs are well-suited for the task since they are naturally distributed architectures and exhibit good scalability and transferability properties. The problems of flocking and multi-agent path planning are explored to illustrate the potential of GNNs in learning decentralized controllers.
arXiv Detail & Related papers (2020-12-29T18:59:14Z)
Learning to Track Dynamic Targets in Partially Known Environments [48.49957897251128]
We use a deep reinforcement learning approach to solve active target tracking. In particular, we introduce Active Tracking Target Network (ATTN), a unified RL policy that is capable of solving major sub-tasks of active target tracking.
arXiv Detail & Related papers (2020-06-17T22:45:24Z)
Distributed and Democratized Learning: Philosophy and Research Challenges [80.39805582015133]
We propose a novel design philosophy called democratized learning (Dem-AI). Inspired by the societal groups of humans, the specialized groups of learning agents in the proposed Dem-AI system are self-organized in a hierarchical structure to collectively perform learning tasks more efficiently. We present a reference design as a guideline to realize future Dem-AI systems, inspired by various interdisciplinary fields.
arXiv Detail & Related papers (2020-03-18T08:45:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.