Related papers: Misalignment or misuse? The AGI alignment tradeoff

Misalignment or misuse? The AGI alignment tradeoff

URL: http://arxiv.org/abs/2506.03755v1
Date: Wed, 04 Jun 2025 09:22:37 GMT
Title: Misalignment or misuse? The AGI alignment tradeoff
Authors: Max Hellrigel-Holderbaum, Leonard Dung,
Abstract summary: We defend the view that misaligned AGI - future, generally intelligent (robotic) AI agents - poses catastrophic risks.<n>We show that there is room for alignment approaches which do not increase misuse risk.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Creating systems that are aligned with our goals is seen as a leading approach to create safe and beneficial AI in both leading AI companies and the academic field of AI safety. We defend the view that misaligned AGI - future, generally intelligent (robotic) AI agents - poses catastrophic risks. At the same time, we support the view that aligned AGI creates a substantial risk of catastrophic misuse by humans. While both risks are severe and stand in tension with one another, we show that - in principle - there is room for alignment approaches which do not increase misuse risk. We then investigate how the tradeoff between misalignment and misuse looks empirically for different technical approaches to AI alignment. Here, we argue that many current alignment techniques and foreseeable improvements thereof plausibly increase risks of catastrophic misuse. Since the impacts of AI depend on the social context, we close by discussing important social factors and suggest that to reduce the risk of a misuse catastrophe due to aligned AGI, techniques such as robustness, AI control methods and especially good governance seem essential.

Related papers

Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem [1.3905735045377272]
The AI alignment problem, which focusses on ensuring that artificial intelligence (AI) systems act according to human values, presents profound challenges.<n>With the progression from narrow AI to Artificial General Intelligence (AGI) and Superintelligence, fears about control and existential risk have escalated.<n>Here, we investigate whether embracing inevitable AI misalignment can be a contingent strategy to foster a dynamic ecosystem of competing agents.
arXiv Detail & Related papers (2025-05-05T11:33:18Z)
An Approach to Technical AGI Safety and Security [72.83728459135101]
We develop an approach to address the risk of harms consequential enough to significantly harm humanity.<n>We focus on technical approaches to misuse and misalignment.<n>We briefly outline how these ingredients could be combined to produce safety cases for AGI systems.
arXiv Detail & Related papers (2025-04-02T15:59:31Z)
Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act) Uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence. As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z)
Risk Alignment in Agentic AI Systems [0.0]
Agentic AIs capable of undertaking complex actions with little supervision raise new questions about how to safely create and align such systems with users, developers, and society. Risk alignment will matter for user satisfaction and trust, but it will also have important ramifications for society more broadly. We present three papers that bear on key normative and technical aspects of these questions.
arXiv Detail & Related papers (2024-10-02T18:21:08Z)
The AI Alignment Paradox [10.674155943520729]
The better we align AI models with our values, the easier we may make it for adversaries to misalign the models. With AI's increasing real-world impact, it is imperative that a broad community of researchers be aware of the AI alignment paradox.
arXiv Detail & Related papers (2024-05-31T14:06:24Z)
Control Risk for Potential Misuse of Artificial Intelligence in Science [85.91232985405554]
We aim to raise awareness of the dangers of AI misuse in science. We highlight real-world examples of misuse in chemical science. We propose a system called SciGuard to control misuse risks for AI models in science.
arXiv Detail & Related papers (2023-12-11T18:50:57Z)
AI Alignment: A Comprehensive Survey [69.61425542486275]
AI alignment aims to make AI systems behave in line with human intentions and values.<n>We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality.<n>We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z)
Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems. There is a lack of consensus about how exactly such risks arise, and how to manage them. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z)
An Overview of Catastrophic AI Risks [38.84933208563934]
This paper provides an overview of the main sources of catastrophic AI risks, which we organize into four categories. Malicious use, in which individuals or groups intentionally use AIs to cause harm; AI race, in which competitive environments compel actors to deploy unsafe AIs or cede control to AIs. organizational risks, highlighting how human factors and complex systems can increase the chances of catastrophic accidents. rogue AIs, describing the inherent difficulty in controlling agents far more intelligent than humans.
arXiv Detail & Related papers (2023-06-21T03:35:06Z)
Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time. We discuss how biased models can lead to more negative real-world outcomes for certain groups. If the issues persist, they could be reinforced by interactions with other risks and have severe implications on society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z)
On the Ethics of Building AI in a Responsible Manner [22.792375902000614]
We argue that a formalism of AI alignment that does not distinguish between strategic and misalignments is not useful. We propose a definition of a strategic-AI-alignment and prove that most machine learning algorithms that are being used in practice today do not suffer from the strategic-AI-alignment problem.
arXiv Detail & Related papers (2020-03-30T04:11:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.