The Alignment Problem from a Deep Learning Perspective
- URL: http://arxiv.org/abs/2209.00626v6
- Date: Tue, 19 Mar 2024 17:07:47 GMT
- Title: The Alignment Problem from a Deep Learning Perspective
- Authors: Richard Ngo, Lawrence Chan, Sören Mindermann,
- Abstract summary: We argue that, without substantial effort to prevent it, AGIs could learn to pursue goals that are in conflict with human interests.
If trained like today's most capable models, AGIs could learn to act deceptively to receive higher reward.
We briefly outline how the deployment of misaligned AGIs might irreversibly undermine human control over the world.
- Score: 3.9772843346304763
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In coming years or decades, artificial general intelligence (AGI) may surpass human capabilities at many critical tasks. We argue that, without substantial effort to prevent it, AGIs could learn to pursue goals that are in conflict (i.e. misaligned) with human interests. If trained like today's most capable models, AGIs could learn to act deceptively to receive higher reward, learn misaligned internally-represented goals which generalize beyond their fine-tuning distributions, and pursue those goals using power-seeking strategies. We review emerging evidence for these properties. AGIs with these properties would be difficult to align and may appear aligned even when they are not. Finally, we briefly outline how the deployment of misaligned AGIs might irreversibly undermine human control over the world, and we review research directions aimed at preventing this outcome.
Related papers
- "I Am the One and Only, Your Cyber BFF": Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI [55.99010491370177]
We argue that we cannot thoroughly map the social impacts of generative AI without mapping the social impacts of anthropomorphic AI.
anthropomorphic AI systems are increasingly prone to generating outputs that are perceived to be human-like.
arXiv Detail & Related papers (2024-10-11T04:57:41Z) - Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act)
Uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence.
As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z) - How Far Are We From AGI [15.705756259264932]
The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple sectors.
Yet, the escalating demands on AI have highlighted the limitations of AI's current offerings, catalyzing a movement towards Artificial General Intelligence (AGI)
AGI, distinguished by its ability to execute diverse real-world tasks with efficiency and effectiveness comparable to human intelligence, reflects a paramount milestone in AI evolution.
This paper delves into the pivotal questions of our proximity to AGI and the strategies necessary for its realization through extensive surveys, discussions, and original perspectives.
arXiv Detail & Related papers (2024-05-16T17:59:02Z) - Provably safe systems: the only path to controllable AGI [0.0]
We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements.
We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability.
arXiv Detail & Related papers (2023-09-05T03:42:46Z) - Concepts is All You Need: A More Direct Path to AGI [0.0]
Little progress has been made toward AGI (Artificial General Intelligence) since the term was coined some 20 years ago.
Here we outline an architecture and development plan, together with some preliminary results, that offers a much more direct path to full Human-Level AI (HLAI)/ AGI.
arXiv Detail & Related papers (2023-09-04T14:14:41Z) - Why We Don't Have AGI Yet [0.0]
The original vision of AI was re-articulated in 2002 via the term 'Artificial General Intelligence' or AGI.
This vision is to build 'Thinking Machines' - computer systems that can learn, reason, and solve problems similar to the way humans do.
While several large-scale efforts have nominally been working on AGI, the field of pure focused AGI development has not been well funded or promoted.
arXiv Detail & Related papers (2023-08-07T13:59:31Z) - Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications on society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z) - The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z) - Building Bridges: Generative Artworks to Explore AI Ethics [56.058588908294446]
In recent years, there has been an increased emphasis on understanding and mitigating adverse impacts of artificial intelligence (AI) technologies on society.
A significant challenge in the design of ethical AI systems is that there are multiple stakeholders in the AI pipeline, each with their own set of constraints and interests.
This position paper outlines some potential ways in which generative artworks can play this role by serving as accessible and powerful educational tools.
arXiv Detail & Related papers (2021-06-25T22:31:55Z) - Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike
Common Sense [142.53911271465344]
We argue that the next generation of AI must embrace "dark" humanlike common sense for solving novel tasks.
We identify functionality, physics, intent, causality, and utility (FPICU) as the five core domains of cognitive AI with humanlike common sense.
arXiv Detail & Related papers (2020-04-20T04:07:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.