Beyond Preferences in AI Alignment
- URL: http://arxiv.org/abs/2408.16984v2
- Date: Wed, 6 Nov 2024 20:34:14 GMT
- Title: Beyond Preferences in AI Alignment
- Authors: Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton
- Abstract summary: We characterize and challenge the preferentist approach to AI alignment.
We show how preferences fail to capture the thick semantic content of human values.
We argue that AI systems should be aligned with normative standards appropriate to their social roles.
- Score: 15.878773061188516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferences, and (3) that AI systems should be aligned with the preferences of one or more humans to ensure that they behave safely and in accordance with our values. Whether implicitly followed or explicitly endorsed, these commitments constitute what we term a preferentist approach to AI alignment. In this paper, we characterize and challenge the preferentist approach, describing conceptual and technical alternatives that are ripe for further research. We first survey the limits of rational choice theory as a descriptive model, explaining how preferences fail to capture the thick semantic content of human values, and how utility representations neglect the possible incommensurability of those values. We then critique the normativity of expected utility theory (EUT) for humans and AI, drawing upon arguments showing how rational agents need not comply with EUT, while highlighting how EUT is silent on which preferences are normatively acceptable. Finally, we argue that these limitations motivate a reframing of the targets of AI alignment: Instead of alignment with the preferences of a human user, developer, or humanity-writ-large, AI systems should be aligned with normative standards appropriate to their social roles, such as the role of a general-purpose assistant. Furthermore, these standards should be negotiated and agreed upon by all relevant stakeholders. On this alternative conception of alignment, a multiplicity of AI systems will be able to serve diverse ends, aligned with normative standards that promote mutual benefit and limit harm despite our plural and divergent values.
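To make the formal target of this critique concrete, here is a minimal illustrative sketch (not from the paper; all function names, weights, and numbers are hypothetical) of expected utility maximization, and of how scalarizing plural values onto a single utility scale forces every outcome to be comparable, which is exactly the incommensurability worry raised above:

```python
# Illustrative sketch only: expected utility theory (EUT) ranks actions
# by the probability-weighted sum of outcome utilities.

def expected_utility(lottery, utility):
    """lottery: list of (probability, outcome) pairs."""
    return sum(p * utility(outcome) for p, outcome in lottery)

# A single utility scale forces every pair of outcomes to be comparable.
# Suppose outcomes carry a two-dimensional value profile (honesty, kindness);
# any fixed weights w collapse the two values onto one scale.
def scalarized_utility(outcome, w=(0.5, 0.5)):
    honesty, kindness = outcome  # hypothetical value dimensions
    return w[0] * honesty + w[1] * kindness

lottery_a = [(0.5, (1.0, 0.0)), (0.5, (0.0, 1.0))]  # gamble between the values
lottery_b = [(1.0, (0.6, 0.6))]                     # a sure "compromise"

print(expected_utility(lottery_a, scalarized_utility))  # 0.5
print(expected_utility(lottery_b, scalarized_utility))  # 0.6 -> EUT prefers b
```

Under any fixed weights the two lotteries become comparable by construction; the abstract's point is that for genuinely incommensurable values no such single scale need exist, so the resulting ranking is an artifact of the representation.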
Related papers
- Aligning Large Language Models from Self-Reference AI Feedback with one General Principle [61.105703857868775]
We propose a self-reference-based AI feedback framework that enables a 13B Llama2-Chat model to provide high-quality feedback.
Specifically, we allow the AI to first respond to the user's instructions, then generate criticism of other answers using its own response as a reference.
Finally, we determine which answer better fits human preferences according to the criticism.
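A rough sketch of this three-step loop follows; hedged: `model` stands in for a 13B Llama2-Chat-style chat function, and the prompt strings are illustrative assumptions, not the paper's actual templates.

```python
# Hedged sketch of the three-step self-reference feedback loop described
# above. `model` is a placeholder chat function; prompts are illustrative.

def self_reference_feedback(model, instruction, answer_1, answer_2):
    # Step 1: the AI first responds to the user's instruction itself.
    own_answer = model(f"Instruction: {instruction}\nAnswer:")

    # Step 2: it criticizes each candidate answer, using its own
    # response as a reference point.
    critique_1 = model(f"Reference answer: {own_answer}\n"
                       f"Criticize this answer: {answer_1}")
    critique_2 = model(f"Reference answer: {own_answer}\n"
                       f"Criticize this answer: {answer_2}")

    # Step 3: decide which answer better fits human preferences,
    # according to the two critiques.
    return model(f"Instruction: {instruction}\n"
                 f"Critique of answer 1: {critique_1}\n"
                 f"Critique of answer 2: {critique_2}\n"
                 f"Reply with the number of the better answer.")
```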
arXiv Detail & Related papers (2024-06-17T03:51:46Z) - Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment [0.0]
We find "alignment" a problem related to the challenges of expressing human goals and values in a manner that artificial systems can follow without leading to unwanted adversarial effects.
This work addresses alignment as a technical-philosophical problem that requires solid philosophical foundations and practical implementations that bring normative theory to AI system development.
arXiv Detail & Related papers (2024-06-16T18:37:31Z) - Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment.
The lack of clear definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve it.
We introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), and Machine Learning (ML).
arXiv Detail & Related papers (2024-06-13T16:03:25Z) - Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence pursues consistency between model responses and human preferences and values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
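A hedged sketch of the conditioning idea follows; the control-token format and objective names are illustrative assumptions, not the paper's exact scheme. The model is given explicit preference scores per objective, so a single model can realize different trade-offs at inference time.

```python
# Hedged sketch of conditioning generation on explicit per-objective
# preference scores. Token format and objective names are illustrative.

def controllable_prompt(instruction, scores):
    """Prefix an instruction with explicit per-objective preference scores."""
    control = " ".join(f"<{name}:{score}>" for name, score in sorted(scores.items()))
    return f"{control} {instruction}"

prompt = controllable_prompt("Explain how vaccines work.",
                             {"helpfulness": 5, "harmlessness": 4})
print(prompt)
# <harmlessness:4> <helpfulness:5> Explain how vaccines work.
```

One plausible training recipe (an assumption here, not a claim about the paper) pairs such conditioned prompts with responses whose measured objective scores match the requested ones, so the trade-off becomes a controllable input rather than a fixed property of the model.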
arXiv Detail & Related papers (2024-02-29T12:12:30Z) - Towards Responsible AI in Banking: Addressing Bias for Fair
Decision-Making [69.44075077934914]
"Responsible AI" emphasizes the critical nature of addressing biases within the development of a corporate culture.
This thesis is structured around three fundamental pillars: understanding bias, mitigating bias, and accounting for bias.
In line with open-source principles, we have released Bias On Demand and FairView as accessible Python packages.
arXiv Detail & Related papers (2024-01-13T14:07:09Z) - Measuring Value Alignment [12.696227679697493]
This paper introduces a novel formalism to quantify the alignment between AI systems and human values.
By utilizing this formalism, AI developers and ethicists can better design and evaluate AI systems to ensure they operate in harmony with human values.
arXiv Detail & Related papers (2023-12-23T12:30:06Z) - Tensions Between the Proxies of Human Values in AI [20.303537771118048]
We argue that the AI community needs to consider all the consequences of choosing certain formulations of these pillars, the proxies that stand in for human values.
We point towards sociotechnical research for frameworks that address these tensions, but push for broader efforts to implement them in practice.
arXiv Detail & Related papers (2022-12-14T21:13:48Z) - Fairness in Agreement With European Values: An Interdisciplinary
Perspective on AI Regulation [61.77881142275982]
This interdisciplinary position paper considers various concerns surrounding fairness and discrimination in AI, and discusses how AI regulations address them.
We first look at AI and fairness through the lenses of law, (AI) industry, sociotechnology, and (moral) philosophy, and present various perspectives.
We identify and propose the roles AI regulation should play to make the AI Act a success with respect to AI fairness concerns.
arXiv Detail & Related papers (2022-06-08T12:32:08Z) - Joint Optimization of AI Fairness and Utility: A Human-Centered Approach [45.04980664450894]
We argue that because different fairness criteria sometimes cannot be simultaneously satisfied, it is key to acquire and adhere to human policy makers' preferences on how to make the tradeoff among these objectives.
We propose a framework and some exemplar methods for eliciting such preferences and for optimizing an AI model according to these preferences.
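One minimal way such a pipeline could look (a hedged sketch, not the paper's method: the combined objective, the weight `lam`, and the candidate models are all illustrative assumptions):

```python
# Hedged sketch: after eliciting a policy maker's trade-off weight `lam`
# between utility and a fairness criterion, model selection can optimize
# the combined objective. All names and numbers are illustrative.

def combined_objective(accuracy, fairness_gap, lam):
    # fairness_gap: e.g. |positive rate(group A) - positive rate(group B)|.
    # lam encodes how much accuracy the stakeholder will trade for a
    # smaller gap.
    return accuracy - lam * fairness_gap

candidates = [
    {"name": "model_1", "accuracy": 0.91, "fairness_gap": 0.12},
    {"name": "model_2", "accuracy": 0.88, "fairness_gap": 0.03},
]
lam = 0.5  # elicited from the policy maker, e.g. via pairwise comparisons
best = max(candidates,
           key=lambda m: combined_objective(m["accuracy"], m["fairness_gap"], lam))
print(best["name"])  # model_2 under this elicited trade-off
```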
arXiv Detail & Related papers (2020-02-05T03:31:48Z) - Artificial Intelligence, Values and Alignment [2.28438857884398]
Normative and technical aspects of the AI alignment problem are interrelated.
It is important to be clear about the goal of alignment.
The central challenge for theorists is not to identify 'true' moral principles for AI.
arXiv Detail & Related papers (2020-01-13T10:32:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.