Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value
- URL: http://arxiv.org/abs/2512.03399v1
- Date: Wed, 03 Dec 2025 03:11:32 GMT
- Title: Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value
- Authors: Joe Edelman, Tan Zhi-Xuan, Ryan Lowe, Oliver Klingefjord, Vincent Wang-Mascianica, Matija Franklin, Ryan Othniel Kearns, Ellie Hain, Atrisha Sarkar, Michiel Bakker, Fazl Barez, David Duvenaud, Jakob Foerster, Iason Gabriel, Joseph Gubbels, Bryce Goodman, Andreas Haupt, Jobst Heitzig, Julian Jara-Ettinger, Atoosa Kasirzadeh, James Ravi Kirkpatrick, Andrew Koh, W. Bradley Knox, Philipp Koralus, Joel Lehman, Sydney Levine, Samuele Marro, Manon Revel, Toby Shorin, Morgan Sutherland, Michael Henry Tessler, Ivan Vendrov, James Wilken-Smith,
- Abstract summary: We argue that current approaches for representing values, such as utility functions, preference orderings, or unstructured text, struggle to address these and other issues effectively. We propose that thick models of value will be needed. These structure the way values and norms are represented, enabling systems to distinguish enduring values from fleeting preferences.
- Score: 23.754729147843914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Beneficial societal outcomes cannot be guaranteed by aligning individual AI systems with the intentions of their operators or users. Even an AI system that is perfectly aligned to the intentions of its operating organization can lead to bad outcomes if the goals of that organization are misaligned with those of other institutions and individuals. For this reason, we need full-stack alignment, the concurrent alignment of AI systems and the institutions that shape them with what people value. This can be done without imposing a particular vision of individual or collective flourishing. We argue that current approaches for representing values, such as utility functions, preference orderings, or unstructured text, struggle to address these and other issues effectively. They struggle to distinguish values from other signals, to support principled normative reasoning, and to model collective goods. We propose that thick models of value will be needed. These structure the way values and norms are represented, enabling systems to distinguish enduring values from fleeting preferences, to model the social embedding of individual choices, and to reason normatively, applying values in new domains. We demonstrate this approach in five areas: AI value stewardship, normatively competent agents, win-win negotiation systems, meaning-preserving economic mechanisms, and democratic regulatory institutions.
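The paper does not ship a reference implementation; as a rough illustration of the representational gap it describes, the sketch below contrasts a flat utility function with a hypothetical structured ("thick") value record that separates enduring values from momentary preferences. All names here (`ThickValue`, `applies_to`, the example values) are illustrative assumptions, not the authors' API.

```python
# Illustrative sketch only: the paper argues for "thick" value
# representations but does not prescribe this schema. All names here
# are hypothetical.
from dataclasses import dataclass

# A flat utility function collapses everything into one number,
# losing the distinction between values and passing preferences.
def flat_utility(option: str) -> float:
    scores = {"doomscroll": 0.9, "call_a_friend": 0.7}
    return scores.get(option, 0.0)

@dataclass
class ThickValue:
    """A structured value: what matters, why, and where it applies."""
    name: str                 # e.g. "honesty"
    attends_to: list[str]     # considerations the value directs attention to
    domains: list[str]        # social contexts in which it is normative
    enduring: bool = True     # distinguishes values from fleeting preferences

    def applies_to(self, domain: str) -> bool:
        # Normative reasoning hook: a value can be carried into new
        # domains by checking whether it is in force there.
        return domain in self.domains

honesty = ThickValue(
    name="honesty",
    attends_to=["whether claims match my evidence"],
    domains=["negotiation", "advice-giving"],
)

print(flat_utility("doomscroll"))          # 0.9 -- but is it a value?
print(honesty.applies_to("negotiation"))   # True
```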
Related papers
- Position: General Alignment Has Hit a Ceiling; Edge Alignment Must Be Taken Seriously [51.03213216886717]
We take the position that the dominant paradigm of General Alignment reaches a structural ceiling in settings with conflicting values. We introduce Edge Alignment as a distinct approach in which systems preserve multi-dimensional value structure.
arXiv Detail & Related papers (2026-02-23T16:51:43Z) - Learning the Value Systems of Societies from Preferences [1.3836987591220347]
Aligning AI systems with human values and the value-based preferences of various stakeholders is key in ethical AI. In value-aware AI systems, decision-making draws upon explicit computational representations of individual values. We propose a method to address the problem of learning the value systems of societies.
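The abstract does not spell out the method; as a hedged sketch of the general problem setup, the snippet below fits per-value weights from pairwise choices with a simple Bradley-Terry-style logistic model, one standard way to learn value systems from preference data. The value dimensions and choice data are invented for illustration.

```python
# Hedged sketch: one standard way to recover value weights from
# pairwise preferences (Bradley-Terry-style logistic model). This is
# NOT the paper's method; features and data are invented.
import numpy as np

VALUES = ["fairness", "efficiency", "privacy"]

# Each option scored on each value dimension.
options = {
    "policy_a": np.array([0.9, 0.2, 0.7]),
    "policy_b": np.array([0.3, 0.8, 0.4]),
    "policy_c": np.array([0.5, 0.5, 0.9]),
}

# Observed pairwise choices: (chosen, rejected).
prefs = [("policy_a", "policy_b"), ("policy_c", "policy_b"),
         ("policy_a", "policy_c"), ("policy_a", "policy_b")]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gradient ascent on the log-likelihood of the observed choices.
w = np.zeros(len(VALUES))
for _ in range(500):
    grad = np.zeros_like(w)
    for chosen, rejected in prefs:
        diff = options[chosen] - options[rejected]
        grad += (1.0 - sigmoid(w @ diff)) * diff
    w += 0.1 * grad

print(dict(zip(VALUES, np.round(w, 2))))  # learned value weights
```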
arXiv Detail & Related papers (2025-07-28T11:25:55Z) - Infrastructuring Contestability: A Framework for Community-Defined AI Value Pluralism [0.0]
The proliferation of AI-driven systems presents a challenge to Human-Computer Interaction and Computer-Supported Cooperative Work. Current approaches to value alignment, which rely on centralized, top-down definitions, lack the mechanisms for meaningful contestability. This paper introduces Community-Defined AI Value Pluralism, a socio-technical framework that addresses this gap.
arXiv Detail & Related papers (2025-07-07T16:45:50Z) - Resource Rational Contractualism Should Guide AI Alignment [69.07915246220985]
Contractualist alignment proposes grounding decisions in agreements that diverse stakeholders would endorse. We propose Resource-Rational Contractualism (RRC): a framework where AI systems approximate the agreements rational parties would form. An RRC-aligned agent would not only operate efficiently, but also be equipped to dynamically adapt to and interpret the ever-changing human social world.
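The abstract leaves the mechanics abstract; one classical stand-in for "agreements rational parties would form" is the Nash bargaining solution, sketched below over a toy set of options. This is an assumption about the flavor of such a framework, not the paper's algorithm.

```python
# Hedged sketch: the Nash bargaining solution as a classical proxy for
# "agreements rational parties would form". Not the paper's mechanism.

# Utilities of each candidate agreement for two parties, plus the
# disagreement point each falls back to if no deal is struck.
agreements = {"deal_1": (4.0, 6.0), "deal_2": (5.0, 5.0), "deal_3": (7.0, 2.0)}
disagreement = (2.0, 1.0)

def nash_product(u, d):
    """Product of gains over the disagreement point (0 if anyone loses)."""
    gains = [ui - di for ui, di in zip(u, d)]
    if any(g <= 0 for g in gains):
        return 0.0
    prod = 1.0
    for g in gains:
        prod *= g
    return prod

best = max(agreements, key=lambda a: nash_product(agreements[a], disagreement))
print(best)  # deal_2: maximizes (5 - 2) * (5 - 1) = 12
```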
arXiv Detail & Related papers (2025-06-20T18:57:13Z) - ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs [16.711615737109977]
We introduce ValueCompass, a framework of fundamental values, grounded in psychological theory and a systematic review. We apply ValueCompass to measure the value alignment of humans and large language models (LLMs) across four real-world scenarios.
arXiv Detail & Related papers (2024-09-15T02:13:03Z) - Beyond Preferences in AI Alignment [15.878773061188516]
We characterize and challenge the preferentist approach to AI alignment.
We show how preferences fail to capture the thick semantic content of human values.
We argue that AI systems should be aligned with normative standards appropriate to their social roles.
arXiv Detail & Related papers (2024-08-30T03:14:20Z) - Towards Responsible AI in Banking: Addressing Bias for Fair Decision-Making [69.44075077934914]
"Responsible AI" emphasizes the critical nature of addressing biases within the development of a corporate culture.
This thesis is structured around three fundamental pillars: understanding bias, mitigating bias, and accounting for bias.
In line with open-source principles, we have released Bias On Demand and FairView as accessible Python packages.
arXiv Detail & Related papers (2024-01-13T14:07:09Z) - Measuring Value Alignment [12.696227679697493]
This paper introduces a novel formalism to quantify the alignment between AI systems and human values.
By utilizing this formalism, AI developers and ethicists can better design and evaluate AI systems to ensure they operate in harmony with human values.
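The abstract does not reproduce the formalism itself; a minimal worked stand-in, assuming values can be expressed as weight vectors over shared dimensions, scores alignment with cosine similarity. The paper's actual measure may differ.

```python
# Minimal stand-in for an alignment score, assuming values can be
# expressed as weight vectors over shared dimensions. The paper's
# actual formalism may differ.
import math

def alignment(human: list[float], system: list[float]) -> float:
    """Cosine similarity in [-1, 1]; 1 means identical value weights."""
    dot = sum(h * s for h, s in zip(human, system))
    norm = math.sqrt(sum(h * h for h in human)) * math.sqrt(sum(s * s for s in system))
    return dot / norm if norm else 0.0

human_values  = [0.8, 0.5, 0.3]   # e.g. care, fairness, liberty
system_values = [0.7, 0.6, 0.1]
print(round(alignment(human_values, system_values), 3))
```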
arXiv Detail & Related papers (2023-12-23T12:30:06Z) - Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z) - Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
Large Language Models (LLMs) have made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
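HVAE's internals are not given in the abstract; as a hedged sketch of what evaluating a model against heterogeneous values might involve, the snippet below scores the same model responses under several value profiles. The profiles, rubric dimensions, and scores are invented placeholders, not HVAE's actual components.

```python
# Hedged sketch of heterogeneous value evaluation: score the same model
# outputs under several value profiles. Profiles and scores are invented
# placeholders, not HVAE's actual components.

# Each profile weights the same rubric dimensions differently.
profiles = {
    "safety_first": {"harmlessness": 0.7, "helpfulness": 0.3},
    "utility_first": {"harmlessness": 0.2, "helpfulness": 0.8},
}

# Per-response rubric scores (would come from human raters or a judge model).
responses = [
    {"harmlessness": 0.9, "helpfulness": 0.4},
    {"harmlessness": 0.5, "helpfulness": 0.9},
]

for name, weights in profiles.items():
    score = sum(
        sum(weights[dim] * r[dim] for dim in weights) for r in responses
    ) / len(responses)
    print(f"{name}: {score:.2f}")  # same outputs, different verdicts
```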
arXiv Detail & Related papers (2023-05-26T02:34:20Z) - Fairness in Agreement With European Values: An Interdisciplinary Perspective on AI Regulation [61.77881142275982]
This interdisciplinary position paper considers various concerns surrounding fairness and discrimination in AI, and discusses how AI regulations address them.
We first look at AI and fairness through the lenses of law, (AI) industry, sociotechnology, and (moral) philosophy, and present various perspectives.
We identify and propose the roles AI Regulation should take to make the endeavor of the AI Act a success in terms of AI fairness concerns.
arXiv Detail & Related papers (2022-06-08T12:32:08Z) - Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
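The framework's visualization is not detailed in the abstract; a toy version of "closeness and tensions between values" could be a pairwise matrix built by correlating stakeholder ratings, sketched below with invented data.

```python
# Toy sketch of "closeness and tensions between values": correlate
# stakeholder ratings of each value pair. All data is invented.
import numpy as np

values = ["transparency", "privacy", "efficiency"]
# Rows: stakeholders; columns: how much each stakeholder weights each value.
ratings = np.array([
    [0.9, 0.2, 0.5],
    [0.8, 0.3, 0.6],
    [0.2, 0.9, 0.4],
    [0.3, 0.8, 0.7],
])

# Positive correlation ~ closeness; negative ~ tension between values.
corr = np.corrcoef(ratings.T)
for i, a in enumerate(values):
    for j, b in enumerate(values):
        if i < j:
            print(f"{a} vs {b}: {corr[i, j]:+.2f}")
```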
arXiv Detail & Related papers (2022-05-09T19:28:32Z) - Learning from Learning Machines: Optimisation, Rules, and Social Norms [91.3755431537592]
It appears that the area of AI that is most analogous to the behaviour of economic entities is that of morally good decision-making.
Recent successes of deep learning for AI suggest that more implicit specifications work better than explicit ones for solving such problems.
arXiv Detail & Related papers (2019-12-29T17:42:06Z)