Resource Rational Contractualism Should Guide AI Alignment
- URL: http://arxiv.org/abs/2506.17434v1
- Date: Fri, 20 Jun 2025 18:57:13 GMT
- Title: Resource Rational Contractualism Should Guide AI Alignment
- Authors: Sydney Levine, Matija Franklin, Tan Zhi-Xuan, Secil Yanik Guyot, Lionel Wong, Daniel Kilov, Yejin Choi, Joshua B. Tenenbaum, Noah Goodman, Seth Lazar, Iason Gabriel
- Abstract summary: Contractualist alignment proposes grounding decisions in agreements that diverse stakeholders would endorse.
We propose Resource-Rational Contractualism (RRC): a framework where AI systems approximate the agreements rational parties would form.
An RRC-aligned agent would not only operate efficiently, but also be equipped to dynamically adapt to and interpret the ever-changing human social world.
- Score: 69.07915246220985
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI systems will soon have to navigate human environments and make decisions that affect people and other AI agents whose goals and values diverge. Contractualist alignment proposes grounding those decisions in agreements that diverse stakeholders would endorse under the right conditions, yet securing such agreement at scale remains costly and slow -- even for advanced AI. We therefore propose Resource-Rational Contractualism (RRC): a framework where AI systems approximate the agreements rational parties would form by drawing on a toolbox of normatively-grounded, cognitively-inspired heuristics that trade effort for accuracy. An RRC-aligned agent would not only operate efficiently, but also be equipped to dynamically adapt to and interpret the ever-changing human social world.
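To make the effort-accuracy trade-off at the heart of RRC concrete, here is a minimal Python sketch of an agent choosing among decision procedures by weighing stake-weighted expected agreement against computational cost. The heuristic names, accuracy estimates, cost model, and the `select_heuristic` function are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of resource-rational heuristic selection: none of
# these names or numbers come from the paper; they only illustrate the
# effort-accuracy trade-off its abstract describes.
from dataclasses import dataclass

@dataclass
class Heuristic:
    name: str
    accuracy: float  # estimated agreement with full contractualist reasoning, in [0, 1]
    cost: float      # computational effort, in arbitrary units

def select_heuristic(toolbox, stakes, budget):
    """Pick the affordable procedure maximizing stake-weighted accuracy minus effort."""
    affordable = [h for h in toolbox if h.cost <= budget]
    return max(affordable, key=lambda h: stakes * h.accuracy - h.cost)

TOOLBOX = [
    Heuristic("cached social norm", accuracy=0.70, cost=1.0),
    Heuristic("precedent matching", accuracy=0.85, cost=5.0),
    Heuristic("simulated bargaining", accuracy=0.97, cost=40.0),
]

# Low stakes: the cheap default wins (5*0.70-1 = 2.5 beats 5*0.85-5 = -0.75).
print(select_heuristic(TOOLBOX, stakes=5.0, budget=10.0).name)     # cached social norm
# Higher stakes within a tight budget: the mid-cost heuristic pays off.
print(select_heuristic(TOOLBOX, stakes=50.0, budget=10.0).name)    # precedent matching
# High stakes, ample budget: simulating full agreement is worth the effort.
print(select_heuristic(TOOLBOX, stakes=500.0, budget=100.0).name)  # simulated bargaining
```

Under this toy model, cheap cached norms settle low-stakes choices, while high stakes justify the cost of explicitly approximating what the affected parties would agree to.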
Related papers
- Infrastructuring Contestability: A Framework for Community-Defined AI Value Pluralism [0.0]
The proliferation of AI-driven systems presents a challenge to Human-Computer Interaction and Computer-Supported Cooperative Work.
Current approaches to value alignment, which rely on centralized, top-down definitions, lack the mechanisms for meaningful contestability.
This paper introduces Community-Defined AI Value Pluralism, a socio-technical framework that addresses this gap.
arXiv Detail & Related papers (2025-07-07T16:45:50Z) - FAIRTOPIA: Envisioning Multi-Agent Guardianship for Disrupting Unfair AI Pipelines [1.556153237434314]
AI models have become active decision makers, often acting without human supervision.
We envision agents as fairness guardians, since agents learn from their environment.
We introduce a fairness-by-design approach which embeds multi-role agents in an end-to-end (human to AI) synergetic scheme.
arXiv Detail & Related papers (2025-06-10T17:02:43Z) - The Philosophic Turn for AI Agents: Replacing centralized digital rhetoric with decentralized truth-seeking [0.0]
In the face of AI technology, individuals will increasingly rely on AI agents to navigate life's growing complexities.
This paper addresses a fundamental dilemma posed by AI decision-support systems: the risk of either becoming overwhelmed by complex decisions or having autonomy compromised.
arXiv Detail & Related papers (2025-04-24T19:34:43Z) - Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents [61.132523071109354]
This paper investigates the interplay between AI developers, regulators and users, modelling their strategic choices under different regulatory scenarios.
Our research identifies emerging behaviours of strategic AI agents, which tend to adopt more "pessimistic" stances than pure game-theoretic agents.
arXiv Detail & Related papers (2025-04-11T15:41:21Z) - Agentic AI: Autonomy, Accountability, and the Algorithmic Society [0.2209921757303168]
Agentic Artificial Intelligence (AI) can autonomously pursue long-term goals, make decisions, and execute complex, multi-turn workflows.
This transition from advisory roles to proactive execution challenges established legal, economic, and creative frameworks.
We explore challenges in three interrelated domains: creativity and intellectual property, legal and ethical considerations, and competitive effects.
arXiv Detail & Related papers (2025-02-01T03:14:59Z) - Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization.
We hope to provide actionable guidance for building AI systems that meet emerging standards for AI trustworthiness.
arXiv Detail & Related papers (2024-10-25T07:53:32Z) - Beyond Preferences in AI Alignment [15.878773061188516]
We characterize and challenge the preferentist approach to AI alignment.
We show how preferences fail to capture the thick semantic content of human values.
We argue that AI systems should be aligned with normative standards appropriate to their social roles.
arXiv Detail & Related papers (2024-08-30T03:14:20Z) - Towards Responsible AI in Banking: Addressing Bias for Fair Decision-Making [69.44075077934914]
"Responsible AI" emphasizes the critical nature of addressing biases within the development of a corporate culture.
This thesis is structured around three fundamental pillars: understanding bias, mitigating bias, and accounting for bias.
In line with open-source principles, we have released Bias On Demand and FairView as accessible Python packages.
arXiv Detail & Related papers (2024-01-13T14:07:09Z) - AI Alignment: A Comprehensive Survey [69.61425542486275]
AI alignment aims to make AI systems behave in line with human intentions and values.
We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality.
We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z) - Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how a lack of it can deepen biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications for society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z) - Bridging the Global Divide in AI Regulation: A Proposal for a Contextual, Coherent, and Commensurable Framework [0.9622882291833615]
This paper proposes an alternative contextual, coherent, and commensurable (3C) framework for regulating artificial intelligence (AI).
To ensure contextuality, the framework bifurcates the AI life cycle into two phases: learning and deployment for specific tasks, instead of defining foundation or general-purpose models.
To ensure commensurability, the framework promotes the adoption of international standards for measuring and mitigating risks.
arXiv Detail & Related papers (2023-03-20T15:23:40Z) - Fairness in Agreement With European Values: An Interdisciplinary Perspective on AI Regulation [61.77881142275982]
This interdisciplinary position paper considers various concerns surrounding fairness and discrimination in AI, and discusses how AI regulations address them.
We first look at AI and fairness through the lenses of law, (AI) industry, sociotechnology, and (moral) philosophy, and present various perspectives.
We identify and propose the roles AI Regulation should play to make the AI Act a success in terms of AI fairness concerns.
arXiv Detail & Related papers (2022-06-08T12:32:08Z) - Getting Fairness Right: Towards a Toolbox for Practitioners [2.4364387374267427]
The potential risk of AI systems unintentionally embedding and reproducing bias has attracted the attention of machine learning practitioners and society at large.
This paper drafts a toolbox that helps practitioners ensure fair AI practices.
arXiv Detail & Related papers (2020-03-15T20:53:50Z)