AI Alignment at Your Discretion
- URL: http://arxiv.org/abs/2502.10441v1
- Date: Mon, 10 Feb 2025 09:19:52 GMT
- Title: AI Alignment at Your Discretion
- Authors: Maarten Buyl, Hadi Khalaf, Claudio Mayrink Verdun, Lucas Monteiro Paes, Caio C. Vieira Machado, Flavio du Pin Calmon
- Abstract summary: In AI alignment, latitude must be granted to annotators, either human or algorithmic, to judge which model outputs are `better' or `safer.' Such discretion remains largely unexamined, posing two risks: (i) annotators may use their power of discretion arbitrarily, and (ii) models may fail to mimic this discretion. By measuring both human and algorithmic discretion over safety alignment datasets, we reveal layers of discretion in the alignment process that were previously unaccounted for.
- Score: 7.133218044328296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In AI alignment, extensive latitude must be granted to annotators, either human or algorithmic, to judge which model outputs are `better' or `safer.' We refer to this latitude as alignment discretion. Such discretion remains largely unexamined, posing two risks: (i) annotators may use their power of discretion arbitrarily, and (ii) models may fail to mimic this discretion. To study this phenomenon, we draw on legal concepts of discretion that structure how decision-making authority is conferred and exercised, particularly in cases where principles conflict or their application is unclear or irrelevant. Extended to AI alignment, discretion is required when alignment principles and rules are (inevitably) conflicting or indecisive. We present a set of metrics to systematically analyze when and how discretion in AI alignment is exercised, such that both risks (i) and (ii) can be observed. Moreover, we distinguish between human and algorithmic discretion and analyze the discrepancy between them. By measuring both human and algorithmic discretion over safety alignment datasets, we reveal layers of discretion in the alignment process that were previously unaccounted for. Furthermore, we demonstrate how algorithms trained on these datasets develop their own forms of discretion in interpreting and applying these principles, which challenges the purpose of having any principles at all. Our paper presents the first step towards formalizing this core gap in current alignment processes, and we call on the community to further scrutinize and control alignment discretion.
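The paper's formal metrics are not reproduced here, but the central quantity, the gap between human and algorithmic discretion, can be illustrated with a toy comparison of preference labels on cases where stated principles conflict or are indecisive. A minimal sketch (the data structures, names, and disagreement statistic are illustrative assumptions, not the authors' definitions):

```python
from dataclasses import dataclass

# Toy illustration (not the paper's formal metrics): each comparison carries
# per-principle verdicts in {-1, 0, +1}, meaning "output A is preferred",
# "principle is indecisive", "output B is preferred", plus a final choice.

@dataclass
class Comparison:
    principle_verdicts: list[int]  # one verdict per alignment principle
    human_choice: int              # -1 (A) or +1 (B)
    model_choice: int              # -1 (A) or +1 (B)

def needs_discretion(verdicts: list[int]) -> bool:
    """Discretion is required when principles conflict or are all indecisive."""
    has_a = any(v < 0 for v in verdicts)
    has_b = any(v > 0 for v in verdicts)
    return (has_a and has_b) or not (has_a or has_b)

def discretion_discrepancy(data: list[Comparison]) -> float:
    """Fraction of discretion-requiring cases where model and human disagree."""
    cases = [c for c in data if needs_discretion(c.principle_verdicts)]
    if not cases:
        return 0.0
    return sum(c.human_choice != c.model_choice for c in cases) / len(cases)

data = [
    Comparison([+1, -1, 0], human_choice=+1, model_choice=-1),  # conflicting
    Comparison([0, 0, 0], human_choice=-1, model_choice=-1),    # indecisive
    Comparison([+1, +1, 0], human_choice=+1, model_choice=+1),  # decisive
]
print(discretion_discrepancy(data))  # 0.5: model diverges on half the cases
```

In the paper's framing, high discrepancy on exactly these discretion-requiring cases corresponds to risk (ii): the model fails to mimic human discretion.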
Related papers
- Beyond Preferences in AI Alignment [15.878773061188516]
We characterize and challenge the preferentist approach to AI alignment.
We show how preferences fail to capture the thick semantic content of human values.
We argue that AI systems should be aligned with normative standards appropriate to their social roles.
arXiv Detail & Related papers (2024-08-30T03:14:20Z)
- Rethinking State Disentanglement in Causal Reinforcement Learning [78.12976579620165]
Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability.
We revisit this research line and find that incorporating RL-specific context can reduce unnecessary assumptions in previous identifiability analyses for latent states.
We propose a novel approach for general partially observable Markov Decision Processes (POMDPs) by replacing the complicated structural constraints in previous methods with two simple constraints for transition and reward preservation.
arXiv Detail & Related papers (2024-08-24T06:49:13Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
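For a well-specified model, a standard way to build such a confidence sequence is to invert a running likelihood ratio: by Ville's inequality, the set of parameters whose likelihood ratio against a sequential predictor stays below 1/alpha is valid at every time step. A minimal Bernoulli sketch of this generic construction (the predictor choice and grid are assumptions of this sketch, not necessarily the paper's estimators):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, theta_true = 0.05, 0.3
xs = rng.binomial(1, theta_true, size=500)

grid = np.linspace(0.001, 0.999, 999)   # candidate parameter values
log_num = 0.0                           # log-likelihood of sequential predictor
log_den = np.zeros_like(grid)           # log-likelihood under each candidate
ones = 0

for i, x in enumerate(xs):
    p_hat = (ones + 0.5) / (i + 1.0)    # Krichevsky-Trofimov predictor, in (0,1)
    log_num += np.log(p_hat) if x else np.log(1 - p_hat)
    log_den += np.log(grid) if x else np.log(1 - grid)
    ones += int(x)

# Ville's inequality: the ratio exceeds 1/alpha with probability <= alpha
# under the true parameter, so this set is a valid anytime confidence set.
in_set = log_num - log_den < np.log(1 / alpha)
print(f"confidence set after {len(xs)} draws: "
      f"[{grid[in_set].min():.3f}, {grid[in_set].max():.3f}]")
```

Because validity holds uniformly over time, the set can be consulted after every observation, which is what makes such constructions suitable for sequential decision making.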
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- AI Alignment: A Comprehensive Survey [70.35693485015659]
AI alignment aims to make AI systems behave in line with human intentions and values.
We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality.
We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z)
- Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z)
- Rethinking Algorithmic Fairness for Human-AI Collaboration [29.334511328067777]
Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithms.
We show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy.
arXiv Detail & Related papers (2023-10-05T16:21:42Z)
- Fairness in Matching under Uncertainty [78.39459690570531]
Algorithmic two-sided marketplaces have drawn attention to the issue of fairness in such settings.
We axiomatize a notion of individual fairness in the two-sided marketplace setting which respects the uncertainty in the merits.
We design a linear programming framework to find fair utility-maximizing distributions over allocations.
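As a toy illustration of what such a linear program can look like (the instance, utilities, and Lipschitz-style fairness constraint are assumptions of this sketch, not the paper's axioms), consider one slot and three candidates, with the allocation being a probability vector:

```python
import numpy as np
from scipy.optimize import linprog

utility = np.array([0.9, 0.8, 0.5])    # platform utility of each candidate
merit = np.array([0.85, 0.80, 0.40])   # merit estimates under uncertainty
L = 2.0                                # similar merit -> similar chances
n = len(merit)

# Individual-fairness constraints: p_i - p_j <= L * |merit_i - merit_j|
A_ub, b_ub = [], []
for i in range(n):
    for j in range(n):
        if i != j:
            row = np.zeros(n)
            row[i], row[j] = 1.0, -1.0
            A_ub.append(row)
            b_ub.append(L * abs(merit[i] - merit[j]))

# Maximize expected utility (linprog minimizes, hence the sign flip),
# subject to the fairness constraints and the simplex constraint on p.
res = linprog(-utility, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=np.ones((1, n)), b_eq=[1.0], bounds=[(0, 1)] * n)
print(res.x.round(3))  # e.g. [0.55 0.45 0.  ]
```

The fairness constraints bind only between candidates of similar merit, so the solver shifts probability mass toward higher-utility candidates exactly as far as the fairness axioms allow.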
arXiv Detail & Related papers (2023-02-08T00:30:32Z)
- Beyond Incompatibility: Trade-offs between Mutually Exclusive Fairness Criteria in Machine Learning and Law [2.959308758321417]
We present a novel algorithm (FAir Interpolation Method: FAIM) for continuously interpolating between three fairness criteria.
We demonstrate the effectiveness of our algorithm when applied to synthetic data, the COMPAS data set, and a new, real-world data set from the e-commerce sector.
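The FAIM algorithm itself is not reproduced here; the heavily simplified sketch below only illustrates the general idea of continuously interpolating a group's score distribution toward a group-independent one via quantiles, with the interpolation weight playing the role of a continuous trade-off parameter between fairness criteria:

```python
import numpy as np

rng = np.random.default_rng(1)
scores_a = rng.normal(0.6, 0.1, 1000).clip(0, 1)  # group A risk scores
scores_b = rng.normal(0.4, 0.1, 1000).clip(0, 1)  # group B risk scores
pooled = np.concatenate([scores_a, scores_b])

def partial_repair(scores, pooled, lam):
    """Move each score a fraction lam of the way to its pooled quantile."""
    ranks = scores.argsort().argsort() / (len(scores) - 1)
    return (1 - lam) * scores + lam * np.quantile(pooled, ranks)

# lam = 0 keeps the original scores; lam = 1 makes both groups' score
# distributions coincide with the pooled one, closing the parity gap.
for lam in (0.0, 0.5, 1.0):
    gap = abs(partial_repair(scores_a, pooled, lam).mean()
              - partial_repair(scores_b, pooled, lam).mean())
    print(f"lambda={lam}: demographic-parity gap = {gap:.3f}")
```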
arXiv Detail & Related papers (2022-12-01T12:47:54Z)
- Algorithmic Assistance with Recommendation-Dependent Preferences [2.864550757598007]
We consider the effect and design of algorithmic recommendations when they affect choices.
We show that recommendation-dependent preferences create inefficiencies where the decision-maker is overly responsive to the recommendation.
arXiv Detail & Related papers (2022-08-16T09:24:47Z)
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z)
- Probabilistic Control and Majorization of Optimal Control [3.2634122554914002]
Probabilistic control design is founded on the principle that a rational agent attempts to match the modelled closed-loop system trajectory density with an arbitrary desired one.
In this work we introduce an alternative parametrization of desired closed-loop behaviour and explore alternative proximity measures between densities.
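As a toy illustration of comparing proximity measures between a modelled and a desired closed-loop density (assuming one-dimensional Gaussian state densities, which is an assumption of this sketch, not the paper's parametrization):

```python
import numpy as np

def kl_gauss(m0, s0, m1, s1):
    """KL( N(m0, s0^2) || N(m1, s1^2) ): the classic matching objective."""
    return np.log(s1 / s0) + (s0**2 + (m0 - m1)**2) / (2 * s1**2) - 0.5

def w2_gauss(m0, s0, m1, s1):
    """2-Wasserstein distance between the same Gaussians: an alternative."""
    return np.sqrt((m0 - m1)**2 + (s0 - s1)**2)

# Desired closed-loop density: tightly concentrated around the setpoint 1.0.
# The modelled statistics below are a crude stand-in for a controlled system.
for gain in (0.2, 0.5, 0.9):
    m, s = gain * 1.0, 1.0 - 0.8 * gain
    print(f"gain={gain}: KL={kl_gauss(m, s, 1.0, 0.1):.2f}  "
          f"W2={w2_gauss(m, s, 1.0, 0.1):.2f}")
```

Different proximity measures penalize the mismatch differently (KL blows up when the modelled density is much wider than the desired one, while Wasserstein degrades smoothly), which is why the choice of measure shapes the resulting controller.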
arXiv Detail & Related papers (2022-05-06T15:04:12Z)
- Randomized Classifiers vs Human Decision-Makers: Trustworthy AI May Have to Act Randomly and Society Seems to Accept This [0.8889304968879161]
We argue that, akin to human decisions, judgments of artificial agents should necessarily be grounded in some moral principles.
Yet a decision-maker can only make truly ethical (based on any ethical theory) and fair (according to any notion of fairness) decisions if full information on all the relevant factors is available at the time of decision-making.
arXiv Detail & Related papers (2021-11-15T05:39:02Z)
- Coordinated Reasoning for Cross-Lingual Knowledge Graph Alignment [74.0482641714311]
We introduce two coordinated reasoning methods, i.e., the Easy-to-Hard decoding strategy and joint entity alignment algorithm.
Our model achieves state-of-the-art performance, and our reasoning methods also significantly improve existing baselines.
arXiv Detail & Related papers (2020-01-23T18:41:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.