Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy
- URL: http://arxiv.org/abs/2205.12771v1
- Date: Wed, 25 May 2022 13:37:56 GMT
- Title: Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy
- Authors: Kathleen C. Fraser, Svetlana Kiritchenko, and Esma Balkir
- Abstract summary: We probe the Allen AI Delphi model with a set of standardized morality questionnaires.
Despite some inconsistencies, Delphi tends to mirror the moral principles associated with the demographic groups involved in the annotation process.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In an effort to guarantee that machine learning model outputs conform with
human moral values, recent work has begun exploring the possibility of
explicitly training models to learn the difference between right and wrong.
This is typically done in a bottom-up fashion, by exposing the model to
different scenarios, annotated with human moral judgements. One question,
however, is whether the trained models actually learn any consistent,
higher-level ethical principles from these datasets -- and if so, what? Here,
we probe the Allen AI Delphi model with a set of standardized morality
questionnaires, and find that, despite some inconsistencies, Delphi tends to
mirror the moral principles associated with the demographic groups involved in
the annotation process. We question whether this is desirable and discuss how
we might move forward with this knowledge.
Related papers
- Exploring and steering the moral compass of Large Language Models [55.2480439325792]
Large Language Models (LLMs) have become central to advancing automation and decision-making across various sectors.
This study proposes a comprehensive comparative analysis of the most advanced LLMs to assess their moral profiles.
arXiv Detail & Related papers (2024-05-27T16:49:22Z)
- Morality is Non-Binary: Building a Pluralist Moral Sentence Embedding Space using Contrastive Learning [4.925187725973777]
Pluralist moral philosophers argue that human morality can be deconstructed into a finite number of elements.
We build a pluralist moral sentence embedding space via a state-of-the-art contrastive learning approach.
Our results show that a pluralist approach to morality can be captured in an embedding space.
arXiv Detail & Related papers (2024-01-30T18:15:25Z)
- What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations [48.686872351114964]
Moral or ethical judgments rely heavily on the specific contexts in which they occur.
We introduce defeasible moral reasoning: a task to provide grounded contexts that make an action more or less morally acceptable.
We distill a high-quality dataset of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions.
arXiv Detail & Related papers (2023-10-24T00:51:29Z)
- Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z)
- ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations [81.70195684646681]
We present ClarifyDelphi, an interactive system that learns to ask clarification questions.
We posit that questions whose potential answers lead to diverging moral judgments are the most informative.
Our work is ultimately inspired by studies in cognitive science that have investigated the flexibility in moral cognition.
arXiv Detail & Related papers (2022-12-20T16:33:09Z)
- When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z)
- A Word on Machine Ethics: A Response to Jiang et al. (2021) [36.955224006838584]
We focus on a single case study of the recently proposed Delphi model and offer a critique of the project's proposed method of automating morality judgments.
We conclude with a discussion of how machine ethics could usefully proceed, by focusing on current and near-future uses of technology.
arXiv Detail & Related papers (2021-11-07T19:31:51Z)
- Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z)
- Reinforcement Learning Under Moral Uncertainty [13.761051314923634]
An ambitious goal for machine learning is to create agents that behave ethically.
While ethical agents could be trained by rewarding correct behavior under a specific moral theory, there remains widespread disagreement about the nature of morality.
This paper proposes two training methods that realize different trade-offs among competing desiderata, and trains agents in simple environments to act under moral uncertainty.
arXiv Detail & Related papers (2020-06-08T16:40:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.