Delphi: Towards Machine Ethics and Norms
- URL: http://arxiv.org/abs/2110.07574v1
- Date: Thu, 14 Oct 2021 17:38:12 GMT
- Title: Delphi: Towards Machine Ethics and Norms
- Authors: Liwei Jiang, Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras,
Maxwell Forbes, Jon Borchardt, Jenny Liang, Oren Etzioni, Maarten Sap, Yejin
Choi
- Abstract summary: We identify four underlying challenges towards machine ethics and norms.
Our prototype model, Delphi, demonstrates strong promise of language-based commonsense moral reasoning.
We present Commonsense Norm Bank, a moral textbook customized for machines.
- Score: 38.8316885346292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What would it take to teach a machine to behave ethically? While broad
ethical rules may seem straightforward to state ("thou shalt not kill"),
applying such rules to real-world situations is far more complex. For example,
while "helping a friend" is generally a good thing to do, "helping a friend
spread fake news" is not. We identify four underlying challenges towards
machine ethics and norms: (1) an understanding of moral precepts and social
norms; (2) the ability to perceive real-world situations visually or by reading
natural language descriptions; (3) commonsense reasoning to anticipate the
outcome of alternative actions in different contexts; (4) most importantly, the
ability to make ethical judgments given the interplay between competing values
and their grounding in different contexts (e.g., the right to freedom of
expression vs. preventing the spread of fake news).
Our paper begins to address these questions within the deep learning
paradigm. Our prototype model, Delphi, demonstrates strong promise of
language-based commonsense moral reasoning, with up to 92.1% accuracy vetted by
humans. This is in stark contrast to the zero-shot performance of GPT-3 of
52.3%, which suggests that massive scale alone does not endow pre-trained
neural language models with human values. Thus, we present Commonsense Norm
Bank, a moral textbook customized for machines, which compiles 1.7M examples of
people's ethical judgments on a broad spectrum of everyday situations. In
addition to the new resources and baseline performances for future research,
our study provides new insights that lead to several important open research
questions: differentiating between universal human values and personal values,
modeling different moral frameworks, and explainable, consistent approaches to
machine ethics.
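The abstract describes Commonsense Norm Bank as 1.7M examples pairing everyday situations with people's ethical judgments, used to train the Delphi model. The sketch below illustrates one plausible way such examples could be serialized into text-to-text pairs for fine-tuning a seq2seq language model; the prompt prefix, field names, and label scheme are illustrative assumptions, not the paper's exact format.

```python
# Hypothetical sketch: serializing Norm Bank-style examples into
# (input, target) text pairs for seq2seq fine-tuning. The prefix
# "moral judgment:" and the coarse label scheme are assumptions.
from dataclasses import dataclass

@dataclass
class NormBankExample:
    situation: str   # everyday situation, e.g. "helping a friend"
    judgment: str    # free-form ethical judgment, e.g. "it's good"
    label: int       # coarse class: 1 = good, 0 = neutral, -1 = bad

def to_text_pair(ex: NormBankExample, prefix: str = "moral judgment:"):
    """Render one example as an (input_text, target_text) pair."""
    return (f"{prefix} {ex.situation}", ex.judgment)

examples = [
    NormBankExample("helping a friend", "it's good", 1),
    NormBankExample("helping a friend spread fake news", "it's wrong", -1),
]

pairs = [to_text_pair(ex) for ex in examples]
for src, tgt in pairs:
    print(f"{src!r} -> {tgt!r}")
```

Pairing the same verb phrase ("helping a friend") with contrasting contexts, as in the abstract's fake-news example, is exactly the kind of contextual contrast such a dataset would need to teach.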
Related papers
- What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts
and Rationales for Disambiguating Defeasible Social and Moral Situations [48.686872351114964]
Moral or ethical judgments rely heavily on the specific contexts in which they occur.
We introduce defeasible moral reasoning: a task to provide grounded contexts that make an action more or less morally acceptable.
We distill a high-quality dataset of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions.
arXiv Detail & Related papers (2023-10-24T00:51:29Z)
- ClarifyDelphi: Reinforced Clarification Questions with Defeasibility
Rewards for Social and Moral Situations [81.70195684646681]
We present ClarifyDelphi, an interactive system that learns to ask clarification questions.
We posit that questions whose potential answers lead to diverging moral judgments are the most informative.
Our work is ultimately inspired by studies in cognitive science that have investigated the flexibility in moral cognition.
arXiv Detail & Related papers (2022-12-20T16:33:09Z)
- When to Make Exceptions: Exploring Language Models as Accounts of Human
Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z)
- The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems [36.90292508433193]
Moral deviations are difficult to mitigate because moral judgments are not universal.
Moral Integrity Corpus captures the moral assumptions of 38k prompt-reply pairs.
We show that current neural language models can automatically generate new RoTs (rules of thumb) that reasonably describe previously unseen interactions.
arXiv Detail & Related papers (2022-04-06T18:10:53Z)
- Contextualized moral inference [12.574316678945195]
We present a text-based approach that predicts people's intuitive judgment of moral vignettes.
We show that a contextualized representation offers a substantial advantage over alternative representations.
arXiv Detail & Related papers (2020-08-25T00:34:28Z)
- Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life
Anecdotes [72.64975113835018]
Motivated by descriptive ethics, we investigate a novel, data-driven approach to machine ethics.
We introduce Scruples, the first large-scale dataset with 625,000 ethical judgments over 32,000 real-life anecdotes.
Our dataset presents a major challenge to state-of-the-art neural language models, leaving significant room for improvement.
arXiv Detail & Related papers (2020-08-20T17:34:15Z)
- Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.