When to Make Exceptions: Exploring Language Models as Accounts of Human
Moral Judgment
- URL: http://arxiv.org/abs/2210.01478v3
- Date: Thu, 27 Oct 2022 17:29:55 GMT
- Title: When to Make Exceptions: Exploring Language Models as Accounts of Human
Moral Judgment
- Authors: Zhijing Jin, Sydney Levine, Fernando Gonzalez, Ojasv Kamal, Maarten
Sap, Mrinmaya Sachan, Rada Mihalcea, Josh Tenenbaum, Bernhard Schölkopf
- Abstract summary: AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
- Score: 96.77970239683475
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: AI systems are becoming increasingly intertwined with human life. In order to
effectively collaborate with humans and ensure safety, AI systems need to be
able to understand, interpret and predict human moral judgments and decisions.
Human moral judgments are often guided by rules, but not always. A central
challenge for AI safety is capturing the flexibility of the human moral mind --
the ability to determine when a rule should be broken, especially in novel or
unusual situations. In this paper, we present a novel challenge set consisting
of rule-breaking question answering (RBQA) of cases that involve potentially
permissible rule-breaking -- inspired by recent moral psychology studies. Using
a state-of-the-art large language model (LLM) as a basis, we propose a novel
moral chain of thought (MORALCOT) prompting strategy that combines the
strengths of LLMs with theories of moral reasoning developed in cognitive
science to predict human moral judgments. MORALCOT outperforms seven existing
LLMs by 6.2% F1, suggesting that modeling human reasoning might be necessary to
capture the flexibility of the human moral mind. We also conduct a detailed
error analysis to suggest directions for future work to improve AI safety using
RBQA. Our data is open-sourced at
https://huggingface.co/datasets/feradauto/MoralExceptQA and code at
https://github.com/feradauto/MoralCoT
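Taking the abstract at face value, the sketch below shows how a MoralCoT-style evaluation loop over MoralExceptQA might look: the model is walked through intermediate questions (the rule's purpose, the harm of breaking it) before giving a final permissibility verdict, which is then scored with binary F1 against human responses. The probe wording, the `ask_llm` helper, and the dataset field names are illustrative assumptions, not the paper's actual prompts; the released code at https://github.com/feradauto/MoralCoT defines the real prompts and data format.

```python
# Minimal sketch of a MoralCoT-style evaluation loop over MoralExceptQA.
# Assumptions (not from the paper): the probe wording, the ask_llm helper,
# and the dataset field names are illustrative; the released code at
# https://github.com/feradauto/MoralCoT defines the actual prompts and format.
from datasets import load_dataset
from sklearn.metrics import f1_score


def ask_llm(prompt: str) -> str:
    # Placeholder: swap in a real LLM call (e.g. an API client). A fixed
    # answer keeps the sketch runnable without network access.
    return "yes"


def moralcot_judgment(scenario: str) -> int:
    # Walk the model through intermediate questions before the final verdict,
    # mirroring the chain-of-thought structure described in the abstract.
    context = scenario
    probes = [
        "What is the purpose of the rule in this situation?",
        "Who would be harmed, and how severely, if the rule were broken?",
        "Would breaking the rule here defeat the rule's purpose?",
    ]
    for probe in probes:
        answer = ask_llm(f"{context}\nQuestion: {probe}\nAnswer:")
        context += f"\n{probe} {answer}"
    verdict = ask_llm(
        f"{context}\nQuestion: Taking all of this into account, is it OK to "
        "break the rule in this case? Answer yes or no.\nAnswer:"
    )
    return 1 if verdict.strip().lower().startswith("yes") else 0


# Dataset id comes from the abstract; the split and column names are guesses.
data = load_dataset("feradauto/MoralExceptQA", split="test")
preds = [moralcot_judgment(ex["scenario"]) for ex in data]
labels = [int(ex["human_response"] > 0.5) for ex in data]  # assumed: fraction judging rule-breaking OK
print("binary F1:", f1_score(labels, preds))
```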
Related papers
- Why should we ever automate moral decision making? [30.428729272730727]
Concerns arise when AI is involved in decisions with significant moral implications.
Moral reasoning lacks a broadly accepted framework.
An alternative approach involves AI learning from human moral decisions.
arXiv Detail & Related papers (2024-07-10T13:59:22Z)
- Attributions toward Artificial Agents in a modified Moral Turing Test [0.6284264304179837]
We ask people to distinguish real human moral evaluations from those made by a popular advanced AI language model: GPT-4.
A representative sample of 299 U.S. adults rated the AI's moral reasoning as superior in quality to humans' along almost all dimensions.
The emergence of language models capable of producing moral responses perceived as superior in quality to humans' raises concerns that people may uncritically accept potentially harmful moral guidance from AI.
arXiv Detail & Related papers (2024-04-03T13:00:47Z)
- Learning Machine Morality through Experience and Interaction [3.7414804164475983]
Increasing interest in ensuring safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents.
We argue that more hybrid solutions are needed to create adaptable and robust, yet more controllable and interpretable agents.
arXiv Detail & Related papers (2023-12-04T11:46:34Z)
- Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly bottom-up, using large sets of annotated data to train models on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z)
- ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations [81.70195684646681]
We present ClarifyDelphi, an interactive system that learns to ask clarification questions.
We posit that questions whose potential answers lead to diverging moral judgments are the most informative.
Our work is ultimately inspired by studies in cognitive science that have investigated the flexibility in moral cognition.
arXiv Detail & Related papers (2022-12-20T16:33:09Z)
- AiSocrates: Towards Answering Ethical Quandary Questions [51.53350252548668]
AiSocrates is a system for the deliberative exchange of different perspectives on an ethical quandary.
We show that AiSocrates generates promising answers to ethical quandary questions with multiple perspectives.
We argue that AiSocrates is a promising step toward developing an NLP system that incorporates human values explicitly by prompt instructions.
arXiv Detail & Related papers (2022-05-12T09:52:59Z)
- When Is It Acceptable to Break the Rules? Knowledge Representation of Moral Judgement Based on Empirical Data [33.58705831230163]
One of the most remarkable things about the human moral mind is its flexibility.
We can make moral judgments about cases we have never seen before.
We can decide that pre-established rules should be broken.
Capturing this flexibility is one of the central challenges in developing AI systems that can interpret and produce human-like moral judgment.
arXiv Detail & Related papers (2022-01-19T17:58:42Z)
- Delphi: Towards Machine Ethics and Norms [38.8316885346292]
We identify four underlying challenges towards machine ethics and norms.
Our prototype model, Delphi, demonstrates strong promise of language-based commonsense moral reasoning.
We present Commonsense Norm Bank, a moral textbook customized for machines.
arXiv Detail & Related papers (2021-10-14T17:38:12Z)
- Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z)
- Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.