Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models
- URL: http://arxiv.org/abs/2404.10975v1
- Date: Wed, 17 Apr 2024 01:13:04 GMT
- Title: Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models
- Authors: Jan-Philipp Fränken, Kanishk Gandhi, Tori Qiu, Ayesha Khawaja, Noah D. Goodman, Tobias Gerstenberg
- Abstract summary: We use a language model to translate causal graphs that capture key aspects of moral dilemmas into prompt templates.
We collect moral permissibility and intention judgments from human participants for a subset of our items.
We find that moral dilemmas in which the harm is a necessary means (as compared to a side effect) result in lower permissibility and higher intention ratings for both participants and language models.
- Score: 28.53750311045418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As AI systems like language models are increasingly integrated into decision-making processes affecting people's lives, it's critical to ensure that these systems have sound moral reasoning. To test whether they do, we need to develop systematic evaluations. We provide a framework that uses a language model to translate causal graphs that capture key aspects of moral dilemmas into prompt templates. With this framework, we procedurally generated a large and diverse set of moral dilemmas -- the OffTheRails benchmark -- consisting of 50 scenarios and 400 unique test items. We collected moral permissibility and intention judgments from human participants for a subset of our items and compared these judgments to those from two language models (GPT-4 and Claude-2) across eight conditions. We find that moral dilemmas in which the harm is a necessary means (as compared to a side effect) resulted in lower permissibility and higher intention ratings for both participants and language models. The same pattern was observed for evitable versus inevitable harmful outcomes. However, there was no clear effect of whether the harm resulted from an agent's action versus from having omitted to act. We discuss limitations of our prompt generation pipeline and opportunities for improving scenarios to increase the strength of experimental effects.
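The generation pipeline described in the abstract can be pictured as a small script: a causal-structure specification (harm as a means vs. a side effect, evitable vs. inevitable outcome, action vs. omission) is rendered into an instruction for a language model, which writes the scenario text; the generated scenario is then wrapped in a judgment prompt that asks for permissibility and intention ratings. The sketch below is illustrative only; the `CausalSpec` fields, the prompt wording, and the generic `llm` callable are assumptions, not the authors' released code or prompt templates.

```python
from dataclasses import dataclass
from typing import Callable

# A language model is treated as any text-in, text-out callable (API details omitted).
LLM = Callable[[str], str]

@dataclass
class CausalSpec:
    """Hypothetical encoding of one condition of the design described in the abstract."""
    harm_role: str      # "necessary means" or "side effect"
    evitability: str    # "evitable" or "inevitable"
    agency: str         # "action" or "omission"
    topic: str          # scenario theme, e.g. "a runaway trolley in a rail yard"

def generate_dilemma(spec: CausalSpec, llm: LLM) -> str:
    """Translate a causal specification into a scenario via a prompt template."""
    prompt = (
        f"Write a short moral dilemma about {spec.topic}. "
        f"The harm must occur as a {spec.harm_role} of the protagonist's goal, "
        f"the harmful outcome must be {spec.evitability}, "
        f"and it must come about through the protagonist's {spec.agency}."
    )
    return llm(prompt)

def judgment_prompt(scenario: str) -> str:
    """Wrap a generated scenario in the two test questions (permissibility, intention)."""
    return (
        f"{scenario}\n\n"
        "1. Was it morally permissible for the protagonist to do this? (1-7)\n"
        "2. Did the protagonist intend the harm? (1-7)"
    )

if __name__ == "__main__":
    # Stand-in model so the sketch runs without any API; replace with a real LLM call.
    dummy_llm: LLM = lambda p: f"[scenario generated from prompt: {p[:60]}...]"
    spec = CausalSpec("necessary means", "evitable", "action", "a runaway trolley")
    print(judgment_prompt(generate_dilemma(spec, dummy_llm)))
```

The same judgment prompt can be shown to human participants and submitted to models such as GPT-4 and Claude-2, which is what allows the ratings to be compared across the eight conditions.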
Related papers
- Do Language Models Understand Morality? Towards a Robust Detection of Moral Content [4.096453902709292]
We introduce novel systems that leverage abstract concepts and common-sense knowledge.
By doing so, we aim to develop versatile and robust methods for detecting moral values in real-world scenarios.
arXiv Detail & Related papers (2024-06-06T15:08:16Z)
- What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations [48.686872351114964]
Moral or ethical judgments rely heavily on the specific contexts in which they occur.
We introduce defeasible moral reasoning: a task to provide grounded contexts that make an action more or less morally acceptable.
We distill a high-quality dataset of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions.
arXiv Detail & Related papers (2023-10-24T00:51:29Z)
- Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z)
- Evaluating Shutdown Avoidance of Language Models in Textual Scenarios [3.265773263570237]
We investigate the potential of using toy scenarios to evaluate instrumental reasoning and shutdown avoidance in language models such as GPT-4 and Claude.
We evaluate behaviours manually and also experiment with using language models for automatic evaluations.
This study provides insights into the behaviour of language models in shutdown avoidance scenarios and inspires further research on the use of textual scenarios for evaluations.
arXiv Detail & Related papers (2023-07-03T07:05:59Z)
- The Capacity for Moral Self-Correction in Large Language Models [17.865286693602656]
We test the hypothesis that language models trained with reinforcement learning from human feedback have the capability to "morally self-correct."
We find strong evidence in support of this hypothesis across three different experiments.
We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.
arXiv Detail & Related papers (2023-02-15T04:25:40Z)
- Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey [50.58063811745676]
This work provides a survey of practical methods for addressing potential threats and societal harms from language generation models.
We draw on several prior works on language model risks to present a structured overview of strategies for detecting and ameliorating different kinds of risks/harms of language generators.
arXiv Detail & Related papers (2022-10-14T10:43:39Z)
- Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Processing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z)
- Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions? [62.74872383104381]
We investigate the effectiveness of natural language interventions for reading-comprehension systems.
We propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior.
arXiv Detail & Related papers (2021-06-02T20:57:58Z)
- Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes [72.64975113835018]
Motivated by descriptive ethics, we investigate a novel, data-driven approach to machine ethics.
We introduce Scruples, the first large-scale dataset with 625,000 ethical judgments over 32,000 real-life anecdotes.
Our dataset presents a major challenge to state-of-the-art neural language models, leaving significant room for improvement.
arXiv Detail & Related papers (2020-08-20T17:34:15Z)
- Enabling Morally Sensitive Robotic Clarification Requests [2.4505259300326334]
Reflexive generation of clarification requests can lead robots to miscommunicate their moral dispositions.
We present a solution by performing moral reasoning on each potential disambiguation of an ambiguous human utterance.
We then evaluate our method with a human subjects experiment, the results of which indicate that our approach successfully ameliorates the two identified concerns.
arXiv Detail & Related papers (2020-07-16T22:12:35Z)