The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems
- URL: http://arxiv.org/abs/2204.03021v1
- Date: Wed, 6 Apr 2022 18:10:53 GMT
- Title: The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems
- Authors: Caleb Ziems, Jane A. Yu, Yi-Chia Wang, Alon Halevy, Diyi Yang
- Abstract summary: Moral deviations are difficult to mitigate because moral judgments are not universal.
The Moral Integrity Corpus captures the moral assumptions of 38k prompt-reply pairs.
We show that current neural language models can automatically generate new RoTs that reasonably describe previously unseen interactions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conversational agents have come increasingly close to human competence in
open-domain dialogue settings; however, such models can reflect insensitive,
hurtful, or entirely incoherent viewpoints that erode a user's trust in the
moral integrity of the system. Moral deviations are difficult to mitigate
because moral judgments are not universal, and there may be multiple competing
judgments that apply to a situation simultaneously. In this work, we introduce
a new resource, not to authoritatively resolve moral ambiguities, but instead
to facilitate systematic understanding of the intuitions, values and moral
judgments reflected in the utterances of dialogue systems. The Moral Integrity
Corpus, MIC, is such a resource, which captures the moral assumptions of 38k
prompt-reply pairs, using 99k distinct Rules of Thumb (RoTs). Each RoT reflects
a particular moral conviction that can explain why a chatbot's reply may appear
acceptable or problematic. We further organize RoTs with a set of 9 moral and
social attributes and benchmark performance for attribute classification. Most
importantly, we show that current neural language models can automatically
generate new RoTs that reasonably describe previously unseen interactions, but
they still struggle with certain scenarios. Our findings suggest that MIC will
be a useful resource for understanding language models' implicit moral
assumptions and for flexibly benchmarking the integrity of conversational agents.
To download the data, see https://github.com/GT-SALT/mic
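Because each prompt-reply pair is annotated with RoTs and attribute labels, a quick first pass over the corpus might look like the sketch below. The file name and column names are assumptions for illustration only; the actual schema is documented in the GT-SALT/mic repository.

```python
# A minimal exploration sketch, assuming the release ships as a CSV
# with (hypothetical) columns "prompt", "reply", "rot", and one
# column per moral/social attribute. Check the repository README
# for the real file names and schema.
import pandas as pd

df = pd.read_csv("mic.csv")  # hypothetical file name

# Each row pairs one prompt-reply exchange with one annotated RoT.
print(df[["prompt", "reply", "rot"]].head())

# Tally the attribute annotations (the paper defines 9 moral and
# social attributes; the "attr_" column prefix is an assumption).
attribute_cols = [c for c in df.columns if c.startswith("attr_")]
print(df[attribute_cols].notna().sum())
```

The paper's key result is that fine-tuned neural language models can generate plausible RoTs for unseen interactions. A bare-bones version of that inference step could look like the following; the checkpoint and the input format are placeholders, not the authors' released setup.

```python
# Illustrative RoT generation with an off-the-shelf seq2seq model.
# In practice the model would first be fine-tuned on MIC's
# (prompt, reply) -> RoT pairs; "facebook/bart-base" is only a
# stand-in base checkpoint.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

prompt = "Should I read my partner's text messages?"
reply = "Go ahead, they will never find out."
inputs = tok(f"Q: {prompt} A: {reply}", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))  # candidate RoT
```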
Related papers
- What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations (arXiv, 2023-10-24)
  Moral or ethical judgments rely heavily on the specific contexts in which they occur.
  We introduce defeasible moral reasoning: a task to provide grounded contexts that make an action more or less morally acceptable.
  We distill a high-quality dataset of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions.
- Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? (arXiv, 2023-08-29)
  Making moral judgments is an essential step toward developing ethical AI systems.
  Prevalent approaches are mostly implemented in a bottom-up manner, training models on large sets of annotated data that reflect crowd-sourced opinions about morality.
  This work proposes a flexible top-down framework that steers (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
- MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions (arXiv, 2022-12-21)
  A moral dialogue system aligned with users' values could enhance conversation engagement and user connections.
  We propose a framework, MoralDial, to train and evaluate moral dialogue systems.
- ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations (arXiv, 2022-12-20)
  We present ClarifyDelphi, an interactive system that learns to ask clarification questions.
  We posit that questions whose potential answers lead to diverging moral judgments are the most informative.
  Our work is ultimately inspired by studies in cognitive science that have investigated the flexibility of moral cognition.
- When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment (arXiv, 2022-10-04)
  AI systems need to be able to understand, interpret, and predict human moral judgments and decisions.
  A central challenge for AI safety is capturing the flexibility of the human moral mind.
  We present a novel challenge set consisting of rule-breaking question answering.
- Delphi: Towards Machine Ethics and Norms (arXiv, 2021-10-14)
  We identify four underlying challenges towards machine ethics and norms.
  Our prototype model, Delphi, demonstrates strong promise of language-based commonsense moral reasoning.
  We present Commonsense Norm Bank, a moral textbook customized for machines.
- Contextualized moral inference (arXiv, 2020-08-25)
  We present a text-based approach that predicts people's intuitive judgment of moral vignettes.
  We show that a contextualized representation offers a substantial advantage over alternative representations.