What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts
and Rationales for Disambiguating Defeasible Social and Moral Situations
- URL: http://arxiv.org/abs/2310.15431v2
- Date: Wed, 1 Nov 2023 04:39:14 GMT
- Title: What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts
and Rationales for Disambiguating Defeasible Social and Moral Situations
- Authors: Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon,
Nouha Dziri, Faeze Brahman, Yejin Choi
- Abstract summary: Moral or ethical judgments rely heavily on the specific contexts in which they occur.
We introduce defeasible moral reasoning: a task to provide grounded contexts that make an action more or less morally acceptable.
We distill a high-quality dataset of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Moral or ethical judgments rely heavily on the specific contexts in which
they occur. Understanding varying shades of defeasible contextualizations
(i.e., additional information that strengthens or attenuates the moral
acceptability of an action) is critical to accurately represent the subtlety
and intricacy of grounded human moral judgment in real-life scenarios.
We introduce defeasible moral reasoning: a task to provide grounded contexts
that make an action more or less morally acceptable, along with commonsense
rationales that justify the reasoning. To elicit high-quality task data, we
take an iterative self-distillation approach that starts from a small amount of
unstructured seed knowledge from GPT-3 and then cycles through (1)
self-distillation from student models; (2) targeted filtering with a critic
model trained on human judgments (to boost validity) and an NLI model (to boost
diversity); and (3) self-imitation learning (to amplify the desired data quality).
This process yields a student model that produces defeasible contexts with
improved validity, diversity, and defeasibility. From this model we distill a
high-quality dataset, δ-Rules-of-Thumb (δ-RoT), of 1.2M entries of
contextualizations and rationales for 115K defeasible moral actions, rated
highly by human annotators 85.9% to 99.8% of the time. Using δ-RoT we
obtain a final student model that outperforms all intermediate student models
by a notable margin.
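The three-step loop described in the abstract lends itself to a compact implementation. The sketch below shows one plausible wiring of the pipeline; every function in it (generate_contexts, critic_score, nli_redundant, finetune) is a hypothetical stand-in for the corresponding model call, so this is an illustration of the technique, not the authors' code.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    """One δ-RoT-style record: an action, a defeasible context, a rationale."""
    action: str      # e.g. "setting a fire"
    context: str     # strengthens or attenuates the action's moral acceptability
    rationale: str   # commonsense justification for the shift


def generate_contexts(model: str, action: str, n: int = 4) -> list[Entry]:
    """Hypothetical stand-in: sample contexts + rationales from the student."""
    return [Entry(action, f"context {i} for {action!r}", f"rationale {i}")
            for i in range(n)]


def critic_score(entry: Entry) -> float:
    """Hypothetical stand-in for the critic trained on human validity judgments."""
    return random.random()


def nli_redundant(entry: Entry, kept: list[Entry]) -> bool:
    """Hypothetical stand-in for the NLI diversity filter: drop near-duplicates."""
    return any(e.context == entry.context for e in kept)


def finetune(model: str, data: list[Entry]) -> str:
    """Hypothetical stand-in for self-imitation learning on the filtered pool."""
    return model + "+sft"


def self_distill(actions: list[str], model: str = "gpt3-seeded-student",
                 rounds: int = 3, threshold: float = 0.5):
    pool: list[Entry] = []
    for _ in range(rounds):
        pool = []
        for action in actions:
            # (1) self-distillation: sample candidate contexts from the student
            for entry in generate_contexts(model, action):
                # (2) targeted filtering: critic for validity, NLI for diversity
                if critic_score(entry) >= threshold and not nli_redundant(entry, pool):
                    pool.append(entry)
        # (3) self-imitation learning: retrain the student on its filtered outputs
        model = finetune(model, pool)
    return model, pool


if __name__ == "__main__":
    student, dataset = self_distill(["setting a fire", "lying to a friend"])
    print(student, len(dataset))
```

In the actual pipeline the critic is a model trained on human validity ratings and the diversity check uses NLI entailment between candidate contexts; the stand-ins above only preserve the control flow of the loop.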
Related papers
- MoralBERT: A Fine-Tuned Language Model for Capturing Moral Values in Social Discussions
Moral values play a fundamental role in how we evaluate information, make decisions, and form judgements around important social issues.
Recent advances in Natural Language Processing (NLP) show that moral values can be gauged in human-generated textual content.
This paper introduces MoralBERT, a range of language representation models fine-tuned to capture moral sentiment in social discourse.
arXiv Detail & Related papers (2024-03-12T14:12:59Z)
- Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z)
- When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z)
- Does Moral Code Have a Moral Code? Probing Delphi's Moral Philosophy
We probe the Allen AI Delphi model with a set of standardized morality questionnaires.
Despite some inconsistencies, Delphi tends to mirror the moral principles associated with the demographic groups involved in the annotation process.
arXiv Detail & Related papers (2022-05-25T13:37:56Z)
- A Corpus for Understanding and Generating Moral Stories
We present STORAL, a new dataset of Chinese and English human-written moral stories.
We propose two understanding tasks and two generation tasks to assess machines' abilities on this dataset.
arXiv Detail & Related papers (2022-04-20T13:12:36Z)
- The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems
Moral deviations are difficult to mitigate because moral judgments are not universal.
Moral Integrity Corpus captures the moral assumptions of 38k prompt-reply pairs.
We show that current neural language models can automatically generate new RoTs that reasonably describe previously unseen interactions.
arXiv Detail & Related papers (2022-04-06T18:10:53Z)
- Delphi: Towards Machine Ethics and Norms
We identify four underlying challenges towards machine ethics and norms.
Our prototype model, Delphi, demonstrates strong promise of language-based commonsense moral reasoning.
We present Commonsense Norm Bank, a moral textbook customized for machines.
arXiv Detail & Related papers (2021-10-14T17:38:12Z)
- Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes
Motivated by descriptive ethics, we investigate a novel, data-driven approach to machine ethics.
We introduce Scruples, the first large-scale dataset with 625,000 ethical judgments over 32,000 real-life anecdotes.
Our dataset presents a major challenge to state-of-the-art neural language models, leaving significant room for improvement.
arXiv Detail & Related papers (2020-08-20T17:34:15Z)
- Aligning AI With Shared Human Values
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z)