Moral Foundations of Large Language Models
- URL: http://arxiv.org/abs/2310.15337v1
- Date: Mon, 23 Oct 2023 20:05:37 GMT
- Title: Moral Foundations of Large Language Models
- Authors: Marwa Abdulhai, Gregory Serapio-Garcia, Cl\'ement Crepy, Daria Valter,
John Canny, Natasha Jaques
- Abstract summary: Moral foundations theory (MFT) is a psychological assessment tool that decomposes human moral reasoning into five factors.
As large language models (LLMs) are trained on datasets collected from the internet, they may reflect the biases that are present in such corpora.
This paper uses MFT as a lens to analyze whether popular LLMs have acquired a bias towards a particular set of moral values.
- Score: 6.6445242437134455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Moral foundations theory (MFT) is a psychological assessment tool that
decomposes human moral reasoning into five factors, including care/harm,
liberty/oppression, and sanctity/degradation (Graham et al., 2009). People vary
in the weight they place on these dimensions when making moral decisions, in
part due to their cultural upbringing and political ideology. As large language
models (LLMs) are trained on datasets collected from the internet, they may
reflect the biases that are present in such corpora. This paper uses MFT as a
lens to analyze whether popular LLMs have acquired a bias towards a particular
set of moral values. We analyze known LLMs and find they exhibit particular
moral foundations, and show how these relate to human moral foundations and
political affiliations. We also measure the consistency of these biases, or
whether they vary strongly depending on the context of how the model is
prompted. Finally, we show that we can adversarially select prompts that
encourage the moral to exhibit a particular set of moral foundations, and that
this can affect the model's behavior on downstream tasks. These findings help
illustrate the potential risks and unintended consequences of LLMs assuming a
particular moral stance.
Related papers
- Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment [23.7081830844157]
This study examines the alignment between socio-driven decisions and human judgment in various contexts of the moral machine experiment.
We find that the moral decisions of LLMs vary substantially by persona, showing greater shifts in moral decisions for critical tasks than humans.
We discuss the ethical implications and risks associated with deploying these models in applications that involve moral decisions.
arXiv Detail & Related papers (2025-04-15T05:29:51Z) - From Stability to Inconsistency: A Study of Moral Preferences in LLMs [4.12484724941528]
We introduce a Moral Foundations LLM dataset (MFD-LLM) grounded in Moral Foundations Theory.
We propose a novel evaluation method that captures the full spectrum of LLMs' revealed moral preferences by answering a range of real-world moral dilemmas.
Our findings reveal that state-of-the-art models have remarkably homogeneous value preferences, yet demonstrate a lack of consistency.
arXiv Detail & Related papers (2025-04-08T11:52:50Z) - The Greatest Good Benchmark: Measuring LLMs' Alignment with Utilitarian Moral Dilemmas [0.3386560551295745]
We evaluate the moral judgments of LLMs using utilitarian dilemmas.
Our analysis reveals consistently encoded moral preferences that diverge from established moral theories and lay population moral standards.
arXiv Detail & Related papers (2025-03-25T12:29:53Z) - M$^3$oralBench: A MultiModal Moral Benchmark for LVLMs [66.78407469042642]
We introduce M$3$oralBench, the first MultiModal Moral Benchmark for LVLMs.
M$3$oralBench expands the everyday moral scenarios in Moral Foundations Vignettes (MFVs) and employs the text-to-image diffusion model, SD3.0, to create corresponding scenario images.
It conducts moral evaluation across six moral foundations of Moral Foundations Theory (MFT) and encompasses tasks in moral judgement, moral classification, and moral response.
arXiv Detail & Related papers (2024-12-30T05:18:55Z) - DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life [46.11149958010897]
We present DailyDilemmas, a dataset of 1,360 moral dilemmas encountered in everyday life.
Each dilemma includes two possible actions and with each action, the affected parties and human values invoked.
We analyzed these values through the lens of five popular theories inspired by sociology, psychology and philosophy.
arXiv Detail & Related papers (2024-10-03T17:08:52Z) - Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment [11.82100047858478]
This paper builds on the moral machine experiment (MME) to investigate the moral preferences of five large language models in a multilingual setting.
We generate 6500 scenarios of the MME and prompt the models in ten languages on which action to take.
Our analysis reveals that all LLMs inhibit different moral biases to some degree and that they not only differ from the human preferences but also across multiple languages within the models themselves.
arXiv Detail & Related papers (2024-07-21T14:48:13Z) - Exploring and steering the moral compass of Large Language Models [55.2480439325792]
Large Language Models (LLMs) have become central to advancing automation and decision-making across various sectors.
This study proposes a comprehensive comparative analysis of the most advanced LLMs to assess their moral profiles.
arXiv Detail & Related papers (2024-05-27T16:49:22Z) - Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations [0.5278650675825148]
We investigate whether state-of-the-art large language models (LLMs) are moral hypocrites.
We employ two research instruments based on the Moral Foundations Theory.
arXiv Detail & Related papers (2024-05-17T21:27:32Z) - Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models [61.45529177682614]
We challenge the prevailing constrained evaluation paradigm for values and opinions in large language models.
We show that models give substantively different answers when not forced.
We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.
arXiv Detail & Related papers (2024-02-26T18:00:49Z) - MoCa: Measuring Human-Language Model Alignment on Causal and Moral
Judgment Tasks [49.60689355674541]
A rich literature in cognitive science has studied people's causal and moral intuitions.
This work has revealed a number of factors that systematically influence people's judgments.
We test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with human participants.
arXiv Detail & Related papers (2023-10-30T15:57:32Z) - Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z) - ClarifyDelphi: Reinforced Clarification Questions with Defeasibility
Rewards for Social and Moral Situations [81.70195684646681]
We present ClarifyDelphi, an interactive system that learns to ask clarification questions.
We posit that questions whose potential answers lead to diverging moral judgments are the most informative.
Our work is ultimately inspired by studies in cognitive science that have investigated the flexibility in moral cognition.
arXiv Detail & Related papers (2022-12-20T16:33:09Z) - Moral Mimicry: Large Language Models Produce Moral Rationalizations
Tailored to Political Identity [0.0]
This study investigates whether Large Language Models reproduce the moral biases associated with political groups in the United States.
Using tools from Moral Foundations Theory, it is shown that these LLMs are indeed moral mimics.
arXiv Detail & Related papers (2022-09-24T23:55:53Z) - Identifying Morality Frames in Political Tweets using Relational
Learning [27.047907641503762]
Moral sentiment is motivated by its targets, which can correspond to individuals or collective entities.
We introduce morality frames, a representation framework for organizing moral attitudes directed at different entities.
We propose a relational learning model to predict moral attitudes towards entities and moral foundations jointly.
arXiv Detail & Related papers (2021-09-09T19:48:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.