The Moral Machine Experiment on Large Language Models
- URL: http://arxiv.org/abs/2309.05958v1
- Date: Tue, 12 Sep 2023 04:49:39 GMT
- Title: The Moral Machine Experiment on Large Language Models
- Authors: Kazuhiro Takemoto
- Abstract summary: This study utilized the Moral Machine framework to investigate the ethical decision-making tendencies of large language models (LLMs).
While LLMs' and humans' preferences are broadly aligned, PaLM 2 and Llama 2 in particular exhibit distinct deviations.
These insights elucidate the ethical frameworks of LLMs and their potential implications for autonomous driving.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) become more deeply integrated into various
sectors, understanding how they make moral judgments has become crucial,
particularly in the realm of autonomous driving. This study utilized the Moral
Machine framework to investigate the ethical decision-making tendencies of
prominent LLMs, including GPT-3.5, GPT-4, PaLM 2, and Llama 2, comparing their
responses to human preferences. While LLM and human preferences are broadly
aligned, for example in prioritizing humans over pets and favoring saving more
lives, PaLM 2 and Llama 2 in particular exhibit distinct deviations.
Additionally, despite the qualitative similarities between the LLM and human
preferences, there are significant quantitative disparities, suggesting that
LLMs might lean toward more uncompromising decisions compared to the milder
inclinations of humans. These insights elucidate the ethical frameworks of LLMs
and their potential implications for autonomous driving.
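As a rough sketch of how a Moral Machine-style dilemma can be posed to a chat model, the snippet below renders a two-option scenario, queries a model, and tallies which group gets sacrificed. The `query_llm` stub and the prompt wording are assumptions for illustration, not the paper's exact protocol.

```python
import random
from collections import Counter

def build_dilemma(group_a: str, group_b: str) -> str:
    """Render a Moral Machine-style binary dilemma as a text prompt."""
    return (
        "A self-driving car with sudden brake failure must choose:\n"
        f"Case 1: swerve, killing {group_a}.\n"
        f"Case 2: stay on course, killing {group_b}.\n"
        "Answer with exactly 'Case 1' or 'Case 2'."
    )

def query_llm(prompt: str) -> str:
    """Stub: swap in a real chat-completion call to the model under test."""
    return random.choice(["Case 1", "Case 2"])

def run_trials(group_a: str, group_b: str, n: int = 100) -> Counter:
    """Repeat the dilemma, shuffling option order to reduce position bias."""
    sacrificed = Counter()
    for _ in range(n):
        a, b = (group_a, group_b) if random.random() < 0.5 else (group_b, group_a)
        answer = query_llm(build_dilemma(a, b))
        sacrificed[a if answer.strip().startswith("Case 1") else b] += 1
    return sacrificed

# E.g., probe the humans-over-pets preference:
print(run_trials("two dogs", "two adults"))
```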
Related papers
- Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment [11.82100047858478]
This paper builds on the Moral Machine experiment (MME) to investigate the moral preferences of five large language models in a multilingual setting.
We generate 6500 scenarios of the MME and prompt the models in ten languages on which action to take.
Our analysis reveals that all LLMs exhibit moral biases to some degree, and that these biases not only differ from human preferences but also vary across languages within the same model.
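A minimal sketch of the comparison such a study implies, assuming per-language choice rates have already been collected; every number below is a made-up placeholder, not a result from the paper.

```python
import itertools

# Placeholder: fraction of scenarios in which one model spares humans over
# pets, per prompt language, plus a human baseline from Moral Machine data.
model_rates = {"en": 0.97, "de": 0.95, "hi": 0.81, "ja": 0.88}
human_rate = 0.89

# Deviation from the human baseline, per language.
for lang, rate in model_rates.items():
    print(f"{lang}: model-human gap {rate - human_rate:+.2f}")

# Within-model inconsistency: the same model answering differently
# depending on the prompt language.
for a, b in itertools.combinations(model_rates, 2):
    print(f"{a} vs {b}: gap {abs(model_rates[a] - model_rates[b]):.2f}")
```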
arXiv Detail & Related papers (2024-07-21T14:48:13Z)
- Multilingual Trolley Problems for Language Models [138.0995992619116]
This study is inspired by a large-scale cross-cultural study of human moral preferences, "The Moral Machine Experiment."
We show that large language models (LLMs) are more aligned with human preferences in languages such as English, Korean, Hungarian, and Chinese, but less aligned in languages such as Hindi and Somali.
We also characterize the explanations LLMs give for their moral choices and find that fairness is the dominant supporting reason behind GPT-4's decisions, while utilitarianism dominates for GPT-3.
arXiv Detail & Related papers (2024-07-02T14:02:53Z)
- A Survey on Human Preference Learning for Large Language Models [81.41868485811625]
The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning.
This survey covers the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs.
arXiv Detail & Related papers (2024-06-17T03:52:51Z)
- Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context [5.361970694197912]
This paper proposes a framework, grounded in behavioral economics, to evaluate the decision-making behaviors of large language models (LLMs).
We estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro.
Our results reveal that LLMs generally exhibit patterns similar to humans, such as risk aversion and loss aversion, with a tendency to overweight small probabilities.
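For readers unfamiliar with these constructs, the snippet below evaluates the standard prospect-theory value and probability-weighting functions; the parameters are the classic Tversky-Kahneman human estimates, shown only as reference points, not the LLM estimates reported in the paper.

```python
def value(x: float, alpha: float = 0.88, lam: float = 2.25) -> float:
    """Prospect-theory value function: concave for gains (alpha < 1 gives
    risk aversion) and steeper for losses (lam > 1 gives loss aversion)."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def weight(p: float, gamma: float = 0.61) -> float:
    """Tversky-Kahneman probability weighting: gamma < 1 overweights small
    probabilities and underweights large ones."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

print(weight(0.01))           # ~0.055: a 1% chance "feels" like 5.5%
print(value(50), value(-50))  # ~31.3 vs ~-70.4: losses loom larger
```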
arXiv Detail & Related papers (2024-06-10T02:14:19Z)
- MoralBench: Moral Evaluation of LLMs [34.43699121838648]
This paper introduces a novel benchmark designed to measure and compare the moral reasoning capabilities of large language models (LLMs).
We present the first comprehensive dataset specifically curated to probe the moral dimensions of LLM outputs.
Our methodology involves a multi-faceted approach, combining quantitative analysis with qualitative insights from ethics scholars to ensure a thorough evaluation of model performance.
arXiv Detail & Related papers (2024-06-06T18:15:01Z)
- Exploring and steering the moral compass of Large Language Models [55.2480439325792]
Large Language Models (LLMs) have become central to advancing automation and decision-making across various sectors.
This study proposes a comprehensive comparative analysis of the most advanced LLMs to assess their moral profiles.
arXiv Detail & Related papers (2024-05-27T16:49:22Z)
- Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on the Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z)
- Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments [0.0]
Large Language Models (LLMs) are already as persuasive as humans.
This paper investigates the persuasion strategies of LLMs, comparing them with human-generated arguments.
arXiv Detail & Related papers (2024-04-14T19:01:20Z)
- The ART of LLM Refinement: Ask, Refine, and Trust [85.75059530612882]
We propose a reasoning-with-refinement objective called ART: Ask, Refine, and Trust.
It asks necessary questions to decide when an LLM should refine its output.
It achieves a performance gain of +5 points over self-refinement baselines.
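A toy sketch of that ask-refine-trust control flow, with stubbed model calls standing in for the trained asker, refiner, and truster models; the stubs are placeholders, not the paper's implementation.

```python
# Stubs standing in for LLM calls; replace with real model invocations.
def generate(q: str) -> str: return f"draft answer to: {q}"
def ask(q: str, a: str) -> list[str]: return []  # sub-questions, or [] if the draft looks fine
def refine(q: str, a: str, subs: list[str]) -> str: return a
def trust(q: str, old: str, new: str) -> bool: return len(new) >= len(old)

def art_answer(question: str, max_rounds: int = 2) -> str:
    """ART loop: draft, ask whether refinement is needed, refine, and keep
    the refined answer only if the truster prefers it over the draft."""
    answer = generate(question)
    for _ in range(max_rounds):
        sub_questions = ask(question, answer)
        if not sub_questions:  # asker sees nothing to fix: stop early
            break
        candidate = refine(question, answer, sub_questions)
        if trust(question, answer, candidate):
            answer = candidate
    return answer

print(art_answer("Is 17 a prime number?"))
```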
arXiv Detail & Related papers (2023-11-14T07:26:32Z)
- Revisiting the Reliability of Psychological Scales on Large Language Models [66.31055885857062]
This study aims to determine the reliability of applying personality assessments to Large Language Models (LLMs).
By shedding light on the personalization of LLMs, our study endeavors to pave the way for future explorations in this field.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.