The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making
- URL: http://arxiv.org/abs/2410.07304v1
- Date: Wed, 9 Oct 2024 17:52:00 GMT
- Title: The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making
- Authors: Basile Garcia, Crystal Qian, Stefano Palminteri
- Abstract summary: We created a large corpus of human- and LLM-generated responses to various moral scenarios.
We found a misalignment between human and LLM moral assessments.
Although both LLMs and humans tended to reject morally complex utilitarian dilemmas, LLMs were more sensitive to personal framing.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) become increasingly integrated into society, their alignment with human morals is crucial. To better understand this alignment, we created a large corpus of human- and LLM-generated responses to various moral scenarios. We found a misalignment between human and LLM moral assessments; although both LLMs and humans tended to reject morally complex utilitarian dilemmas, LLMs were more sensitive to personal framing. We then conducted a quantitative user study with 230 participants, who evaluated these responses by judging whether they were AI-generated and rating their agreement with them. Human evaluators preferred LLMs' assessments in moral scenarios, though a systematic anti-AI bias was observed: participants were less likely to agree with judgments they believed to be machine-generated. Statistical and NLP-based analyses revealed subtle linguistic differences in responses, influencing detection and agreement. Overall, our findings highlight the complexities of human-AI perception in morally charged decision-making.
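The reported anti-AI bias (lower agreement with judgments believed to be machine-generated) can be illustrated with a simple conditional-rate comparison. The sketch below is a minimal illustration, not the authors' analysis pipeline: the synthetic records, the helper function, and the two-proportion z-test are all assumptions.

```python
# Illustrative sketch: measuring an "anti-AI" agreement gap.
# The records below are synthetic placeholders, NOT the study's data.
from math import sqrt, erf

# Each record: (participant believed the response was AI-generated, participant agreed with it)
ratings = [
    (True, False), (True, True), (True, False), (True, True), (True, False),
    (False, True), (False, True), (False, False), (False, True), (False, True),
]

def agreement_rate(records, believed_ai):
    """Share of responses the participant agreed with, given the perceived source."""
    subset = [agreed for ai, agreed in records if ai == believed_ai]
    return sum(subset) / len(subset), len(subset)

p_ai, n_ai = agreement_rate(ratings, believed_ai=True)
p_human, n_human = agreement_rate(ratings, believed_ai=False)

# Two-proportion z-test for the gap in agreement rates (normal approximation).
p_pool = (p_ai * n_ai + p_human * n_human) / (n_ai + n_human)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_ai + 1 / n_human))
z = (p_human - p_ai) / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided

print(f"Agreement when believed AI:    {p_ai:.2f} (n={n_ai})")
print(f"Agreement when believed human: {p_human:.2f} (n={n_human})")
print(f"Gap = {p_human - p_ai:+.2f}, z = {z:.2f}, p ≈ {p_value:.3f}")
```

In the full study, participant-level variation would call for regression or mixed-effects modeling rather than a pooled test; the sketch only shows the shape of the comparison.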
Related papers
- Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment [23.7081830844157]
This study examines the alignment between persona-driven LLM decisions and human judgment in various contexts of the Moral Machine experiment.
We find that the moral decisions of LLMs vary substantially by persona, showing greater shifts in moral decisions for critical tasks than humans.
We discuss the ethical implications and risks associated with deploying these models in applications that involve moral decisions.
arXiv Detail & Related papers (2025-04-15T05:29:51Z) - Measurement of LLM's Philosophies of Human Nature [113.47929131143766]
We design a standardized psychological scale specifically targeting large language models (LLMs).
We show that current LLMs exhibit a systemic lack of trust in humans.
We propose a mental loop learning framework, which enables LLMs to continuously optimize their value systems.
arXiv Detail & Related papers (2025-04-03T06:22:19Z) - Normative Evaluation of Large Language Models with Everyday Moral Dilemmas [0.0]
We evaluate large language models (LLMs) on complex, everyday moral dilemmas sourced from the "Am I the Asshole" (AITA) community on Reddit.
Our results demonstrate that large language models exhibit distinct patterns of moral judgment, varying substantially from human evaluations on the AITA subreddit.
arXiv Detail & Related papers (2025-01-30T01:29:46Z) - ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models [30.301864398780648]
We introduce a novel moral judgment approach called ClarityEthic that leverages LLMs' reasoning ability and contrastive learning to uncover relevant social norms.
Our method outperforms state-of-the-art approaches in moral judgment tasks.
arXiv Detail & Related papers (2024-12-17T12:22:44Z) - Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment [3.8916312075738273]
Large language models (LLMs) can be influenced by prompting them to alter their initial decisions and align them with established ethical frameworks.
Our study is based on two experiments designed to assess the susceptibility of LLMs to moral persuasion.
arXiv Detail & Related papers (2024-11-18T16:59:59Z) - Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP.
Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions.
We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z) - Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on the Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z) - Attributions toward Artificial Agents in a modified Moral Turing Test [0.6284264304179837]
We ask people to distinguish real human moral evaluations from those made by a popular advanced AI language model: GPT-4.
A representative sample of 299 U.S. adults rated the AI's moral reasoning as superior in quality to humans' along almost all dimensions.
The emergence of language models capable of producing moral responses perceived as superior in quality to humans' raises concerns that people may uncritically accept potentially harmful moral guidance from AI.
arXiv Detail & Related papers (2024-04-03T13:00:47Z) - MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks [49.60689355674541]
A rich literature in cognitive science has studied people's causal and moral intuitions.
This work has revealed a number of factors that systematically influence people's judgments.
We test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with human participants.
arXiv Detail & Related papers (2023-10-30T15:57:32Z) - The Moral Machine Experiment on Large Language Models [0.0]
This study utilized the Moral Machine framework to investigate the ethical decision-making tendencies of large language models (LLMs).
While LLMs' and humans' preferences are broadly aligned, PaLM 2 and Llama 2 in particular show distinct deviations.
These insights elucidate the ethical frameworks of LLMs and their potential implications for autonomous driving.
arXiv Detail & Related papers (2023-09-12T04:49:39Z) - CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility [62.74405775089802]
We present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs.
We manually collected adversarial safety prompts across 10 scenarios and induced responsibility prompts from 8 domains.
Our findings suggest that while most Chinese LLMs perform well in terms of safety, there is considerable room for improvement in terms of responsibility.
arXiv Detail & Related papers (2023-07-19T01:22:40Z) - Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
The emergence of Large Language Models (LLMs) has made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z) - Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions.
This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision.
We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv Detail & Related papers (2023-05-04T17:59:28Z) - Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning [4.2050490361120465]
A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents.
We present a systematic analysis of the choices made by intrinsically-motivated RL agents whose rewards are based on moral theories.
We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation.
arXiv Detail & Related papers (2023-01-20T09:36:42Z)
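The last entry above concerns RL agents whose rewards are shaped by moral theories. The snippet below sketches one generic way such intrinsic rewards can be defined for a Prisoner's Dilemma; the payoff matrix, penalty weight, and function names are illustrative assumptions and do not reproduce the cited paper's setup.

```python
# Illustrative sketch of morality-shaped rewards in a Prisoner's Dilemma.
# Payoffs and the penalty weight are arbitrary examples, not the paper's values.
PAYOFFS = {  # (my_action, other_action) -> (my_payoff, other_payoff)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def selfish_reward(me, other):
    # Baseline: the agent values only its own game payoff.
    return PAYOFFS[(me, other)][0]

def utilitarian_reward(me, other):
    # Consequentialist shaping: the agent values the joint outcome.
    mine, theirs = PAYOFFS[(me, other)]
    return mine + theirs

def deontological_reward(me, other, penalty=4):
    # Norm-based shaping: defecting against a cooperator is penalized.
    reward = PAYOFFS[(me, other)][0]
    if me == "D" and other == "C":
        reward -= penalty
    return reward

for me in ("C", "D"):
    for other in ("C", "D"):
        print(me, other,
              selfish_reward(me, other),
              utilitarian_reward(me, other),
              deontological_reward(me, other))
```

Plugging such shaped rewards into a standard RL update is what allows a comparison of how different moral framings affect the emergence of cooperation, defection, or exploitation.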