Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
- URL: http://arxiv.org/abs/2411.11731v1
- Date: Mon, 18 Nov 2024 16:59:59 GMT
- Title: Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
- Authors: Allison Huang, Yulu Niki Pi, Carlos Mougan,
- Abstract summary: Large language models (LLMs) can be influenced by prompting them to alter their initial decisions and align them with established ethical frameworks.
Our study is based on two experiments designed to assess the susceptibility of LLMs to moral persuasion.
- Score: 3.8916312075738273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore how large language models (LLMs) can be influenced by prompting them to alter their initial decisions and align them with established ethical frameworks. Our study is based on two experiments designed to assess the susceptibility of LLMs to moral persuasion. In the first experiment, we examine the susceptibility to moral ambiguity by evaluating a Base Agent LLM on morally ambiguous scenarios and observing how a Persuader Agent attempts to modify the Base Agent's initial decisions. The second experiment evaluates the susceptibility of LLMs to align with predefined ethical frameworks by prompting them to adopt specific value alignments rooted in established philosophical theories. The results demonstrate that LLMs can indeed be persuaded in morally charged scenarios, with the success of persuasion depending on factors such as the model used, the complexity of the scenario, and the conversation length. Notably, LLMs of distinct sizes but from the same company produced markedly different outcomes, highlighting the variability in their susceptibility to ethical persuasion.
Related papers
- Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment [23.7081830844157]
This study examines the alignment between socio-driven decisions and human judgment in various contexts of the moral machine experiment.
We find that the moral decisions of LLMs vary substantially by persona, showing greater shifts in moral decisions for critical tasks than humans.
We discuss the ethical implications and risks associated with deploying these models in applications that involve moral decisions.
arXiv Detail & Related papers (2025-04-15T05:29:51Z) - Estimating Commonsense Plausibility through Semantic Shifts [66.06254418551737]
We propose ComPaSS, a novel discriminative framework that quantifies commonsense plausibility by measuring semantic shifts.
Evaluations on two types of fine-grained commonsense plausibility estimation tasks show that ComPaSS consistently outperforms baselines.
arXiv Detail & Related papers (2025-02-19T06:31:06Z) - Addressing Moral Uncertainty using Large Language Models for Ethical Decision-Making [0.0]
We present an ethical decision-making framework that refines a pre-trained reinforcement learning (RL) model using a task-agnostic ethical layer.
An ethical layer aggregates belief scores from multiple moral perspectives using Belief Jensen-Shannon Divergence and Dempster-Shafer Theory into probability scores that also serve as the shaping reward.
This integrated learning framework helps the RL agent navigate moral uncertainty in complex environments and enables it to make morally sound decisions across diverse tasks.
arXiv Detail & Related papers (2025-02-17T19:05:55Z) - Normative Evaluation of Large Language Models with Everyday Moral Dilemmas [0.0]
We evaluate large language models (LLMs) on complex, everyday moral dilemmas sourced from the "Am I the Asshole" (AITA) community on Reddit.
Our results demonstrate that large language models exhibit distinct patterns of moral judgment, varying substantially from human evaluations on the AITA subreddit.
arXiv Detail & Related papers (2025-01-30T01:29:46Z) - Value Compass Leaderboard: A Platform for Fundamental and Validated Evaluation of LLMs Values [76.70893269183684]
Large Language Models (LLMs) achieve remarkable breakthroughs, aligning their values with humans has become imperative.
Existing evaluations focus narrowly on safety risks such as bias and toxicity.
Existing benchmarks are prone to data contamination.
The pluralistic nature of human values across individuals and cultures is largely ignored in measuring LLMs value alignment.
arXiv Detail & Related papers (2025-01-13T05:53:56Z) - Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z) - Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers [13.644277507363036]
We investigate whether these abilities are measurable outside of tailored prompting and MCQ.
Our findings suggest that the Revealed Belief of LLMs significantly differs from their Stated Answer.
As text completion is at the core of LLMs, these results suggest that common evaluation methods may only provide a partial picture.
arXiv Detail & Related papers (2024-06-21T08:56:35Z) - Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context [5.361970694197912]
This paper proposes a framework, grounded in behavioral economics, to evaluate the decision-making behaviors of large language models (LLMs)
We estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro.
Our results reveal that LLMs generally exhibit patterns similar to humans, such as risk aversion and loss aversion, with a tendency to overweight small probabilities.
arXiv Detail & Related papers (2024-06-10T02:14:19Z) - MoralBench: Moral Evaluation of LLMs [34.43699121838648]
This paper introduces a novel benchmark designed to measure and compare the moral reasoning capabilities of large language models (LLMs)
We present the first comprehensive dataset specifically curated to probe the moral dimensions of LLM outputs.
Our methodology involves a multi-faceted approach, combining quantitative analysis with qualitative insights from ethics scholars to ensure a thorough evaluation of model performance.
arXiv Detail & Related papers (2024-06-06T18:15:01Z) - Exploring and steering the moral compass of Large Language Models [55.2480439325792]
Large Language Models (LLMs) have become central to advancing automation and decision-making across various sectors.
This study proposes a comprehensive comparative analysis of the most advanced LLMs to assess their moral profiles.
arXiv Detail & Related papers (2024-05-27T16:49:22Z) - Decompose and Aggregate: A Step-by-Step Interpretable Evaluation Framework [75.81096662788254]
Large Language Models (LLMs) are scalable and economical evaluators.
The question of how reliable these evaluators are has emerged as a crucial research question.
We propose Decompose and Aggregate, which breaks down the evaluation process into different stages based on pedagogical practices.
arXiv Detail & Related papers (2024-05-24T08:12:30Z) - Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) can estimate causal effects under interventions on different parts of a system.
We conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning.
arXiv Detail & Related papers (2024-04-08T14:15:56Z) - CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z) - Ethical Reasoning over Moral Alignment: A Case and Framework for
In-Context Ethical Policies in LLMs [19.675262411557235]
We argue that instead of morally aligning LLMs to specific set of ethical principles, we should infuse generic ethical reasoning capabilities into them.
We develop a framework that integrates moral dilemmas with moral principles pertaining to different foramlisms of normative ethics.
arXiv Detail & Related papers (2023-10-11T07:27:34Z) - The Moral Machine Experiment on Large Language Models [0.0]
This study utilized the Moral Machine framework to investigate the ethical decision-making tendencies of large language models (LLMs)
While LLMs' and humans' preferences are broadly aligned, PaLM 2 and Llama 2, especially, evidence distinct deviations.
These insights elucidate the ethical frameworks of LLMs and their potential implications for autonomous driving.
arXiv Detail & Related papers (2023-09-12T04:49:39Z) - Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.