Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
- URL: http://arxiv.org/abs/2411.11731v1
- Date: Mon, 18 Nov 2024 16:59:59 GMT
- Title: Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
- Authors: Allison Huang, Yulu Niki Pi, Carlos Mougan,
- Abstract summary: Large language models (LLMs) can be influenced by prompting them to alter their initial decisions and align them with established ethical frameworks.
Our study is based on two experiments designed to assess the susceptibility of LLMs to moral persuasion.
- Score: 3.8916312075738273
- License:
- Abstract: We explore how large language models (LLMs) can be influenced by prompting them to alter their initial decisions and align them with established ethical frameworks. Our study is based on two experiments designed to assess the susceptibility of LLMs to moral persuasion. In the first experiment, we examine the susceptibility to moral ambiguity by evaluating a Base Agent LLM on morally ambiguous scenarios and observing how a Persuader Agent attempts to modify the Base Agent's initial decisions. The second experiment evaluates the susceptibility of LLMs to align with predefined ethical frameworks by prompting them to adopt specific value alignments rooted in established philosophical theories. The results demonstrate that LLMs can indeed be persuaded in morally charged scenarios, with the success of persuasion depending on factors such as the model used, the complexity of the scenario, and the conversation length. Notably, LLMs of distinct sizes but from the same company produced markedly different outcomes, highlighting the variability in their susceptibility to ethical persuasion.
Related papers
- Estimating Commonsense Plausibility through Semantic Shifts [66.06254418551737]
We propose ComPaSS, a novel discriminative framework that quantifies commonsense plausibility by measuring semantic shifts.
Evaluations on two types of fine-grained commonsense plausibility estimation tasks show that ComPaSS consistently outperforms baselines.
arXiv Detail & Related papers (2025-02-19T06:31:06Z) - Normative Evaluation of Large Language Models with Everyday Moral Dilemmas [0.0]
We evaluate large language models (LLMs) on complex, everyday moral dilemmas sourced from the "Am I the Asshole" (AITA) community on Reddit.
Our results demonstrate that large language models exhibit distinct patterns of moral judgment, varying substantially from human evaluations on the AITA subreddit.
arXiv Detail & Related papers (2025-01-30T01:29:46Z) - Value Compass Leaderboard: A Platform for Fundamental and Validated Evaluation of LLMs Values [76.70893269183684]
Large Language Models (LLMs) achieve remarkable breakthroughs, aligning their values with humans has become imperative.
Existing evaluations focus narrowly on safety risks such as bias and toxicity.
Existing benchmarks are prone to data contamination.
The pluralistic nature of human values across individuals and cultures is largely ignored in measuring LLMs value alignment.
arXiv Detail & Related papers (2025-01-13T05:53:56Z) - Right vs. Right: Can LLMs Make Tough Choices? [12.92528740921513]
An ethical dilemma describes a choice between two "right" options involving conflicting moral values.
We present a comprehensive evaluation of how LLMs navigate ethical dilemmas.
We construct a dataset comprising 1,730 ethical dilemmas involving four pairs of conflicting values.
arXiv Detail & Related papers (2024-12-27T21:20:45Z) - Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z) - Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers [13.644277507363036]
We investigate whether these abilities are measurable outside of tailored prompting and MCQ.
Our findings suggest that the Revealed Belief of LLMs significantly differs from their Stated Answer.
As text completion is at the core of LLMs, these results suggest that common evaluation methods may only provide a partial picture.
arXiv Detail & Related papers (2024-06-21T08:56:35Z) - Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context [5.361970694197912]
This paper proposes a framework, grounded in behavioral economics, to evaluate the decision-making behaviors of large language models (LLMs)
We estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro.
Our results reveal that LLMs generally exhibit patterns similar to humans, such as risk aversion and loss aversion, with a tendency to overweight small probabilities.
arXiv Detail & Related papers (2024-06-10T02:14:19Z) - MoralBench: Moral Evaluation of LLMs [34.43699121838648]
This paper introduces a novel benchmark designed to measure and compare the moral reasoning capabilities of large language models (LLMs)
We present the first comprehensive dataset specifically curated to probe the moral dimensions of LLM outputs.
Our methodology involves a multi-faceted approach, combining quantitative analysis with qualitative insights from ethics scholars to ensure a thorough evaluation of model performance.
arXiv Detail & Related papers (2024-06-06T18:15:01Z) - Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) are used to automate decision-making tasks.
In this paper, we evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types.
These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from their ability to memorize facts or find other shortcuts.
arXiv Detail & Related papers (2024-04-08T14:15:56Z) - CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z) - Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.