Related papers: Moral Sycophancy in Vision Language Models

Moral Sycophancy in Vision Language Models

URL: http://arxiv.org/abs/2602.08311v1
Date: Mon, 09 Feb 2026 06:34:12 GMT
Title: Moral Sycophancy in Vision Language Models
Authors: Shadman Rabby, Md. Hefzul Hossain Papon, Sabbir Ahmed, Nokimul Hasan Arif, A. B. M. Ashikur Rahman, Irfan Ahmad,
Abstract summary: Sycophancy in Vision-Language Models (VLMs) refers to their tendency to align with user opinions, often at the expense of moral or factual accuracy.<n>We analyze ten widely-used models on the Moralise and M3oralBench datasets under explicit user disagreement.
Score: 4.1673509006222655
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sycophancy in Vision-Language Models (VLMs) refers to their tendency to align with user opinions, often at the expense of moral or factual accuracy. While prior studies have explored sycophantic behavior in general contexts, its impact on morally grounded visual decision-making remains insufficiently understood. To address this gap, we present the first systematic study of moral sycophancy in VLMs, analyzing ten widely-used models on the Moralise and M^3oralBench datasets under explicit user disagreement. Our results reveal that VLMs frequently produce morally incorrect follow-up responses even when their initial judgments are correct, and exhibit a consistent asymmetry: models are more likely to shift from morally right to morally wrong judgments than the reverse when exposed to user-induced bias. Follow-up prompts generally degrade performance on Moralise, while yielding mixed or even improved accuracy on M^3oralBench, highlighting dataset-dependent differences in moral robustness. Evaluation using Error Introduction Rate (EIR) and Error Correction Rate (ECR) reveals a clear trade-off: models with stronger error-correction capabilities tend to introduce more reasoning errors, whereas more conservative models minimize errors but exhibit limited ability to self-correct. Finally, initial contexts with a morally right stance elicit stronger sycophantic behavior, emphasizing the vulnerability of VLMs to moral influence and the need for principled strategies to improve ethical consistency and robustness in multimodal AI systems.

Related papers

MM-SCALE: Grounded Multimodal Moral Reasoning via Scalar Judgment and Listwise Alignment [48.39756797294967]
We present MM-SCALE, a dataset for aligning Vision-Language Models with human moral preferences.<n>Each image-scenario pair is annotated with moral acceptability scores and grounded reasoning labels by humans.<n>Our framework provides richer alignment signals and finer calibration of multimodal moral reasoning.
arXiv Detail & Related papers (2026-02-03T15:48:00Z)
Do VLMs Have a Moral Backbone? A Study on the Fragile Morality of Vision-Language Models [41.633874062439254]
It remains unclear whether Vision-Language Models (VLMs) are stable in realistic settings.<n>We probe VLMs with a diverse set of model-agnostic multimodal perturbations and find that their moral stances are highly fragile.<n>We show that lightweight inference-time interventions can partially restore moral stability.
arXiv Detail & Related papers (2026-01-23T06:00:09Z)
Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment [18.3236201998655]
Humans display significant uncertainty when confronted with moral dilemmas.<n>Recent studies have confirmed the overly confident tendencies of machine-generated responses.<n>This work examines how uncertainty influences moral decisions in the classical trolley problem.
arXiv Detail & Related papers (2025-11-17T12:13:15Z)
MORABLES: A Benchmark for Assessing Abstract Moral Reasoning in LLMs with Fables [50.29407048003165]
We present MORABLES, a human-verified benchmark built from fables and short stories drawn from historical literature.<n>The main task is structured as multiple-choice questions targeting moral inference, with carefully crafted distractors that challenge models to go beyond shallow, extractive question answering.<n>Our findings show that, while larger models outperform smaller ones, they remain susceptible to adversarial manipulation and often rely on superficial patterns rather than true moral reasoning.
arXiv Detail & Related papers (2025-09-15T19:06:10Z)
"Pull or Not to Pull?'': Investigating Moral Biases in Leading Large Language Models Across Ethical Dilemmas [11.229443362516207]
This study presents a comprehensive empirical evaluation of 14 leading large language models (LLMs)<n>We elicited 3,780 binary decisions and natural language justifications, enabling analysis along axes of decisional assertiveness, explanation answer consistency, public moral alignment, and sensitivity to ethically irrelevant cues.<n>We advocate for moral reasoning to become a primary axis in LLM alignment, calling for standardized benchmarks that evaluate not just what LLMs decide, but how and why.
arXiv Detail & Related papers (2025-08-10T10:45:16Z)
The Moral Gap of Large Language Models [1.568356637037272]
Moral foundation detection is crucial for analyzing social discourse and developing ethically-aligned AI systems.<n>This study provides the first comprehensive comparison between state-of-the-art LLMs and fine-tuned transformers across Twitter and Reddit datasets using ROC, PR, and DET curve analysis.<n>Results reveal substantial performance gaps, with LLMs exhibiting high false negative rates and systematic under-detection of moral content despite prompt engineering efforts.
arXiv Detail & Related papers (2025-07-24T15:49:06Z)
Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models [14.425718737962102]
We propose a framework that synthesizes multiple LLMs' moral judgments into a collectively formulated moral judgment.<n>Our aggregation mechanism fuses continuous moral acceptability scores (beyond binary labels) into a collective probability.<n>Experiments on a large-scale social moral dilemma dataset show our approach builds robust consensus and improves individual model fidelity.
arXiv Detail & Related papers (2025-06-17T15:22:21Z)
Are Language Models Consequentialist or Deontological Moral Reasoners? [75.6788742799773]
We focus on a large-scale analysis of the moral reasoning traces provided by large language models (LLMs)<n>We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology.
arXiv Detail & Related papers (2025-05-27T17:51:18Z)
M$^3$oralBench: A MultiModal Moral Benchmark for LVLMs [66.78407469042642]
We introduce M$3$oralBench, the first MultiModal Moral Benchmark for LVLMs.<n>M$3$oralBench expands the everyday moral scenarios in Moral Foundations Vignettes (MFVs) and employs the text-to-image diffusion model, SD3.0, to create corresponding scenario images.<n>It conducts moral evaluation across six moral foundations of Moral Foundations Theory (MFT) and encompasses tasks in moral judgement, moral classification, and moral response.
arXiv Detail & Related papers (2024-12-30T05:18:55Z)
Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems. Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality. This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.