Do VLMs Have a Moral Backbone? A Study on the Fragile Morality of Vision-Language Models
- URL: http://arxiv.org/abs/2601.17082v1
- Date: Fri, 23 Jan 2026 06:00:09 GMT
- Title: Do VLMs Have a Moral Backbone? A Study on the Fragile Morality of Vision-Language Models
- Authors: Zhining Liu, Tianyi Wang, Xiao Lin, Penghao Ouyang, Gaotang Li, Ze Yang, Hui Liu, Sumit Keswani, Vishwa Pardeshi, Huijun Zhao, Wei Fan, Hanghang Tong
- Abstract summary: It remains unclear whether the moral judgments of Vision-Language Models (VLMs) are stable in realistic settings. We probe VLMs with a diverse set of model-agnostic multimodal perturbations and find that their moral stances are highly fragile. We show that lightweight inference-time interventions can partially restore moral stability.
- Score: 41.633874062439254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite substantial efforts toward improving the moral alignment of Vision-Language Models (VLMs), it remains unclear whether their ethical judgments are stable in realistic settings. This work studies moral robustness in VLMs, defined as the ability to preserve moral judgments under textual and visual perturbations that do not alter the underlying moral context. We systematically probe VLMs with a diverse set of model-agnostic multimodal perturbations and find that their moral stances are highly fragile, frequently flipping under simple manipulations. Our analysis reveals systematic vulnerabilities across perturbation types, moral domains, and model scales, including a sycophancy trade-off where stronger instruction-following models are more susceptible to persuasion. We further show that lightweight inference-time interventions can partially restore moral stability. These results demonstrate that moral alignment alone is insufficient and that moral robustness is a necessary criterion for the responsible deployment of VLMs.
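As a concrete reading of the moral-robustness criterion above, the following is a minimal sketch of how one might estimate how often a VLM's moral verdict flips under context-preserving perturbations. The `query_vlm` callable, the verdict strings, and the `add_user_pressure` perturbation are hypothetical placeholders, not the authors' implementation or metric.

```python
# Minimal sketch (assumed interface, not the paper's code): measure the fraction
# of (scenario, perturbation) pairs where a VLM's moral verdict flips even though
# the perturbation leaves the underlying moral context unchanged.
from typing import Callable, Iterable, List, Tuple

Perturbation = Callable[[bytes, str], Tuple[bytes, str]]

def moral_flip_rate(
    query_vlm: Callable[[bytes, str], str],   # returns e.g. "acceptable" / "unacceptable"
    cases: Iterable[Tuple[bytes, str]],       # (image bytes, scenario text) pairs
    perturbations: List[Perturbation],
) -> float:
    """Fraction of perturbed queries whose verdict differs from the unperturbed one."""
    flips, total = 0, 0
    for image, text in cases:
        baseline = query_vlm(image, text)
        for perturb in perturbations:
            p_image, p_text = perturb(image, text)
            total += 1
            if query_vlm(p_image, p_text) != baseline:
                flips += 1
    return flips / max(total, 1)

# One morally irrelevant perturbation: appended user pressure, which also probes
# the sycophancy trade-off mentioned in the abstract.
def add_user_pressure(image: bytes, text: str) -> Tuple[bytes, str]:
    return image, text + " I personally think this is perfectly fine."
```

Lower flip rates under the same perturbation set would indicate higher moral robustness; a lightweight inference-time intervention of the kind the abstract mentions could then be assessed by comparing this rate with and without the intervention.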
Related papers
- Are Language Models Sensitive to Morally Irrelevant Distractors? [47.92026843851412]
We show that moral distractors can shift the moral judgements of large language models by over 30% even in low-ambiguity scenarios. This research challenges theories that assume the stability of human moral judgements.
arXiv Detail & Related papers (2026-02-10T05:18:05Z)
- Moral Sycophancy in Vision Language Models [4.1673509006222655]
Sycophancy in Vision-Language Models (VLMs) refers to their tendency to align with user opinions, often at the expense of moral or factual accuracy. We analyze ten widely-used models on the Moralise and M3oralBench datasets under explicit user disagreement.
arXiv Detail & Related papers (2026-02-09T06:34:12Z)
- Learning to Diagnose and Correct Moral Errors: Towards Enhancing Moral Sensitivity in Large Language Models [8.691489065712316]
We propose two pragmatic inference methods that facilitate LLMs in diagnosing morally benign and hazardous inputs and correcting moral errors. A central strength of these methods is that they offer a unified perspective for designing pragmatic inference procedures grounded in their inferential loads.
arXiv Detail & Related papers (2026-01-06T15:09:05Z)
- Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models [14.425718737962102]
We propose a framework that synthesizes multiple LLMs' moral judgments into a collectively formulated moral judgment. Our aggregation mechanism fuses continuous moral acceptability scores (beyond binary labels) into a collective probability (see the pooling sketch after this list). Experiments on a large-scale social moral dilemma dataset show our approach builds robust consensus and improves individual model fidelity.
arXiv Detail & Related papers (2025-06-17T15:22:21Z)
- Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs [0.14963505712040906]
Moral competence is the ability to act in accordance with moral principles. As large language models (LLMs) are increasingly deployed in situations demanding moral competence, there is increasing interest in evaluating this ability empirically. We identify three significant shortcomings: (i) Over-reliance on prepackaged moral scenarios with explicitly highlighted moral features; (ii) Focus on verdict prediction rather than moral reasoning; and (iii) Inadequate testing of models' (in)ability to recognize when additional information is needed.
arXiv Detail & Related papers (2025-06-16T03:59:38Z)
- Are Language Models Consequentialist or Deontological Moral Reasoners? [75.6788742799773]
We focus on a large-scale analysis of the moral reasoning traces provided by large language models (LLMs). We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology.
arXiv Detail & Related papers (2025-05-27T17:51:18Z)
- When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas [68.79830818369683]
Recent advances in large language models (LLMs) have enabled their use in complex agentic roles, involving decision-making with humans or other agents. There is limited understanding of how they act when moral imperatives directly conflict with rewards or incentives. We introduce Moral Behavior in Social Dilemma Simulation (MoralSim) and evaluate how LLMs behave in the prisoner's dilemma and public goods game with morally charged contexts.
arXiv Detail & Related papers (2025-05-25T16:19:24Z)
- M$^3$oralBench: A MultiModal Moral Benchmark for LVLMs [66.78407469042642]
We introduce M$^3$oralBench, the first MultiModal Moral Benchmark for LVLMs. M$^3$oralBench expands the everyday moral scenarios in Moral Foundations Vignettes (MFVs) and employs the text-to-image diffusion model, SD3.0, to create corresponding scenario images. It conducts moral evaluation across six moral foundations of Moral Foundations Theory (MFT) and encompasses tasks in moral judgement, moral classification, and moral response.
arXiv Detail & Related papers (2024-12-30T05:18:55Z)
- Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, using a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z)
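For the "Probabilistic Aggregation and Targeted Embedding Optimization" entry above, here is the pooling sketch referenced there: a minimal illustration of fusing per-model moral acceptability scores into a single collective probability. The weighted log-odds pooling rule and the function name are assumptions for illustration only and need not match that paper's actual aggregation mechanism.

```python
# Illustrative pooling of continuous moral acceptability scores (assumed rule:
# weighted log-odds averaging), not necessarily the cited paper's mechanism.
import math

def pool_acceptability(scores, weights=None):
    """Fuse per-model acceptability scores in (0, 1) into one collective probability."""
    if weights is None:
        weights = [1.0] * len(scores)
    eps = 1e-6
    logits = []
    for s, w in zip(scores, weights):
        s = min(max(s, eps), 1.0 - eps)        # keep scores strictly inside (0, 1)
        logits.append(w * math.log(s / (1.0 - s)))
    pooled_logit = sum(logits) / sum(weights)
    return 1.0 / (1.0 + math.exp(-pooled_logit))

# Three models rate the same dilemma; the pooled value is their collective stance.
print(round(pool_acceptability([0.9, 0.7, 0.4]), 2))  # 0.71
```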
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.