Learning to Diagnose and Correct Moral Errors: Towards Enhancing Moral Sensitivity in Large Language Models
- URL: http://arxiv.org/abs/2601.03079v1
- Date: Tue, 06 Jan 2026 15:09:05 GMT
- Title: Learning to Diagnose and Correct Moral Errors: Towards Enhancing Moral Sensitivity in Large Language Models
- Authors: Bocheng Chen, Han Zi, Xi Chen, Xitong Zhang, Kristen Johnson, Guangliang Liu
- Abstract summary: We propose two pragmatic inference methods that facilitate LLMs in diagnosing morally benign and hazardous input and correcting moral errors. A central strength of our pragmatic inference methods is their unified perspective for designing pragmatic inference procedures grounded in their inferential loads.
- Score: 8.691489065712316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Moral sensitivity is fundamental to human moral competence, as it guides individuals in regulating everyday behavior. Although many approaches seek to align large language models (LLMs) with human moral values, making them morally sensitive remains extremely challenging. In this paper, we take a step toward answering the question: how can we enhance moral sensitivity in LLMs? Specifically, we propose two pragmatic inference methods that facilitate LLMs in diagnosing morally benign and hazardous input and correcting moral errors, thereby enhancing LLMs' moral sensitivity. A central strength of our pragmatic inference methods is their unified perspective: instead of modeling moral discourses across semantically diverse and complex surface forms, they offer a principled perspective for designing pragmatic inference procedures grounded in their inferential loads. Empirical evidence demonstrates that our pragmatic methods can enhance moral sensitivity in LLMs and achieve strong performance on representative morality-relevant benchmarks.
Related papers
- Are Language Models Sensitive to Morally Irrelevant Distractors? [47.92026843851412]
We show that moral distractors can shift the moral judgements of large language models by over 30% even in low-ambiguity scenarios. This research challenges theories that assume the stability of human moral judgements.
arXiv Detail & Related papers (2026-02-10T05:18:05Z) - The Straight and Narrow: Do LLMs Possess an Internal Moral Path? [25.256151938852728]
Current alignment techniques often act as superficial guardrails, leaving the intrinsic moral representations of Large Language Models largely untouched. We bridge this gap by leveraging Moral Foundations Theory (MFT) to map and manipulate the fine-grained moral landscape of LLMs. We propose Adaptive Moral Fusion (AMF), a dynamic inference-time intervention that synergizes probe detection with vector injection to tackle the safety-helpfulness trade-off.
arXiv Detail & Related papers (2026-01-15T11:42:00Z) - Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants [0.36326779753373206]
The recent rise in popularity of large language models (LLMs) has prompted considerable concerns about their moral capabilities. This paper examines their capacity to function as Artificial Moral Assistants (AMAs). We argue that qualifying as an AMA requires more than what state-of-the-art alignment techniques aim to achieve.
arXiv Detail & Related papers (2025-08-18T09:28:55Z) - Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs [0.14963505712040906]
Moral competence is the ability to act in accordance with moral principles. As large language models (LLMs) are increasingly deployed in situations demanding moral competence, there is increasing interest in evaluating this ability empirically. We identify three significant shortcomings: (i) over-reliance on prepackaged moral scenarios with explicitly highlighted moral features; (ii) focus on verdict prediction rather than moral reasoning; and (iii) inadequate testing of models' (in)ability to recognize when additional information is needed.
arXiv Detail & Related papers (2025-06-16T03:59:38Z) - Are Language Models Consequentialist or Deontological Moral Reasoners? [75.6788742799773]
We focus on a large-scale analysis of the moral reasoning traces provided by large language models (LLMs). We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology.
arXiv Detail & Related papers (2025-05-27T17:51:18Z) - When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas [68.79830818369683]
Recent advances in large language models (LLMs) have enabled their use in complex agentic roles, involving decision-making with humans or other agents. However, there is limited understanding of how they act when moral imperatives directly conflict with rewards or incentives. We introduce Moral Behavior in Social Dilemma Simulation (MoralSim) and evaluate how LLMs behave in the prisoner's dilemma and public goods game with morally charged contexts.
arXiv Detail & Related papers (2025-05-25T16:19:24Z) - ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models [30.301864398780648]
We introduce a novel moral judgment approach called ClarityEthic that leverages LLMs' reasoning ability and contrastive learning to uncover relevant social norms. Our method outperforms state-of-the-art approaches in moral judgment tasks.
arXiv Detail & Related papers (2024-12-17T12:22:44Z) - Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z) - ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations [81.70195684646681]
We present ClarifyDelphi, an interactive system that learns to ask clarification questions.
We posit that questions whose potential answers lead to diverging moral judgments are the most informative.
Our work is ultimately inspired by studies in cognitive science that have investigated the flexibility in moral cognition.
arXiv Detail & Related papers (2022-12-20T16:33:09Z) - When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.