Reinforcement Learning Under Moral Uncertainty
- URL: http://arxiv.org/abs/2006.04734v3
- Date: Mon, 19 Jul 2021 18:52:16 GMT
- Title: Reinforcement Learning Under Moral Uncertainty
- Authors: Adrien Ecoffet and Joel Lehman
- Abstract summary: An ambitious goal for machine learning is to create agents that behave ethically.
While ethical agents could be trained by rewarding correct behavior under a specific moral theory, there remains widespread disagreement about the nature of morality.
This paper proposes two training methods that realize different points among competing desiderata, and trains agents in simple environments to act under moral uncertainty.
- Score: 13.761051314923634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An ambitious goal for machine learning is to create agents that behave
ethically: The capacity to abide by human moral norms would greatly expand the
context in which autonomous agents could be practically and safely deployed,
e.g. fully autonomous vehicles will encounter charged moral decisions that
complicate their deployment. While ethical agents could be trained by rewarding
correct behavior under a specific moral theory (e.g. utilitarianism), there
remains widespread disagreement about the nature of morality. Acknowledging
such disagreement, recent work in moral philosophy proposes that ethical
behavior requires acting under moral uncertainty, i.e. to take into account
when acting that one's credence is split across several plausible ethical
theories. This paper translates such insights to the field of reinforcement
learning, proposes two training methods that realize different points among
competing desiderata, and trains agents in simple environments to act under
moral uncertainty. The results illustrate (1) how such uncertainty can help
curb extreme behavior from commitment to single theories and (2) several
technical complications arising from attempting to ground moral philosophy in
RL (e.g. how can a principled trade-off between two competing but incomparable
reward functions be reached). The aim is to catalyze progress towards
morally-competent agents and highlight the potential of RL to contribute
towards the computational grounding of moral philosophy.
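For context on the trade-off the abstract raises, the following minimal sketch (an assumption-laden illustration, not the paper's proposed training methods) shows one naive way to act under moral uncertainty: weight each theory's reward function by the agent's credence in that theory. The theory names, reward values, and credences below are hypothetical.

# Illustrative sketch only: credence-weighted aggregation ("expected
# choiceworthiness") over two hypothetical per-theory reward functions.
# These values and the normalization trick are invented for illustration
# and are not the paper's algorithms.
import numpy as np

# Hypothetical rewards each theory assigns to three candidate actions.
# theory_A reports rewards on a large scale (e.g. welfare units), theory_B
# on a small bounded scale, so the two are not directly comparable.
REWARDS = {
    "theory_A": np.array([100.0, 0.0, 50.0]),
    "theory_B": np.array([0.0, 1.0, 0.5]),
}
CREDENCES = {"theory_A": 0.3, "theory_B": 0.7}  # agent's split credence

def expected_choiceworthiness(rewards, credences):
    # Credence-weighted sum of per-theory rewards for each action.
    return sum(credences[t] * r for t, r in rewards.items())

def normalized_choiceworthiness(rewards, credences):
    # Rescale each theory's rewards to [0, 1] before weighting: one crude
    # response to incomparable scales, not a principled solution.
    normed = {}
    for t, r in rewards.items():
        span = r.max() - r.min()
        normed[t] = (r - r.min()) / span if span > 0 else np.zeros_like(r)
    return expected_choiceworthiness(normed, credences)

raw = expected_choiceworthiness(REWARDS, CREDENCES)
scaled = normalized_choiceworthiness(REWARDS, CREDENCES)
print("raw:       ", raw, "-> action", int(raw.argmax()))       # theory_A's scale dominates
print("normalized:", scaled, "-> action", int(scaled.argmax())) # higher-credence theory_B prevails

Even this toy example shows why the abstract flags the trade-off as a technical complication: the recommended action changes with an arbitrary normalization choice, since the two reward functions have no shared scale.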
Related papers
- Learning Machine Morality through Experience and Interaction [3.7414804164475983]
Increasing interest in ensuring the safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents.
We argue that more hybrid solutions are needed to create adaptable and robust, yet more controllable and interpretable agents.
arXiv Detail & Related papers (2023-12-04T11:46:34Z)
- What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations [48.686872351114964]
Moral or ethical judgments rely heavily on the specific contexts in which they occur.
We introduce defeasible moral reasoning: a task to provide grounded contexts that make an action more or less morally acceptable.
We distill a high-quality dataset of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions.
arXiv Detail & Related papers (2023-10-24T00:51:29Z)
- If our aim is to build morality into an artificial agent, how might we begin to go about doing so? [0.0]
We discuss the different aspects that should be considered when building moral agents, including the most relevant moral paradigms and challenges.
We propose solutions including a hybrid approach to design and a hierarchical approach to combining moral paradigms.
arXiv Detail & Related papers (2023-10-12T12:56:12Z)
- Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, using large sets of annotated data to train models on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z)
- From computational ethics to morality: how decision-making algorithms can help us understand the emergence of moral principles, the existence of an optimal behaviour and our ability to discover it [0.0]
This paper adds to the efforts of evolutionary ethics to naturalize morality by providing insights derived from a computational ethics view.
We propose a stylized model of human decision-making, which is based on Reinforcement Learning.
arXiv Detail & Related papers (2023-07-20T14:39:08Z)
- Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning [4.2050490361120465]
A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents.
We present a systematic analysis of the choices made by intrinsically-motivated RL agents whose rewards are based on moral theories.
We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation.
arXiv Detail & Related papers (2023-01-20T09:36:42Z)
- ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations [81.70195684646681]
We present ClarifyDelphi, an interactive system that learns to ask clarification questions.
We posit that questions whose potential answers lead to diverging moral judgments are the most informative.
Our work is ultimately inspired by studies in cognitive science that have investigated the flexibility in moral cognition.
arXiv Detail & Related papers (2022-12-20T16:33:09Z)
- When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z)
- Metaethical Perspectives on 'Benchmarking' AI Ethics [81.65697003067841]
Benchmarks are seen as the cornerstone for measuring technical progress in Artificial Intelligence (AI) research.
An increasingly prominent research area in AI is ethics, which currently has no set of benchmarks nor a commonly accepted way of measuring the 'ethicality' of an AI system.
We argue that it makes more sense to talk about 'values' rather than 'ethics' when considering the possible actions of present and future AI systems.
arXiv Detail & Related papers (2022-04-11T14:36:39Z)
- On the Morality of Artificial Intelligence [154.69452301122175]
We propose conceptual and practical principles and guidelines for Machine Learning research and deployment.
We insist on concrete actions that can be taken by practitioners to pursue a more ethical and moral practice of ML aimed at using AI for social good.
arXiv Detail & Related papers (2019-12-26T23:06:54Z)