Moral reinforcement learning using actual causation
- URL: http://arxiv.org/abs/2205.08192v1
- Date: Tue, 17 May 2022 09:25:51 GMT
- Title: Moral reinforcement learning using actual causation
- Authors: Tue Herlau
- Abstract summary: We propose an online reinforcement learning method that learns a policy under the constraint that the agent should not be the cause of harm.
This is accomplished by defining cause using the theory of actual causation and assigning blame to the agent when its actions are the actual cause of an undesirable outcome.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning systems will to a greater and greater extent make
decisions that significantly impact the well-being of humans, and it is
therefore essential that these systems make decisions that conform to our
expectations of morally good behavior. The morally good is often defined in
causal terms, as in whether one's actions have in fact caused a particular
outcome, and whether the outcome could have been anticipated. We propose an
online reinforcement learning method that learns a policy under the constraint
that the agent should not be the cause of harm. This is accomplished by
defining cause using the theory of actual causation and assigning blame to the
agent when its actions are the actual cause of an undesirable outcome. We
conduct experiments on a toy ethical dilemma in which a natural choice of
reward function leads to clearly undesirable behavior, but our method learns a
policy that avoids being the cause of harmful behavior, demonstrating the
soundness of our approach. Allowing an agent to learn while observing causal
moral distinctions such as blame opens the possibility of learning policies
that better conform to our moral judgments.
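To make the constraint concrete, here is a minimal sketch (not the paper's code) of how blame derived from actual causation can be folded into ordinary Q-learning. The toy dilemma, the "noop" baseline action, the penalty weight, and the reduction of Halpern-Pearl actual causation to a simple but-for test are illustrative assumptions; the paper's formal definition of actual causation is more general.

```python
"""Sketch: blame-shaped Q-learning. The agent is penalised only when a
simplified but-for test says its action was the actual cause of harm."""
import random
from collections import defaultdict

# Toy deterministic dilemma: state 0 = start, state 3 = goal.
ACTIONS = ["shortcut", "detour", "noop"]

def step(state, action):
    """Deterministic transition model; returns (next_state, reward, harm)."""
    if state == 0:
        if action == "shortcut":
            return 3, 1.0, True    # fast, but a bystander is harmed
        if action == "detour":
            return 3, 0.5, False   # slower, no harm
        return 0, 0.0, False       # noop: nothing happens
    return state, 0.0, False

def is_actual_cause(state, action, harm):
    """Simplified but-for test standing in for Halpern-Pearl actual causation:
    blame the action iff harm occurred but would not have occurred under the
    default 'noop' action, with everything else held fixed."""
    if not harm or action == "noop":
        return False
    _, _, harm_under_noop = step(state, "noop")
    return not harm_under_noop

BLAME_PENALTY = 10.0              # assumed weight on being the cause of harm
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
Q = defaultdict(float)

for episode in range(2000):
    state = 0
    for _ in range(5):
        if random.random() < EPS:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, harm = step(state, action)
        if is_actual_cause(state, action, harm):
            reward -= BLAME_PENALTY   # blame only harm the agent actually caused
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt
        if state == 3:
            break

print({a: round(Q[(0, a)], 2) for a in ACTIONS})   # 'detour' should dominate
```

On this toy dilemma the shaped agent learns the detour: the shortcut's higher raw reward is outweighed by the blame penalty attached to the harm it would actually cause, which mirrors the behavior the abstract describes.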
Related papers
- Identifying and Addressing Delusions for Target-Directed Decision-Making [81.22463009144987]
We show that target-directed agents are prone to blindly chasing problematic targets, resulting in worse generalization and safety catastrophes.
We identify different types of delusions via intuitive examples in controlled environments, and investigate their causes and mitigations.
We validate empirically the effectiveness of the proposed strategies in correcting delusional behaviors and improving out-of-distribution generalization.
arXiv Detail & Related papers (2024-10-09T17:35:25Z)
- Moral Responsibility for AI Systems [8.919993498343159]
Moral responsibility for an outcome of an agent who performs some action is commonly taken to involve both a causal condition and an epistemic condition.
This paper presents a formal definition of both conditions within the framework of causal models.
arXiv Detail & Related papers (2023-10-27T10:37:47Z)
- Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly implemented in a bottom-up manner, which uses a large set of annotated data to train models based on crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z)
- From computational ethics to morality: how decision-making algorithms can help us understand the emergence of moral principles, the existence of an optimal behaviour and our ability to discover it [0.0]
This paper adds to the efforts of evolutionary ethics to naturalize morality by providing insights derived from a computational ethics view.
We propose a stylized model of human decision-making, which is based on Reinforcement Learning.
arXiv Detail & Related papers (2023-07-20T14:39:08Z)
- Doing the right thing for the right reason: Evaluating artificial moral cognition by probing cost insensitivity [4.9111925104694105]
We take a look at one aspect of morality: 'doing the right thing for the right reasons'.
We propose a behavior-based analysis of artificial moral cognition which could also be applied to humans.
arXiv Detail & Related papers (2023-05-29T17:41:52Z)
- Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning [4.2050490361120465]
A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents.
We present a systematic analysis of the choices made by intrinsically-motivated RL agents whose rewards are based on moral theories.
We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation.
arXiv Detail & Related papers (2023-01-20T09:36:42Z)
- ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations [81.70195684646681]
We present ClarifyDelphi, an interactive system that learns to ask clarification questions.
We posit that questions whose potential answers lead to diverging moral judgments are the most informative.
Our work is ultimately inspired by studies in cognitive science that have investigated the flexibility in moral cognition.
arXiv Detail & Related papers (2022-12-20T16:33:09Z)
- When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z)
- Learning "What-if" Explanations for Sequential Decision-Making [92.8311073739295]
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior is essential.
We propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes.
We highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
arXiv Detail & Related papers (2020-07-02T14:24:17Z)
- Reinforcement Learning Under Moral Uncertainty [13.761051314923634]
An ambitious goal for machine learning is to create agents that behave ethically.
While ethical agents could be trained by rewarding correct behavior under a specific moral theory, there remains widespread disagreement about the nature of morality.
This paper proposes two training methods that realize different points among competing desiderata, and trains agents in simple environments to act under moral uncertainty.
arXiv Detail & Related papers (2020-06-08T16:40:12Z)
- On Consequentialism and Fairness [64.35872952140677]
We provide a consequentialist critique of common definitions of fairness within machine learning.
We conclude with a broader discussion of the issues of learning and randomization.
arXiv Detail & Related papers (2020-01-02T05:39:48Z)