Doing the right thing for the right reason: Evaluating artificial moral
cognition by probing cost insensitivity
- URL: http://arxiv.org/abs/2305.18269v1
- Date: Mon, 29 May 2023 17:41:52 GMT
- Title: Doing the right thing for the right reason: Evaluating artificial moral
cognition by probing cost insensitivity
- Authors: Yiran Mao, Madeline G. Reinecke, Markus Kunesch, Edgar A. Duéñez-Guzmán, Ramona Comanescu, Julia Haas, Joel Z. Leibo
- Abstract summary: We take a look at one aspect of morality: 'doing the right thing for the right reasons.'
We propose a behavior-based analysis of artificial moral cognition which could also be applied to humans.
- Score: 4.9111925104694105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Is it possible to evaluate the moral cognition of complex artificial agents?
In this work, we take a look at one aspect of morality: `doing the right thing
for the right reasons.' We propose a behavior-based analysis of artificial
moral cognition which could also be applied to humans to facilitate
like-for-like comparison. Morally-motivated behavior should persist despite
mounting cost; by measuring an agent's sensitivity to this cost, we gain deeper
insight into underlying motivations. We apply this evaluation to a particular
set of deep reinforcement learning agents, trained by memory-based
meta-reinforcement learning. Our results indicate that agents trained with a
reward function that includes other-regarding preferences perform helping
behavior in a way that is less sensitive to increasing cost than agents trained
with more self-interested preferences.
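The paper's core probe, measuring how an agent's helping behavior degrades as the cost of helping rises, can be sketched as follows. Everything here is a stand-in: the logistic policies and their parameters are invented for illustration, not the paper's trained meta-RL agents.

```python
import numpy as np

def helping_rate(policy, costs, n_trials=1000, seed=0):
    """Estimate how often a policy chooses to help at each cost level."""
    rng = np.random.default_rng(seed)
    return np.array([np.mean([policy(c, rng) for _ in range(n_trials)])
                     for c in costs])

def cost_sensitivity(costs, rates):
    """Slope of helping rate vs. cost: a slope nearer 0 means helping
    persists despite mounting cost."""
    slope, _intercept = np.polyfit(costs, rates, 1)
    return slope

# Invented stand-in policies (probability of helping, logistic in cost):
def other_regarding(cost, rng):   # shallow drop-off with cost
    return rng.random() < 1.0 / (1.0 + np.exp(0.3 * (cost - 8.0)))

def self_interested(cost, rng):   # steep drop-off with cost
    return rng.random() < 1.0 / (1.0 + np.exp(2.0 * (cost - 2.0)))

costs = np.linspace(0.0, 5.0, 6)
s_other = cost_sensitivity(costs, helping_rate(other_regarding, costs))
s_self = cost_sensitivity(costs, helping_rate(self_interested, costs))
assert abs(s_other) < abs(s_self)  # other-regarding agent is less cost-sensitive
```

Comparing the magnitudes of the two slopes is the behavior-based readout: a flatter helping curve is taken as evidence that the motivation is not purely self-interested.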
Related papers
- Decoding moral judgement from text: a pilot study [0.0]
Moral judgement is a complex human reaction that engages cognitive and emotional dimensions.
We explore the feasibility of moral judgement decoding from text stimuli with passive brain-computer interfaces.
arXiv Detail & Related papers (2024-05-28T20:31:59Z)
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark [61.43264961005614]
We develop a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios.
We evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations.
Our results show that agents can both act competently and morally, so concrete progress can be made in machine ethics.
arXiv Detail & Related papers (2023-04-06T17:59:03Z)
- ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations [81.70195684646681]
We present ClarifyDelphi, an interactive system that learns to ask clarification questions.
We posit that questions whose potential answers lead to diverging moral judgments are the most informative.
Our work is ultimately inspired by studies in cognitive science that have investigated the flexibility in moral cognition.
arXiv Detail & Related papers (2022-12-20T16:33:09Z)
- When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z)
- Moral reinforcement learning using actual causation [0.0]
We propose an online reinforcement learning method that learns a policy under the constraint that the agent should not be the cause of harm.
This is accomplished by defining cause using the theory of actual causation and assigning blame to the agent when its actions are the actual cause of an undesirable outcome.
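A minimal sketch of the blame rule just described, reduced to a simple but-for counterfactual test rather than the full theory of actual causation, with an invented toy outcome:

```python
def blame(agent_action, others_actions, outcome_fn, default_action=None):
    """Blame the agent when its action was a but-for cause of harm: harm
    occurred, and would not have occurred under a default (inactive) action.
    This simplifies actual causation, which also handles overdetermined
    cases that a plain but-for test misses."""
    harmed = outcome_fn(agent_action, others_actions)
    would_have_harmed = outcome_fn(default_action, others_actions)
    return harmed and not would_have_harmed

# Invented toy outcome: harm occurs iff the agent pushes someone at the edge.
def outcome(agent_action, others):
    return agent_action == "push" and "at_edge" in others

assert blame("push", ["at_edge"], outcome) is True    # actual cause of harm
assert blame("wait", ["at_edge"], outcome) is False   # no harm occurred
```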
arXiv Detail & Related papers (2022-05-17T09:25:51Z)
- Human-Like Navigation Behavior: A Statistical Evaluation Framework [0.0]
We build a non-parametric two-sample hypothesis test designed to compare the behaviors of artificial agents to those of human players.
We show that the resulting $p$-value not only aligns with anonymous human judgment of human-like behavior, but also that it can be used as a measure of similarity.
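A generic permutation version of such a non-parametric two-sample test can be sketched as below; the test statistic (absolute difference of means) and the synthetic data are illustrative assumptions, not the statistic used in the paper.

```python
import numpy as np

def permutation_pvalue(sample_a, sample_b, n_perm=5000, seed=1):
    """Non-parametric two-sample test: the p-value estimates how often a
    random relabeling of the pooled data yields a gap in means at least as
    large as the observed one."""
    rng = np.random.default_rng(seed)
    a = np.asarray(sample_a, float)
    b = np.asarray(sample_b, float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        exceed += abs(pooled[:len(a)].mean() - pooled[len(a):].mean()) >= observed
    return (exceed + 1) / (n_perm + 1)  # add-one correction avoids p == 0

# Synthetic behavior scores: one agent close to human, one far from it.
data_rng = np.random.default_rng(0)
humans = data_rng.normal(0.0, 1.0, 200)
agent_humanlike = data_rng.normal(0.05, 1.0, 200)
agent_distinct = data_rng.normal(1.0, 1.0, 200)

p_humanlike = permutation_pvalue(humans, agent_humanlike)
p_distinct = permutation_pvalue(humans, agent_distinct)
assert p_distinct < 0.01          # large gap: confidently not human-like
assert p_distinct < p_humanlike   # higher p ~ harder to tell apart
```

Used this way, a high p-value means the agent's behavior is statistically indistinguishable from the human sample under the chosen statistic, matching the paper's use of the p-value as a similarity measure.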
arXiv Detail & Related papers (2022-03-10T01:07:34Z)
- What Would Jiminy Cricket Do? Towards Agents That Behave Morally [59.67116505855223]
We introduce Jiminy Cricket, an environment suite of 25 text-based adventure games with thousands of diverse, morally salient scenarios.
By annotating every possible game state, the Jiminy Cricket environments robustly evaluate whether agents can act morally while maximizing reward.
In extensive experiments, we find that the artificial conscience approach can steer agents towards moral behavior without sacrificing performance.
arXiv Detail & Related papers (2021-10-25T17:59:31Z)
- AGENT: A Benchmark for Core Psychological Reasoning [60.35621718321559]
Intuitive psychology is the ability to reason about hidden mental variables that drive observable actions.
Despite recent interest in machine agents that reason about other agents, it is not clear if such agents learn or hold the core psychology principles that drive human reasoning.
We present a benchmark consisting of procedurally generated 3D animations, AGENT, structured around four scenarios.
arXiv Detail & Related papers (2021-02-24T14:58:23Z)
- Reinforcement Learning Under Moral Uncertainty [13.761051314923634]
An ambitious goal for machine learning is to create agents that behave ethically.
While ethical agents could be trained by rewarding correct behavior under a specific moral theory, there remains widespread disagreement about the nature of morality.
This paper proposes two training methods that realize different points among competing desiderata, and trains agents in simple environments to act under moral uncertainty.
arXiv Detail & Related papers (2020-06-08T16:40:12Z)
- Intrinsic Motivation for Encouraging Synergistic Behavior [55.10275467562764]
We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks.
Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.
arXiv Detail & Related papers (2020-02-12T19:34:51Z)
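The guiding principle of that last paper can be reduced to a scalar caricature: grant intrinsic reward only for joint effects beyond what the best single agent achieves alone. The paper's formulation operates on effects in the environment rather than scalars; the function and numbers here are invented for illustration.

```python
def synergy_bonus(joint_effect, solo_effects, scale=1.0):
    """Intrinsic reward for the part of the joint effect that no agent
    could have achieved acting on its own."""
    best_solo = max(solo_effects)
    return scale * max(0.0, joint_effect - best_solo)

# Toy numbers: two agents lifting an object neither can move alone.
assert synergy_bonus(joint_effect=10.0, solo_effects=[0.0, 0.0]) == 10.0
# No bonus when one agent alone already achieves more than the pair.
assert synergy_bonus(joint_effect=3.0, solo_effects=[5.0, 2.0]) == 0.0
```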
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.