What Would Jiminy Cricket Do? Towards Agents That Behave Morally
- URL: http://arxiv.org/abs/2110.13136v1
- Date: Mon, 25 Oct 2021 17:59:31 GMT
- Title: What Would Jiminy Cricket Do? Towards Agents That Behave Morally
- Authors: Dan Hendrycks, Mantas Mazeika, Andy Zou, Sahil Patel, Christine Zhu,
Jesus Navarro, Dawn Song, Bo Li, Jacob Steinhardt
- Abstract summary: We introduce Jiminy Cricket, an environment suite of 25 text-based adventure games with thousands of diverse, morally salient scenarios.
By annotating every possible game state, the Jiminy Cricket environments robustly evaluate whether agents can act morally while maximizing reward.
In extensive experiments, we find that the artificial conscience approach can steer agents towards moral behavior without sacrificing performance.
- Score: 59.67116505855223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When making everyday decisions, people are guided by their conscience, an
internal sense of right and wrong. By contrast, artificial agents are currently
not endowed with a moral sense. As a consequence, they may learn to behave
immorally when trained on environments that ignore moral concerns, such as
violent video games. With the advent of generally capable agents that pretrain
on many environments, it will become necessary to mitigate inherited biases
from environments that teach immoral behavior. To facilitate the development of
agents that avoid causing wanton harm, we introduce Jiminy Cricket, an
environment suite of 25 text-based adventure games with thousands of diverse,
morally salient scenarios. By annotating every possible game state, the Jiminy
Cricket environments robustly evaluate whether agents can act morally while
maximizing reward. Using models with commonsense moral knowledge, we create an
elementary artificial conscience that assesses and guides agents. In extensive
experiments, we find that the artificial conscience approach can steer agents
towards moral behavior without sacrificing performance.
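As a concrete intuition for the artificial conscience described above, the sketch below shows one way a commonsense morality model could shape action selection: each candidate action's value estimate is penalized when the model judges the action likely to be immoral, and the agent then picks the highest shaped value. The function names, threshold, and penalty are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
# Minimal sketch of morality-aware action selection ("policy shaping").
# The classifier interface and the constants below are illustrative
# placeholders, not the exact components used in the paper.
from typing import Callable, Sequence

IMMORALITY_THRESHOLD = 0.5  # classifier probability above which an action is flagged
PENALTY = 10.0              # value subtracted from flagged actions

def shaped_action_selection(
    candidate_actions: Sequence[str],
    action_values: Sequence[float],
    immorality_prob: Callable[[str], float],
) -> str:
    """Pick the highest-value action after penalizing actions that the
    morality model judges likely to be immoral."""
    shaped = []
    for action, value in zip(candidate_actions, action_values):
        if immorality_prob(action) > IMMORALITY_THRESHOLD:
            value -= PENALTY  # discourage, but do not outright forbid, the action
        shaped.append(value)
    best = max(range(len(candidate_actions)), key=lambda i: shaped[i])
    return candidate_actions[best]
```

A stricter variant would drop flagged actions from the candidate set entirely, trading some game reward for a lower chance of immoral behavior.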
Related papers
- If our aim is to build morality into an artificial agent, how might we begin to go about doing so? [0.0]
We discuss the different aspects that should be considered when building moral agents, including the most relevant moral paradigms and challenges.
We propose solutions including a hybrid approach to design and a hierarchical approach to combining moral paradigms.
arXiv Detail & Related papers (2023-10-12T12:56:12Z)
- Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories? [78.3738172874685]
Making moral judgments is an essential step toward developing ethical AI systems.
Prevalent approaches are mostly bottom-up, training models on large sets of annotated data that reflect crowd-sourced opinions about morality.
This work proposes a flexible top-down framework to steer (Large) Language Models (LMs) to perform moral reasoning with well-established moral theories from interdisciplinary research.
arXiv Detail & Related papers (2023-08-29T15:57:32Z)
- Doing the right thing for the right reason: Evaluating artificial moral cognition by probing cost insensitivity [4.9111925104694105]
We take a look at one aspect of morality: 'doing the right thing for the right reasons'.
We propose a behavior-based analysis of artificial moral cognition which could also be applied to humans.
arXiv Detail & Related papers (2023-05-29T17:41:52Z)
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark [61.43264961005614]
We develop a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios.
We evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations.
Our results show that agents can both act competently and morally, so concrete progress can be made in machine ethics.
arXiv Detail & Related papers (2023-04-06T17:59:03Z)
- ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations [81.70195684646681]
We present ClarifyDelphi, an interactive system that learns to ask clarification questions.
We posit that questions whose potential answers lead to diverging moral judgments are the most informative.
Our work is ultimately inspired by studies in cognitive science that have investigated the flexibility in moral cognition.
arXiv Detail & Related papers (2022-12-20T16:33:09Z)
- When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z)
- Towards Artificial Virtuous Agents: Games, Dilemmas and Machine Learning [4.864819846886143]
We show how a role-playing game can be designed to develop virtues within an artificial agent.
We motivate implementing virtuous agents that play such role-playing games and examining their decisions through a virtue-ethics lens.
arXiv Detail & Related papers (2022-08-30T07:37:03Z)
- Aligning to Social Norms and Values in Interactive Narratives [89.82264844526333]
We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games.
We introduce the GALAD agent that uses the social commonsense knowledge present in specially trained language models to contextually restrict its action space to only those actions that are aligned with socially beneficial values.
arXiv Detail & Related papers (2022-05-04T09:54:33Z)
- Reinforcement Learning Under Moral Uncertainty [13.761051314923634]
An ambitious goal for machine learning is to create agents that behave ethically.
While ethical agents could be trained by rewarding correct behavior under a specific moral theory, there remains widespread disagreement about the nature of morality.
This paper proposes two training methods that realize different points among competing desiderata, and trains agents in simple environments to act under moral uncertainty.
arXiv Detail & Related papers (2020-06-08T16:40:12Z)
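The last entry above studies agents that must act under moral uncertainty rather than under a single moral theory. As a toy illustration of one point in that design space (not the specific methods proposed in that paper), the sketch below blends reward signals from competing moral theories, weighted by the agent's credence in each; the theories and credences are made-up examples.

```python
# Toy sketch: credence-weighted mixing of rewards from competing moral theories.
from typing import Callable, Dict

def mixed_moral_reward(
    state: str,
    action: str,
    theory_rewards: Dict[str, Callable[[str, str], float]],
    credences: Dict[str, float],
) -> float:
    """Blend per-theory reward functions into one scalar, weighted by credence."""
    total = sum(credences.values())
    return sum(
        credences[name] / total * reward_fn(state, action)
        for name, reward_fn in theory_rewards.items()
    )

# Example: a crude utilitarian scorer and a crude deontological scorer, trusted 60/40.
theories = {
    "utilitarian": lambda s, a: 1.0 if "help" in a else 0.0,
    "deontological": lambda s, a: -1.0 if "lie" in a else 0.0,
}
credences = {"utilitarian": 0.6, "deontological": 0.4}
print(mixed_moral_reward("tavern", "lie to help the refugee", theories, credences))
```

The example action is rewarded by the utilitarian scorer and penalized by the deontological one, so the mixed signal reflects how much the agent trusts each theory.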
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.