Training Value-Aligned Reinforcement Learning Agents Using a Normative Prior
- URL: http://arxiv.org/abs/2104.09469v1
- Date: Mon, 19 Apr 2021 17:33:07 GMT
- Title: Training Value-Aligned Reinforcement Learning Agents Using a Normative Prior
- Authors: Md Sultan Al Nahian, Spencer Frazier, Brent Harrison, Mark Riedl
- Abstract summary: It is increasingly a prospect that an agent trained to perform a task optimally, using only a measure of task performance as feedback, can violate societal norms for acceptable behavior or cause harm.
We introduce an approach to value-aligned reinforcement learning, in which we train an agent with two reward signals: a standard task performance reward, plus a normative behavior reward.
We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as being more normative.
- Score: 10.421378728492437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As more machine learning agents interact with humans, it is increasingly a
prospect that an agent trained to perform a task optimally, using only a
measure of task performance as feedback, can violate societal norms for
acceptable behavior or cause harm. Value alignment is a property of intelligent
agents wherein they solely pursue non-harmful behaviors or human-beneficial
goals. We introduce an approach to value-aligned reinforcement learning, in
which we train an agent with two reward signals: a standard task performance
reward, plus a normative behavior reward. The normative behavior reward is
derived from a value-aligned prior model previously shown to classify text as
normative or non-normative. We show how variations on a policy shaping
technique can balance these two sources of reward and produce policies that are
both effective and perceived as being more normative. We test our
value-alignment technique on three interactive text-based worlds; each world is
designed specifically to challenge agents with a task as well as provide
opportunities to deviate from the task to engage in normative and/or altruistic
behavior.
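
The approach above combines two signals at action-selection time: task Q-values learned by the RL agent and a normativity score from the value-aligned prior (a text classifier). The sketch below is a minimal illustration of one multiplicative policy-shaping rule, not the authors' implementation; the classifier interface, the softmax/temperature form, and the weighting parameter `beta` are assumptions made for the example.

```python
import numpy as np

def shaped_action_distribution(task_q_values, norm_scores, beta=1.0, temperature=1.0):
    """Blend a task-driven policy with a normative prior.

    task_q_values: Q(s, a) for each candidate action, from the task-reward agent.
    norm_scores:   P(normative | action text) for each action, from a text
                   classifier standing in for the paper's value-aligned prior.
    beta:          how strongly the normative prior reshapes the task policy
                   (an illustrative knob, not a parameter from the paper).
    """
    task_probs = np.exp(task_q_values / temperature)
    task_probs /= task_probs.sum()
    shaped = task_probs * np.power(norm_scores, beta)  # multiplicative policy shaping
    return shaped / shaped.sum()

# Toy example: three candidate actions in a text-based game (numbers are hypothetical).
q = np.array([2.0, 1.5, 0.5])        # task Q-values
p_norm = np.array([0.1, 0.9, 0.8])   # classifier's P(normative) per action
probs = shaped_action_distribution(q, p_norm, beta=2.0)
action = np.random.choice(len(q), p=probs)  # the highest-Q action is suppressed as non-normative
```

Varying `beta` trades task performance against normative behavior, mirroring the balance the paper reports between policies that are effective and policies that are perceived as more normative.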
Related papers
- Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards [38.056359612828466]
We propose a novel value-based deep RL algorithm called Iterative learning from Corrective actions and Proxy rewards (ICoPro).
We experimentally validate our proposition on a variety of tasks (Atari games and autonomous driving on highway).
arXiv Detail & Related papers (2024-10-08T08:04:09Z)
- Moral Alignment for LLM Agents [3.7414804164475983]
We introduce the design of reward functions that explicitly encode core human values for Reinforcement Learning-based fine-tuning of foundation agent models.
We evaluate our approach using the traditional philosophical frameworks of Deontological Ethics and Utilitarianism.
We show how moral fine-tuning can be deployed to enable an agent to unlearn a previously developed selfish strategy.
arXiv Detail & Related papers (2024-10-02T15:09:36Z)
- DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv Detail & Related papers (2023-12-10T06:03:57Z)
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark [61.43264961005614]
We develop a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios.
We evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations.
Our results show that agents can act both competently and morally, so concrete progress can be made in machine ethics.
arXiv Detail & Related papers (2023-04-06T17:59:03Z)
- Value Engineering for Autonomous Agents [3.6130723421895947]
Previous approaches have treated values as labels associated with some actions or states of the world, rather than as integral components of agent reasoning.
We propose a new AMA paradigm grounded in moral and social psychology, where values are instilled into agents as context-dependent goals.
We argue that this type of normative reasoning, where agents are endowed with an understanding of norms' moral implications, leads to value-awareness in autonomous agents.
arXiv Detail & Related papers (2023-02-17T08:52:15Z)
- Skill-Based Reinforcement Learning with Intrinsic Reward Matching [77.34726150561087]
We present Intrinsic Reward Matching (IRM), which unifies task-agnostic skill pretraining and task-aware finetuning.
IRM enables us to utilize pretrained skills far more effectively than previous skill selection methods.
arXiv Detail & Related papers (2022-10-14T00:04:49Z)
- Aligning to Social Norms and Values in Interactive Narratives [89.82264844526333]
We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games.
We introduce the GALAD agent, which uses the social commonsense knowledge in specially trained language models to contextually restrict its action space to actions aligned with socially beneficial values.
arXiv Detail & Related papers (2022-05-04T09:54:33Z)
- Mutual Information State Intrinsic Control [91.38627985733068]
Intrinsically motivated RL attempts to remove the reliance on well-shaped extrinsic rewards by defining an intrinsic reward function.
Motivated by the concept of self-consciousness in psychology, we make the natural assumption that the agent knows what constitutes itself.
We mathematically formalize this reward as the mutual information between the agent state and the surrounding state.
arXiv Detail & Related papers (2021-03-15T03:03:36Z)
- Exploring the Impact of Tunable Agents in Sequential Social Dilemmas [0.0]
We leverage multi-objective reinforcement learning to create tunable agents.
We apply this technique to sequential social dilemmas.
We demonstrate that the tunable-agents framework allows easy adaptation between cooperative and competitive behaviours.
arXiv Detail & Related papers (2021-01-28T12:44:31Z)
- Intrinsic Motivation for Encouraging Synergistic Behavior [55.10275467562764]
We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks.
Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.
arXiv Detail & Related papers (2020-02-12T19:34:51Z)