HAVA: Hybrid Approach to Value-Alignment through Reward Weighing for Reinforcement Learning
- URL: http://arxiv.org/abs/2505.15011v1
- Date: Wed, 21 May 2025 01:32:54 GMT
- Title: HAVA: Hybrid Approach to Value-Alignment through Reward Weighing for Reinforcement Learning
- Authors: Kryspin Varys, Federico Cerutti, Adam Sobey, Timothy J. Norman
- Abstract summary: Our society is governed by a set of norms which together bring about the values we cherish, such as safety, fairness or trustworthiness. The goal of value-alignment is to create agents that not only do their tasks but through their behaviours also promote these values. We propose a novel method that integrates these norms into the reinforcement learning process.
- Score: 6.249768559720121
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our society is governed by a set of norms which together bring about the values we cherish such as safety, fairness or trustworthiness. The goal of value-alignment is to create agents that not only do their tasks but through their behaviours also promote these values. Many of the norms are written as laws or rules (legal / safety norms) but even more remain unwritten (social norms). Furthermore, the techniques used to represent these norms also differ. Safety / legal norms are often represented explicitly, for example, in some logical language while social norms are typically learned and remain hidden in the parameter space of a neural network. There is a lack of approaches in the literature that could combine these various norm representations into a single algorithm. We propose a novel method that integrates these norms into the reinforcement learning process. Our method monitors the agent's compliance with the given norms and summarizes it in a quantity we call the agent's reputation. This quantity is used to weigh the received rewards to motivate the agent to become value-aligned. We carry out a series of experiments including a continuous state space traffic problem to demonstrate the importance of the written and unwritten norms and show how our method can find the value-aligned policies. Furthermore, we carry out ablations to demonstrate why it is better to combine these two groups of norms rather than using either separately.
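The reputation-based reward weighing described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical example, not the authors' implementation: the class name, the compliance inputs, and the exact weighting rule are all assumptions. The idea shown is that compliance with explicit (written) norms and a learned social-norm model is summarized into a reputation in [0, 1], which then scales the environment reward.

```python
import numpy as np

class ReputationWeightedReward:
    """Toy sketch: track norm compliance as a 'reputation' in [0, 1]
    and use it to weigh the environment reward (assumed scheme)."""

    def __init__(self, decay=0.9):
        self.decay = decay        # how quickly past behaviour is forgotten
        self.reputation = 1.0     # start fully trusted

    def update(self, violated_written_norm, social_violation_prob):
        # Written (legal/safety) norms are checked explicitly: 0 or 1.
        # Social norms come from a learned model as a violation probability.
        compliance = 1.0 - max(float(violated_written_norm), social_violation_prob)
        # Exponential moving average keeps reputation in [0, 1].
        self.reputation = self.decay * self.reputation + (1 - self.decay) * compliance

    def weigh(self, env_reward):
        # Positive rewards are scaled down when reputation is low,
        # making norm-violating behaviour less attractive to the agent.
        return env_reward * self.reputation if env_reward > 0 else env_reward
```

In a training loop, update() would be called after each step with the outputs of the explicit norm checker and the learned social-norm model, and weigh() would replace the raw reward passed to the RL update.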
Related papers
- EgoNormia: Benchmarking Physical Social Norm Understanding [52.87904722234434]
We present the EgoNormia dataset, consisting of 1,853 challenging, multi-stage multiple-choice questions based on ego-centric videos of human interactions. The normative actions encompass seven categories: safety, privacy, proxemics, politeness, cooperation, coordination/proactivity, and communication/legibility.
arXiv Detail & Related papers (2025-02-27T19:54:16Z) - Policy-Adaptable Methods For Resolving Normative Conflicts Through Argumentation and Graph Colouring [0.0]
In a multi-agent system, one may choose to govern the behaviour of an agent by imposing norms. However, imposing multiple norms on one or more agents may result in situations where these norms conflict over how the agent should behave. We introduce a new method for resolving normative conflicts through argumentation and graph colouring.
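A rough sketch of the graph-colouring idea follows; it is not the paper's algorithm, and the conflict relation and greedy colouring are illustrative assumptions. Norms that prescribe incompatible behaviour are connected by edges, and a colouring assigns each norm to a group such that no two conflicting norms share a group.

```python
import networkx as nx  # standard graph library, used here for brevity

# Nodes are norms; an edge means the two norms prescribe incompatible behaviour.
conflicts = nx.Graph()
conflicts.add_edges_from([
    ("yield_to_pedestrians", "keep_traffic_flowing"),
    ("keep_traffic_flowing", "stop_at_amber"),
])

# Greedy colouring partitions norms into internally conflict-free groups;
# each colour class could then be activated in a different context.
colouring = nx.coloring.greedy_color(conflicts, strategy="largest_first")
print(colouring)  # e.g. {"keep_traffic_flowing": 0, "yield_to_pedestrians": 1, ...}
```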
arXiv Detail & Related papers (2025-01-21T00:32:49Z) - Are language models rational? The case of coherence norms and belief revision [63.78798769882708]
We consider logical coherence norms as well as coherence norms tied to the strength of belief in language models.
We argue that rational norms tied to coherence do apply to some language models, but not to others.
arXiv Detail & Related papers (2024-06-05T16:36:21Z) - Learning and Sustaining Shared Normative Systems via Bayesian Rule
Induction in Markov Games [2.307051163951559]
We build learning agents that cooperate flexibly with the human institutions they are embedded in.
By assuming shared norms, a newly introduced agent can infer the norms of an existing population from observations of compliance and violation.
Since agents can bootstrap common knowledge of the norms, this leads the norms to be widely adhered to, enabling new entrants to rapidly learn those norms.
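A minimal Bayesian-update sketch of how a newcomer might infer which candidate norm a population follows from observed compliance and violation; the hypothesis space, prior, and likelihoods below are invented for illustration and are not the paper's model.

```python
import numpy as np

# Hypothetical candidate norms and a uniform prior over them.
norms = ["no_norm", "keep_right", "queue_at_entrance"]
log_prior = np.log(np.ones(len(norms)) / len(norms))

def log_likelihood(norm, observation):
    """P(observed behaviour | norm in force): compliant behaviour is much
    more likely when the norm is actually followed by the population."""
    if norm == "no_norm":
        return np.log(0.5)
    complies = observation["complies_with"].get(norm, True)
    return np.log(0.95 if complies else 0.05)

observations = [
    {"complies_with": {"keep_right": True, "queue_at_entrance": False}},
    {"complies_with": {"keep_right": True, "queue_at_entrance": True}},
]

log_post = log_prior.copy()
for obs in observations:
    log_post += np.array([log_likelihood(n, obs) for n in norms])
log_post -= np.logaddexp.reduce(log_post)   # normalise
print(dict(zip(norms, np.exp(log_post))))    # posterior over candidate norms
```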
arXiv Detail & Related papers (2024-02-20T21:58:40Z) - Is Inverse Reinforcement Learning Harder than Standard Reinforcement
Learning? A Theoretical Perspective [55.36819597141271]
Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an expert policy -- plays a critical role in developing intelligent systems.
This paper provides the first line of results for efficient IRL in vanilla offline and online settings using polynomial samples and runtime.
As an application, we show that the learned rewards can transfer to another target MDP with suitable guarantees.
arXiv Detail & Related papers (2023-11-29T00:09:01Z) - Conformal Policy Learning for Sensorimotor Control Under Distribution
Shifts [61.929388479847525]
This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller's observables.
The key idea is the design of switching policies that can take conformal quantiles as input.
We show how to design such policies by using conformal quantiles to switch between base policies with different characteristics.
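A compact sketch of a conformal switching rule; the interface is assumed and is not the paper's exact construction. A nonconformity score is calibrated offline, and at run time the controller falls back to a conservative base policy whenever the current score exceeds the calibrated quantile.

```python
import numpy as np

def conformal_quantile(calibration_scores, alpha=0.1):
    """Finite-sample-adjusted (1 - alpha) quantile of calibration scores."""
    n = len(calibration_scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(calibration_scores, level)

def switching_policy(obs, score, threshold, nominal_policy, safe_policy):
    # If the observation looks out-of-distribution (score above the
    # conformal threshold), fall back to the conservative base policy.
    return safe_policy(obs) if score > threshold else nominal_policy(obs)

# Example usage with dummy scores and policies.
threshold = conformal_quantile(np.random.rand(500), alpha=0.1)
action = switching_policy(obs=np.zeros(4), score=0.97, threshold=threshold,
                          nominal_policy=lambda o: "fast", safe_policy=lambda o: "slow")
```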
arXiv Detail & Related papers (2023-11-02T17:59:30Z) - Learning Symbolic Rules over Abstract Meaning Representations for
Textual Reinforcement Learning [63.148199057487226]
We propose a modular, NEuroSymbolic Textual Agent (NESTA) that combines a generic semantic generalization with a rule induction system to learn interpretable rules as policies.
Our experiments show that the proposed NESTA method outperforms deep reinforcement learning-based techniques by achieving better generalization to unseen test games and learning from fewer training interactions.
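As a loose illustration of "interpretable rules as policies": the rule format and matching below are invented, and NESTA induces its rules over Abstract Meaning Representations rather than over raw symbolic facts.

```python
# Each rule maps a symbolic precondition to an action; the first rule whose
# precondition is satisfied by the current symbolic state fires.
rules = [
    ({"carrying(key)", "at(door)"}, "unlock door"),
    ({"at(door)"},                  "go get key"),
    (set(),                         "explore"),      # default fallback
]

def rule_policy(symbolic_state):
    for precondition, action in rules:
        if precondition <= symbolic_state:   # all required facts present
            return action

print(rule_policy({"at(door)", "carrying(key)"}))  # -> "unlock door"
```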
arXiv Detail & Related papers (2023-07-05T23:21:05Z) - Value Engineering for Autonomous Agents [3.6130723421895947]
Previous approaches have treated values as labels associated with some actions or states of the world, rather than as integral components of agent reasoning.
We propose a new AMA paradigm grounded in moral and social psychology, where values are instilled into agents as context-dependent goals.
We argue that this type of normative reasoning, where agents are endowed with an understanding of norms' moral implications, leads to value-awareness in autonomous agents.
arXiv Detail & Related papers (2023-02-17T08:52:15Z) - Socially Intelligent Genetic Agents for the Emergence of Explicit Norms [0.0]
We address the emergence of explicit norms by developing agents who provide and reason about explanations for norm violations.
These agents use a genetic algorithm to produce norms and reinforcement learning to learn the values of these norms.
We find that applying explanations leads to norms that provide better cohesion and goal satisfaction for the agents.
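A highly simplified sketch of the "genetic algorithm proposes norms, reinforcement learning scores them" loop; the norm representation, the fitness stand-in, and the operators are all assumptions made for illustration.

```python
import random

ACTIONS = ["speed", "honk", "cut_in", "yield"]

def random_norm():
    # A candidate norm is represented here as a set of prohibited actions.
    return frozenset(a for a in ACTIONS if random.random() < 0.5)

def learned_value(norm, episodes=50):
    # Stand-in for the RL step: estimate the average payoff of agents that
    # respect the norm (noisy score that happens to favour banning 'cut_in').
    base = 1.0 if "cut_in" in norm else 0.2
    return sum(base + random.gauss(0, 0.1) for _ in range(episodes)) / episodes

def mutate(norm):
    return norm ^ {random.choice(ACTIONS)}   # toggle one prohibition

population = [random_norm() for _ in range(20)]
for generation in range(10):
    scored = sorted(population, key=learned_value, reverse=True)
    parents = scored[:10]                                    # selection
    population = parents + [mutate(random.choice(parents)) for _ in range(10)]
print(sorted(population, key=learned_value, reverse=True)[0])
```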
arXiv Detail & Related papers (2022-08-07T18:48:48Z) - Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
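A skeletal sketch of the imagined-subgoal idea; the hand-coded policies below are placeholders for the learned high- and low-level policies in the paper. The high-level policy proposes an intermediate subgoal, and the low-level, goal-conditioned policy moves towards it.

```python
import numpy as np

def high_level_policy(state, final_goal):
    # Imagine a subgoal roughly halfway between the current state and the goal.
    return state + 0.5 * (final_goal - state)

def low_level_policy(state, subgoal):
    # Move greedily towards the subgoal (placeholder for a learned policy).
    direction = subgoal - state
    return direction / (np.linalg.norm(direction) + 1e-8)

state, goal = np.array([0.0, 0.0]), np.array([4.0, 2.0])
for _ in range(20):
    subgoal = high_level_policy(state, goal)
    state = state + 0.3 * low_level_policy(state, subgoal)
print(state)  # should end up close to the goal
```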
arXiv Detail & Related papers (2021-07-01T15:30:59Z) - Training Value-Aligned Reinforcement Learning Agents Using a Normative
Prior [10.421378728492437]
There is an increasing prospect that an agent trained to perform a task optimally, using only a measure of task performance as feedback, will violate societal norms for acceptable behavior or cause harm.
We introduce an approach to value-aligned reinforcement learning, in which we train an agent with two reward signals: a standard task performance reward, plus a normative behavior reward.
We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as being more normative.
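A small sketch of combining a task reward signal with a normative prior at action-selection time, in the spirit of policy shaping; the specific mixing rule below is an assumption, not necessarily the paper's variant.

```python
import numpy as np

def shaped_action(task_q_values, norm_probs, weight=0.5):
    """Blend task preferences with a normative prior over actions.

    task_q_values: estimated task return per action.
    norm_probs:    probability that each action is normatively acceptable.
    weight:        0 = task only, 1 = norms only.
    """
    task_pref = np.exp(task_q_values - task_q_values.max())
    task_pref /= task_pref.sum()                        # softmax over Q-values
    combined = (1 - weight) * task_pref + weight * np.asarray(norm_probs)
    return int(np.argmax(combined))

# Action 0 has the highest task value but is normatively discouraged.
print(shaped_action(np.array([2.0, 1.5, 0.0]), [0.05, 0.9, 0.9], weight=0.6))  # -> 1
```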
arXiv Detail & Related papers (2021-04-19T17:33:07Z) - Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions.
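A compact Thompson-sampling sketch for the latent bandit setting; the known conditional reward means and the Bernoulli rewards are simplifying assumptions, not the paper's full algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Known arm reward means conditioned on each latent state (rows = states).
means = np.array([[0.9, 0.2, 0.1],
                  [0.1, 0.8, 0.3]])
true_state = 0
belief = np.array([0.5, 0.5])        # posterior over the latent state

for t in range(200):
    sampled_state = rng.choice(len(belief), p=belief)   # Thompson sample a state
    arm = int(np.argmax(means[sampled_state]))          # best arm for that state
    reward = rng.random() < means[true_state, arm]      # Bernoulli reward
    # Bayesian update of the latent-state posterior given the observed reward.
    lik = means[:, arm] if reward else 1 - means[:, arm]
    belief = belief * lik
    belief /= belief.sum()

print(belief)   # mass should concentrate on the true latent state
```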
arXiv Detail & Related papers (2020-06-15T19:24:02Z)