Learning the Value Systems of Agents with Preference-based and Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2602.04518v1
- Date: Wed, 04 Feb 2026 13:07:15 GMT
- Title: Learning the Value Systems of Agents with Preference-based and Inverse Reinforcement Learning
- Authors: Andrés Holgado-Sánchez, Holger Billhardt, Alberto Fernández, Sascha Ossowski
- Abstract summary: Agreement Technologies refer to open computer systems in which autonomous software agents interact with one another. We propose a novel method to automatically \emph{learn} value systems from observations and human demonstrations.
- Score: 1.6970482663318245
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Agreement Technologies refer to open computer systems in which autonomous software agents interact with one another, typically on behalf of humans, in order to come to mutually acceptable agreements. With the advance of AI systems in recent years, it has become apparent that such agreements, in order to be acceptable to the involved parties, must remain aligned with ethical principles and moral values. However, this is notoriously difficult to ensure, especially as different human users (and their software agents) may hold different value systems, i.e. they may differently weigh the importance of individual moral values. Furthermore, it is often hard to specify the precise meaning of a value in a particular context in a computational manner. Methods to estimate value systems based on human-engineered specifications, e.g. based on value surveys, are limited in scale due to the need for intense human moderation. In this article, we propose a novel method to automatically \emph{learn} value systems from observations and human demonstrations. In particular, we propose a formal model of the \emph{value system learning} problem, its instantiation to sequential decision-making domains based on multi-objective Markov decision processes, as well as tailored preference-based and inverse reinforcement learning algorithms to infer value grounding functions and value systems. The approach is illustrated and evaluated by two simulated use cases.
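The abstract frames value system learning on multi-objective Markov decision processes: a value grounding function scores how well each (state, action) pair aligns with each moral value, and a value system weighs the relative importance of those values. The following toy sketch illustrates that setup under the common assumption of linear scalarization; all names (`grounding`, `value_system`, `scalarized_reward`) are illustrative and do not reflect the paper's actual algorithms or notation.

```python
import numpy as np

# Hypothetical toy instance: 3 states, 2 actions, 2 moral values.
rng = np.random.default_rng(0)
n_states, n_actions, n_values = 3, 2, 2

# Value grounding: per-value alignment scores for each (state, action) pair,
# i.e. grounding[s, a, k] is how well action a in state s serves value k.
grounding = rng.uniform(size=(n_states, n_actions, n_values))

# Value system: weights over the individual values (here summing to 1).
value_system = np.array([0.7, 0.3])

def scalarized_reward(s: int, a: int) -> float:
    """Linearly scalarize the multi-objective reward with the value system."""
    return float(grounding[s, a] @ value_system)

# Greedy action in each state under the scalarized reward:
greedy = [int(np.argmax(grounding[s] @ value_system)) for s in range(n_states)]
```

Inverse and preference-based RL would run this construction in reverse: given demonstrations or pairwise preferences over behaviours, infer the grounding and the weights that best explain them.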
Related papers
- Learning the Value Systems of Societies with Preference-based Multi-objective Reinforcement Learning [4.735670734773144]
Value-aware AI should recognise human values and adapt to the value systems (value-based preferences) of different users. We propose algorithms for learning models of value alignment and value systems for a society of agents.
arXiv Detail & Related papers (2026-02-09T16:06:36Z) - Rethinking How AI Embeds and Adapts to Human Values: Challenges and Opportunities [0.6113558800822273]
We argue that AI systems should implement long-term reasoning and remain adaptable to evolving values. Value alignment requires more theories to address the full spectrum of human values. We identify the challenges associated with value alignment and indicate directions for advancing value alignment research.
arXiv Detail & Related papers (2025-08-23T18:19:05Z) - Learning the Value Systems of Societies from Preferences [1.3836987591220347]
Aligning AI systems with human values and the value-based preferences of various stakeholders is key in ethical AI. In value-aware AI systems, decision-making draws upon explicit computational representations of individual values. We propose a method to address the problem of learning the value systems of societies.
arXiv Detail & Related papers (2025-07-28T11:25:55Z) - Measuring Value Alignment [12.696227679697493]
This paper introduces a novel formalism to quantify the alignment between AI systems and human values.
By utilizing this formalism, AI developers and ethicists can better design and evaluate AI systems to ensure they operate in harmony with human values.
arXiv Detail & Related papers (2023-12-23T12:30:06Z) - Learning Human-like Representations to Enable Learning Human Values [11.236150405125754]
We explore the effects of representational alignment between humans and AI agents on learning human values.
We show that this kind of representational alignment can support safely learning and exploring human values in the context of personalization.
arXiv Detail & Related papers (2023-12-21T18:31:33Z) - Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [68.66719970507273]
Value pluralism is the view that multiple correct values may be held in tension with one another.
As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts.
We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
arXiv Detail & Related papers (2023-09-02T01:24:59Z) - Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z) - Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
Large Language Models (LLMs) have made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z) - Human Values in Multiagent Systems [3.5027291542274357]
This paper presents a formal representation of values, grounded in the social sciences.
We use this formal representation to articulate the key challenges for achieving value-aligned behaviour in multiagent systems.
arXiv Detail & Related papers (2023-05-04T11:23:59Z) - Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z) - Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics [64.81682222169113]
System-level correlations quantify how reliably an automatic summarization evaluation metric replicates human judgments of summary quality.
We identify two ways in which the definition of the system-level correlation is inconsistent with how metrics are used to evaluate systems in practice.
arXiv Detail & Related papers (2022-04-21T15:52:14Z) - An Extensible Benchmark Suite for Learning to Simulate Physical Systems [60.249111272844374]
We introduce a set of benchmark problems to take a step towards unified benchmarks and evaluation protocols.
We propose four representative physical systems, as well as a collection of both widely used classical time-based and representative data-driven methods.
arXiv Detail & Related papers (2021-08-09T17:39:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.