Learning the Value Systems of Societies with Preference-based Multi-objective Reinforcement Learning
- URL: http://arxiv.org/abs/2602.08835v2
- Date: Wed, 11 Feb 2026 10:09:11 GMT
- Title: Learning the Value Systems of Societies with Preference-based Multi-objective Reinforcement Learning
- Authors: Andrés Holgado-Sánchez, Peter Vamplew, Richard Dazeley, Sascha Ossowski, Holger Billhardt
- Abstract summary: Value-aware AI should recognise human values and adapt to the value systems (value-based preferences) of different users. We propose algorithms for learning models of value alignment and value systems for a society of agents.
- Score: 4.735670734773144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Value-aware AI should recognise human values and adapt to the value systems (value-based preferences) of different users. This requires operationalization of values, which can be prone to misspecification. The social nature of values demands representations that adhere to multiple users, whose value systems are diverse yet exhibit patterns among groups. In sequential decision making, efforts have been made towards personalization for different goals or values from demonstrations of diverse agents. However, these approaches demand manually designed features or lack value-based interpretability and/or adaptability to diverse user preferences. We propose algorithms for learning models of value alignment and value systems for a society of agents in Markov Decision Processes (MDPs), based on clustering and preference-based multi-objective reinforcement learning (PbMORL). We jointly learn socially-derived value alignment models (groundings) and a set of value systems that concisely represent different groups of users (clusters) in a society. Each cluster consists of a value system representing the value-based preferences of its members and an approximately Pareto-optimal policy that reflects behaviours aligned with this value system. We evaluate our method against a state-of-the-art PbMORL algorithm and baselines on two MDPs with human values.
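Read plainly, the abstract describes each user's value system as a weight vector over value-alignment scores, clusters users into groups sharing one representative value system, and trains one approximately Pareto-optimal policy per cluster on the correspondingly scalarised reward. The sketch below illustrates only that reading with a toy k-means; it is not the paper's algorithm, and every name, shape, and number in it is a hypothetical stand-in.

```python
# Minimal sketch of the clustering idea, assuming a linear reading of value
# systems: each user is a weight vector over values, and clusters share one
# representative value system. Illustrative only; not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical society: 12 users, 3 values (say safety, fairness, efficiency).
# Each row is one user's value system, sampled here purely for illustration.
value_systems = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=12)

def scalarise(value_rewards, weights):
    """Collapse per-value rewards into one scalar reward signal."""
    return value_rewards @ weights

def kmeans(points, k, iters=50):
    """Toy k-means on the weight simplex; real PbMORL clustering differs."""
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = points[labels == c].mean(axis=0)
    return labels, centroids

labels, cluster_value_systems = kmeans(value_systems, k=3)
print("representative value systems per cluster:")
print(cluster_value_systems.round(2))

# A policy for cluster c would then be trained on rewards scalarised by that
# cluster's value system: one (approximately Pareto-optimal) policy per cluster.
step_value_rewards = np.array([0.2, 0.9, 0.5])   # per-value rewards for one step
print("scalarised reward per cluster:",
      scalarise(step_value_rewards, cluster_value_systems.T).round(2))
```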
Related papers
- Learning the Value Systems of Agents with Preference-based and Inverse Reinforcement Learning [1.6970482663318245]
Agreement Technologies refer to open computer systems in which autonomous software agents interact with one another. We propose a novel method to automatically learn value systems from observations and human demonstrations.
arXiv Detail & Related papers (2026-02-04T13:07:15Z)
- Learning the Value Systems of Societies from Preferences [1.3836987591220347]
Aligning AI systems with human values and the value-based preferences of various stakeholders is key in ethical AI. In value-aware AI systems, decision-making draws upon explicit computational representations of individual values. We propose a method to address the problem of learning the value systems of societies.
arXiv Detail & Related papers (2025-07-28T11:25:55Z)
- Value Compass Benchmarks: A Platform for Fundamental and Validated Evaluation of LLMs Values [76.70893269183684]
Large Language Models (LLMs) achieve remarkable breakthroughs. Aligning their values with humans has become imperative for their responsible development. There is still a lack of evaluations of LLMs' values that fulfill three desirable goals.
arXiv Detail & Related papers (2025-01-13T05:53:56Z)
- Democratizing Reward Design for Personal and Representative Value-Alignment [10.1630183955549]
We introduce Interactive-Reflective Dialogue Alignment, a method that iteratively engages users in reflecting on and specifying their subjective value definitions.
This system learns individual value definitions through language-model-based preference elicitation and constructs personalized reward models.
Our findings demonstrate diverse definitions of value-aligned behaviour and show that our system can accurately capture each person's unique understanding.
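A standard way such pairwise preference elicitation becomes a personalised reward model is the Bradley-Terry formulation; the sketch below shows that generic recipe, not the Interactive-Reflective Dialogue Alignment system, and its features and dimensions are toy assumptions.

```python
# Generic Bradley-Terry fit of a per-user linear reward model from pairwise
# preferences. Illustrative only; not the paper's system.
import numpy as np

rng = np.random.default_rng(1)
dim = 4                                # toy feature dimension for behaviours
true_w = rng.normal(size=dim)          # the user's latent preferences (hidden)

# Simulate elicited preferences: order each pair so the first is preferred.
pairs = [(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(200)]
prefs = [(a, b) if true_w @ a >= true_w @ b else (b, a) for a, b in pairs]

w, lr = np.zeros(dim), 0.5
for _ in range(200):                   # gradient ascent on the BT log-likelihood
    grad = np.zeros(dim)
    for a, b in prefs:
        p_a = 1.0 / (1.0 + np.exp(-(w @ (a - b))))  # P(a preferred over b)
        grad += (1.0 - p_a) * (a - b)
    w += lr * grad / len(prefs)

# The BT model is scale-invariant, so compare directions rather than magnitudes.
cos = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
print(f"cosine similarity to the latent preferences: {cos:.3f}")
```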
arXiv Detail & Related papers (2024-10-29T16:37:01Z)
- MAP: Multi-Human-Value Alignment Palette [22.74688073469946]
We develop a novel, first-principle approach called Multi-Human-Value Alignment Palette (MAP).
MAP navigates the alignment across multiple human values in a structured and reliable way.
We conduct a detailed theoretical analysis of MAP by quantifying the trade-offs between values, the sensitivity to constraints, the fundamental connection between multi-value alignment and sequential alignment, and proving that linear weighted rewards are sufficient for multi-value alignment.
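Written out, the "linear weighted rewards" claim has the familiar scalarisation form below; the notation is ours for illustration, not MAP's.

```latex
% Illustrative linear weighted reward over m human values (our notation,
% not MAP's):
\[
  r(x, y) \;=\; \sum_{i=1}^{m} \lambda_i \, r_i(x, y), \qquad \lambda_i \ge 0,
\]
% where r_i(x, y) scores how well response y to prompt x aligns with the
% i-th value, and the weights \lambda_i encode the trade-offs between values.
```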
arXiv Detail & Related papers (2024-10-24T23:16:39Z)
- Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on the Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z)
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
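For context, zero-shot classification with CLIP, one of the applications named above, follows the pattern sketched here against the openai/CLIP package; the image path and label prompts are placeholder assumptions, not the paper's setup.

```python
# Sketch of CLIP zero-shot classification (openai/CLIP package); the image
# file and label prompts below are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
labels = ["a photo of a doctor", "a photo of a nurse"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)   # image-text similarity logits
    probs = logits_per_image.softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```

Bias audits of the kind the taxonomy covers compare such label probabilities across demographic variations of the inputs.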
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
- Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [68.66719970507273]
Value pluralism is the view that multiple correct values may be held in tension with one another.
As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts.
We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
arXiv Detail & Related papers (2023-09-02T01:24:59Z)
- Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)
- Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
The emergence of Large Language Models (LLMs) has made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.