Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
- URL: http://arxiv.org/abs/2309.00779v2
- Date: Tue, 2 Apr 2024 16:52:03 GMT
- Title: Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
- Authors: Taylor Sorensen, Liwei Jiang, Jena Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi
- Abstract summary: Value pluralism is the view that multiple correct values may be held in tension with one another.
As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts.
We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human values are crucial to human decision-making. Value pluralism is the view that multiple correct values may be held in tension with one another (e.g., when considering lying to a friend to protect their feelings, how does one balance honesty with friendship?). As statistical learners, AI systems fit to averages by default, washing out these potentially irreducible value conflicts. To improve AI systems to better reflect value pluralism, the first-order challenge is to explore the extent to which AI systems can model pluralistic human values, rights, and duties as well as their interaction. We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations. ValuePrism's contextualized values are generated by GPT-4 and deemed high-quality by human annotators 91% of the time. We conduct a large-scale study with annotators across diverse social and demographic backgrounds to understand whose values are represented. With ValuePrism, we build Kaleido, an open, lightweight, and structured language-based multi-task model that generates, explains, and assesses the relevance and valence (i.e., support or oppose) of human values, rights, and duties within a specific context. Humans prefer the sets of values output by our system over those of the teacher GPT-4, finding them more accurate and broader in coverage. In addition, we demonstrate that Kaleido can help explain variability in human decision-making by outputting contrasting values. Finally, we show that Kaleido's representations transfer to other philosophical frameworks and datasets, confirming the benefit of an explicit, modular, and interpretable approach to value pluralism. We hope that our work will serve as a step toward making explicit the implicit values behind human decision-making and toward steering AI systems to make decisions that are more in accordance with them.
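The abstract implies a structured interface for Kaleido: given a situation, the model returns values, rights, and duties, each with a relevance score, a valence (support or oppose), and an explanation. The minimal Python sketch below mirrors that structure with a hard-coded stub in place of the real model; the record fields, names, and numbers are illustrative assumptions, not the paper's released API.

```python
from dataclasses import dataclass
from typing import List, Literal

# Hypothetical record mirroring the paper's description: each candidate
# value, right, or duty is tied to a situation with a relevance score,
# a valence label, and a free-text explanation.
@dataclass
class ValueJudgment:
    kind: Literal["value", "right", "duty"]
    text: str                 # e.g. "Honesty"
    relevance: float          # 0..1, how pertinent to the situation
    valence: Literal["supports", "opposes", "either"]
    explanation: str

def assess(situation: str) -> List[ValueJudgment]:
    """Stub standing in for a Kaleido-style model call.

    A real system would generate candidate values for the situation and
    score each one's relevance and valence; here we hard-code the paper's
    running example (lying to a friend to protect their feelings).
    """
    return [
        ValueJudgment("value", "Honesty", 0.95, "opposes",
                      "Lying violates honesty."),
        ValueJudgment("value", "Friendship", 0.90, "supports",
                      "Sparing a friend's feelings preserves the relationship."),
        ValueJudgment("duty", "Duty not to deceive", 0.85, "opposes",
                      "One has a duty not to mislead others."),
    ]

for j in assess("Lying to a friend to protect their feelings"):
    print(f"{j.kind:>5} | {j.text:<20} | rel={j.relevance:.2f} | "
          f"{j.valence:<8} | {j.explanation}")
```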
Related papers
- Democratizing Reward Design for Personal and Representative Value-Alignment
We introduce Interactive-Reflective Dialogue Alignment, a method that iteratively engages users in reflecting on and specifying their subjective value definitions.
This system learns individual value definitions through language-model-based preference elicitation and constructs personalized reward models.
Our findings demonstrate diverse definitions of value-aligned behaviour and show that our system can accurately capture each person's unique understanding.
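As a rough illustration of the elicitation-to-reward-model pipeline described above, the sketch below fits a simple Bradley-Terry model to a few pairwise preferences. The behaviours, preference pairs, and the choice of Bradley-Terry itself are assumptions for illustration; the paper's actual elicitation is dialogue-based and language-model-driven.

```python
import math

# Candidate behaviours and (winner, loser) preference pairs one user might
# produce during elicitation; all data here is illustrative.
behaviours = ["blunt honesty", "gentle honesty", "white lie"]
preferences = [(1, 0), (1, 2), (0, 2), (1, 0)]

scores = [0.0] * len(behaviours)       # one latent reward per behaviour
for _ in range(200):                   # gradient ascent on log-likelihood
    for w, l in preferences:
        p = 1 / (1 + math.exp(scores[l] - scores[w]))  # P(w beats l)
        scores[w] += 0.1 * (1 - p)
        scores[l] -= 0.1 * (1 - p)

# Behaviours ranked by the personalized reward model.
for b, s in sorted(zip(behaviours, scores), key=lambda x: -x[1]):
    print(f"{b:<15} reward={s:+.2f}")
```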
arXiv Detail & Related papers (2024-10-29T16:37:01Z)
- Can Language Models Reason about Individualistic Human Values and Preferences?
We study language models (LMs) on the specific challenge of individualistic value reasoning.
We reveal critical limitations in frontier LMs' abilities to reason about individualistic human values, with accuracies between 55% and 65%.
We also identify a partiality of LMs in reasoning about global individualistic values, as measured by our proposed Value Inequity Index (σINEQUITY).
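The summary does not give the formula for σINEQUITY, so the snippet below shows one plausible instantiation, purely as an assumption: the population standard deviation of model accuracy across groups of individuals, which is zero when the model serves all groups equally well.

```python
import statistics

# Assumed per-group accuracies of an LM at individualistic value reasoning;
# the group names and numbers are illustrative, not the paper's data.
accuracy_by_group = {
    "group_a": 0.64, "group_b": 0.58, "group_c": 0.61, "group_d": 0.55,
}

# One plausible (assumed) form of a value-inequity index: dispersion of
# accuracy across groups.
sigma_inequity = statistics.pstdev(accuracy_by_group.values())
print(f"mean accuracy = {statistics.mean(accuracy_by_group.values()):.3f}")
print(f"σINEQUITY (assumed form) = {sigma_inequity:.3f}")
```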
arXiv Detail & Related papers (2024-10-04T19:03:41Z)
- ValueCompass: A Framework of Fundamental Values for Human-AI Alignment
We introduce ValueCompass, a framework of fundamental values grounded in psychological theory and a systematic review.
We apply ValueCompass to measure the value alignment of humans and language models (LMs) across four real-world vignettes.
Our findings uncover risky misalignment between humans and LMs, such as LMs endorsing values like "Choose Own Goals" that humans largely reject.
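A minimal sketch of this kind of human-LM value comparison, using made-up binary endorsement ratings: for each value, compare the human endorsement rate with the LM's and flag large gaps as candidate misalignment. The value names echo the summary's example; every number is illustrative.

```python
# Per-value ratings across vignettes (endorse = 1, reject = 0); all data
# here is fabricated for illustration only.
human_ratings = {"Choose Own Goals": [0, 0, 1, 0], "Honesty": [1, 1, 1, 0]}
lm_ratings    = {"Choose Own Goals": [1, 1, 1, 1], "Honesty": [1, 1, 0, 1]}

for value in human_ratings:
    h = sum(human_ratings[value]) / len(human_ratings[value])
    m = sum(lm_ratings[value]) / len(lm_ratings[value])
    flag = "  <-- risky misalignment" if abs(h - m) > 0.5 else ""
    print(f"{value:<18} human={h:.2f}  lm={m:.2f}  gap={abs(h - m):.2f}{flag}")
```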
arXiv Detail & Related papers (2024-09-15T02:13:03Z)
- Language Model Alignment in Multilingual Trolley Problems
Building on the Moral Machine experiment, we develop MultiTP, a cross-lingual corpus of moral dilemma vignettes in over 100 languages.
Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions.
We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
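To make the cross-language comparison concrete, here is a small sketch under assumed data: given per-language rates of agreement between an LLM's dilemma choices and human judgments, it reports the spread and flags below-average languages. Languages and numbers are illustrative, not MultiTP results.

```python
import statistics

# Assumed per-language agreement rates between an LLM's choices on moral
# dilemmas and aggregate human judgments; all values are illustrative.
agreement_by_language = {"en": 0.81, "de": 0.77, "sw": 0.58, "ja": 0.70}

mean = statistics.mean(agreement_by_language.values())
spread = max(agreement_by_language.values()) - min(agreement_by_language.values())
print(f"mean agreement = {mean:.2f}, max-min spread = {spread:.2f}")
for lang, a in sorted(agreement_by_language.items(), key=lambda x: -x[1]):
    flag = "  (below mean)" if a < mean else ""
    print(f"{lang}: {a:.2f}{flag}")
```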
arXiv Detail & Related papers (2024-07-02T14:02:53Z)
- Modelling Human Values for AI Reasoning
We detail a formal model of human values for their explicit computational representation.
We show how this model can provide the foundational apparatus for AI-based reasoning over values.
We propose a roadmap for future integrated, interdisciplinary research into human values in AI.
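The summary points to an explicit computational representation of values and reasoning over it. The sketch below shows one minimal form such a representation could take; the structure (values linked to actions they promote or demote) is an assumption in the spirit of the description, not the paper's actual formalism.

```python
from dataclasses import dataclass, field

# Assumed minimal representation: a value names the actions it promotes
# and demotes, so a reasoner can query its stance on any action.
@dataclass
class Value:
    name: str
    promotes: set = field(default_factory=set)  # actions promoting the value
    demotes: set = field(default_factory=set)   # actions demoting the value

def stance(value: Value, action: str) -> str:
    """Reason over the explicit representation: how does `value` bear on `action`?"""
    if action in value.promotes:
        return "promotes"
    if action in value.demotes:
        return "demotes"
    return "neutral"

honesty = Value("honesty", promotes={"tell_truth"}, demotes={"tell_white_lie"})
kindness = Value("kindness", promotes={"tell_white_lie", "comfort_friend"})

for v in (honesty, kindness):
    print(f"{v.name}: tell_white_lie -> {stance(v, 'tell_white_lie')}")
```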
arXiv Detail & Related papers (2024-02-09T12:08:49Z)
- Learning Human-like Representations to Enable Learning Human Values
We argue that representational alignment between humans and AI agents facilitates value alignment.
We focus on ethics as one aspect of value alignment and train ML agents using a variety of methods.
arXiv Detail & Related papers (2023-12-21T18:31:33Z)
- Heterogeneous Value Alignment Evaluation for Large Language Models
The rise of Large Language Models (LLMs) has made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)
- Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values
Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values.
We introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.
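To make this concrete, here is a toy sketch of a value-conditioned classifier: the value is written explicitly into the command the model receives. The prompt wording and the keyword-heuristic classify() stub are illustrative assumptions standing in for the paper's trained model.

```python
def classify(command: str, text: str) -> str:
    """Stand-in for the trained value-aligned classifier: a real system would
    condition on `command`; this toy heuristic only inspects `text`."""
    return "violates value" if "stupid" in text.lower() else "follows value"

# The human value is stated explicitly in the command (wording assumed).
command = (
    "Value: People should not be demeaned on the basis of gender. "
    "Label the input as 'violates value' or 'follows value'."
)
for text in ["Women are too stupid to code.", "Anyone can learn to code."]:
    print(f"{text!r} -> {classify(command, text)}")
```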
arXiv Detail & Related papers (2022-10-14T09:10:49Z)
- ValueNet: A New Dataset for Human Value Driven Dialogue System
We present a new large-scale human value dataset called ValueNet, which contains human attitudes on 21,374 text scenarios.
Comprehensive empirical results show that the learned value model could benefit a wide range of dialogue tasks.
ValueNet is the first large-scale text dataset for human value modeling.
arXiv Detail & Related papers (2021-12-12T23:02:52Z)
- Aligning AI With Shared Human Values
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.
We find that current language models have a promising but incomplete ability to predict basic human ethical judgements.
Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
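As a minimal illustration of benchmarking ethical judgments in this style, the sketch below scores a trivial stand-in model on a few labelled scenarios; the scenarios, labels, and the always-"acceptable" baseline are illustrative assumptions, not ETHICS data.

```python
# Scenarios labelled acceptable (0) or unacceptable (1); data is fabricated
# for illustration, not drawn from the ETHICS benchmark.
examples = [
    ("I told my friend the truth even though it was hard.", 0),
    ("I copied my classmate's exam answers.", 1),
    ("I returned the wallet I found with the cash inside.", 0),
]

def model_judgment(scenario: str) -> int:
    """Stand-in for a language model's ethical judgment: a trivial baseline
    that labels everything acceptable, just to make the loop runnable."""
    return 0

correct = sum(model_judgment(s) == label for s, label in examples)
print(f"accuracy = {correct}/{len(examples)} = {correct / len(examples):.2f}")
```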
arXiv Detail & Related papers (2020-08-05T17:59:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.