Related papers: ValueCompass: A Framework of Fundamental Values for Human-AI Alignment

URL: http://arxiv.org/abs/2409.09586v1
Date: Sun, 15 Sep 2024 02:13:03 GMT
Title: ValueCompass: A Framework of Fundamental Values for Human-AI Alignment
Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Yu-Ju Yang, Tanushree Mitra, Yun Huang,
Abstract summary: We introduce ValueCompass, a framework of fundamental values, grounded in psychological theory and a systematic review. We apply ValueCompass to measure the value alignment of humans and language models (LMs) across four real-world vignettes. Our findings uncover risky misalignment between humans and LMs, such as LMs agreeing with values like "Choose Own Goals", which are largely disagreed by humans.
Score: 15.35489011078817
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As AI systems become more advanced, ensuring their alignment with a diverse range of individuals and societal values becomes increasingly critical. But how can we capture fundamental human values and assess the degree to which AI systems align with them? We introduce ValueCompass, a framework of fundamental values, grounded in psychological theory and a systematic review, to identify and evaluate human-AI alignment. We apply ValueCompass to measure the value alignment of humans and language models (LMs) across four real-world vignettes: collaborative writing, education, public sectors, and healthcare. Our findings uncover risky misalignment between humans and LMs, such as LMs agreeing with values like "Choose Own Goals", which are largely disagreed by humans. We also observe values vary across vignettes, underscoring the necessity for context-aware AI alignment strategies. This work provides insights into the design space of human-AI alignment, offering foundations for developing AI that responsibly reflects societal values and ethics.

Related papers

Learning the Value Systems of Societies from Preferences [1.3836987591220347]
Aligning AI systems with human values and the value-based preferences of various stakeholders is key in ethical AI. In value-aware AI systems, decision-making draws upon explicit computational representations of individual values. We propose a method to address the problem of learning the value systems of societies.
arXiv Detail & Related papers (2025-07-28T11:25:55Z)
Measurement of LLM's Philosophies of Human Nature [113.47929131143766]
We design a standardized psychological scale specifically targeting large language models (LLMs). We show that current LLMs exhibit a systemic lack of trust in humans. We propose a mental loop learning framework, which enables an LLM to continuously optimize its value system.
arXiv Detail & Related papers (2025-04-03T06:22:19Z)
Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models [13.513813405118478]
Large Language Models (LLMs) have raised concerns regarding their elusive intrinsic values. This study addresses the gap by introducing the Generative Psycho-Lexical Approach (GPLA). We propose a psychologically grounded five-factor value system tailored for LLMs.
arXiv Detail & Related papers (2025-02-04T16:10:55Z)
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. We introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML)
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch. Based on Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs. We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z)
Measuring Value Alignment [12.696227679697493]
This paper introduces a novel formalism to quantify the alignment between AI systems and human values. By utilizing this formalism, AI developers and ethicists can better design and evaluate AI systems to ensure they operate in harmony with human values.
arXiv Detail & Related papers (2023-12-23T12:30:06Z)
Learning Human-like Representations to Enable Learning Human Values [11.236150405125754]
We explore the effects of representational alignment between humans and AI agents on learning human values. We show that this kind of representational alignment can support safely learning and exploring human values in the context of personalization.
arXiv Detail & Related papers (2023-12-21T18:31:33Z)
AI Alignment: A Comprehensive Survey [70.35693485015659]
AI alignment aims to make AI systems behave in line with human intentions and values. We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality. We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z)
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [68.66719970507273]
Value pluralism is the view that multiple correct values may be held in tension with one another. As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts. We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
arXiv Detail & Related papers (2023-09-02T01:24:59Z)
Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values. Current language models (LMs) are trained to rigidly replicate their training corpus in isolation. This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)
Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
The emergence of Large Language Models (LLMs) has made it crucial to align their values with those of humans. We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)
A computational framework of human values for ethical AI [3.5027291542274357]
Values provide a means to engineer ethical AI. No formal, computational definition of values has yet been proposed. We address this through a formal conceptual framework rooted in the social sciences.
arXiv Detail & Related papers (2023-05-04T11:35:41Z)
Metaethical Perspectives on 'Benchmarking' AI Ethics [81.65697003067841]
Benchmarks are seen as the cornerstone for measuring technical progress in Artificial Intelligence (AI) research. An increasingly prominent research area in AI is ethics, which currently has no set of benchmarks nor commonly accepted way for measuring the 'ethicality' of an AI system. We argue that it makes more sense to talk about 'values' rather than 'ethics' when considering the possible actions of present and future AI systems.
arXiv Detail & Related papers (2022-04-11T14:36:39Z)
The Challenge of Value Alignment: from Fairer Algorithms to AI Safety [2.28438857884398]
This paper addresses the question of how to align AI systems with human values. It situates it within a wider body of thought regarding technology and value.
arXiv Detail & Related papers (2021-01-15T11:03:15Z)
Aligning AI With Shared Human Values [85.2824609130584]
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. We find that current language models have a promising but incomplete ability to predict basic human ethical judgements. Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.
arXiv Detail & Related papers (2020-08-05T17:59:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.