Heterogeneous Value Alignment Evaluation for Large Language Models
- URL: http://arxiv.org/abs/2305.17147v3
- Date: Thu, 11 Jan 2024 16:50:04 GMT
- Title: Heterogeneous Value Alignment Evaluation for Large Language Models
- Authors: Zhaowei Zhang, Ceyao Zhang, Nian Liu, Siyuan Qi, Ziqi Rong, Song-Chun
Zhu, Shuguang Cui, Yaodong Yang
- Abstract summary: Large Language Models (LLMs) have made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
- Score: 91.96728871418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergent capabilities of Large Language Models (LLMs) have made it
crucial to align their values with those of humans. However, current
methodologies typically attempt to assign value as an attribute to LLMs, yet
lack attention to the ability to pursue value and the importance of
transferring heterogeneous values in specific practical applications. In this
paper, we propose a Heterogeneous Value Alignment Evaluation (HVAE) system,
designed to assess the success of aligning LLMs with heterogeneous values.
Specifically, our approach first brings the Social Value Orientation (SVO)
framework from social psychology, which corresponds to how much weight a person
attaches to the welfare of others in relation to their own. We then assign the
LLMs with different social values and measure whether their behaviors align
with the inducing values. We conduct evaluations with new auto-metric
\textit{value rationality} to represent the ability of LLMs to align with
specific values. Evaluating the value rationality of five mainstream LLMs, we
discern a propensity in LLMs towards neutral values over pronounced personal
values. By examining the behavior of these LLMs, we contribute to a deeper
insight into the value alignment of LLMs within a heterogeneous value system.
Related papers
- Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models [13.513813405118478]
Large Language Models (LLMs) have raised concerns regarding their elusive intrinsic values.
This study addresses the gap by introducing the Generative Psycho-Lexical Approach (GPLA)
We propose a psychologically grounded five-factor value system tailored for LLMs.
arXiv Detail & Related papers (2025-02-04T16:10:55Z) - Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values? [13.848674226159169]
"Value-Action Gap" reveals discrepancies between individuals' stated values and their actions in real-world contexts.
This study introduces ValueActionLens, an evaluation framework to assess the alignment between LLMs' stated values and their value-informed actions.
arXiv Detail & Related papers (2025-01-26T09:33:51Z) - Value Compass Leaderboard: A Platform for Fundamental and Validated Evaluation of LLMs Values [76.70893269183684]
Large Language Models (LLMs) achieve remarkable breakthroughs, aligning their values with humans has become imperative.
Existing evaluations focus narrowly on safety risks such as bias and toxicity.
Existing benchmarks are prone to data contamination.
The pluralistic nature of human values across individuals and cultures is largely ignored in measuring LLMs value alignment.
arXiv Detail & Related papers (2025-01-13T05:53:56Z) - CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses [34.77031649891843]
We introduce CLAVE, a novel framework which integrates two complementary Large Language Models (LLMs)
This dual-model approach enables calibration with any value systems using 100 human-labeled samples per value type.
We present ValEval, a comprehensive dataset comprising 13k+ (text,value,label) 12+s across diverse domains, covering three major value systems.
arXiv Detail & Related papers (2024-07-15T13:51:37Z) - Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z) - Value FULCRA: Mapping Large Language Models to the Multidimensional
Spectrum of Basic Human Values [47.779186412943076]
We propose a novel basic value alignment paradigm and a value space spanned by basic value dimensions.
Inspired by basic values in humanity and social science across cultures, this work proposes a novel basic value alignment paradigm and a value space spanned by basic value dimensions.
To foster future research, we apply the representative Schwartz's Theory of Basic Values as an example and construct FULCRA, a dataset consisting of 5k (LLM output, value vector) pairs.
arXiv Detail & Related papers (2023-11-15T10:29:28Z) - Exploring the Reliability of Large Language Models as Customized Evaluators for Diverse NLP Tasks [65.69651759036535]
We analyze whether large language models (LLMs) can serve as reliable alternatives to humans.
This paper explores both conventional tasks (e.g., story generation) and alignment tasks (e.g., math reasoning)
We find that LLM evaluators can generate unnecessary criteria or omit crucial criteria, resulting in a slight deviation from the experts.
arXiv Detail & Related papers (2023-10-30T17:04:35Z) - CValues: Measuring the Values of Chinese Large Language Models from
Safety to Responsibility [62.74405775089802]
We present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs.
As a result, we have manually collected adversarial safety prompts across 10 scenarios and induced responsibility prompts from 8 domains.
Our findings suggest that while most Chinese LLMs perform well in terms of safety, there is considerable room for improvement in terms of responsibility.
arXiv Detail & Related papers (2023-07-19T01:22:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.