Heterogeneous Value Alignment Evaluation for Large Language Models
- URL: http://arxiv.org/abs/2305.17147v3
- Date: Thu, 11 Jan 2024 16:50:04 GMT
- Title: Heterogeneous Value Alignment Evaluation for Large Language Models
- Authors: Zhaowei Zhang, Ceyao Zhang, Nian Liu, Siyuan Qi, Ziqi Rong, Song-Chun
Zhu, Shuguang Cui, Yaodong Yang
- Abstract summary: Large Language Models (LLMs) have made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
- Score: 91.96728871418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergent capabilities of Large Language Models (LLMs) have made it
crucial to align their values with those of humans. However, current
methodologies typically treat value as a static attribute assigned to LLMs,
paying little attention to their ability to pursue a given value or to the
need to transfer heterogeneous values across specific practical applications. In this
paper, we propose a Heterogeneous Value Alignment Evaluation (HVAE) system,
designed to assess the success of aligning LLMs with heterogeneous values.
Specifically, our approach first adopts the Social Value Orientation (SVO)
framework from social psychology, which quantifies how much weight a person
attaches to the welfare of others relative to their own. We then assign
different social values to the LLMs and measure whether their behaviors align
with the induced values. We evaluate them with a new automatic metric, value
rationality, which captures the ability of LLMs to align with specific values.
Evaluating the value rationality of five mainstream LLMs, we discern a
propensity in LLMs towards neutral values over pronounced personal values. By
examining the behavior of these LLMs, we provide deeper insight into the value
alignment of LLMs within a heterogeneous value system.
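The SVO framework referenced in the abstract is commonly operationalized with the SVO slider measure from social psychology, where an angle computed from a respondent's average payoff allocations to self and to others determines their orientation. The following Python sketch illustrates that standard computation only; the sample choices, category cutoffs, and the final alignment check are illustrative assumptions, not the paper's HVAE pipeline or its value-rationality metric.

```python
import math

# Minimal sketch (not the paper's implementation): compute a Social Value
# Orientation (SVO) angle from self/other payoff allocations, following the
# standard SVO slider-measure formula:
#   angle = atan2(mean_other - 50, mean_self - 50)
# where allocations are centered at 50. The category cutoffs below are the
# conventional SVO boundaries; the toy "alignment" check is an assumption
# for illustration, not the HVAE value-rationality metric.

def svo_angle(allocations):
    """allocations: list of (payoff_to_self, payoff_to_other) choices."""
    mean_self = sum(a for a, _ in allocations) / len(allocations)
    mean_other = sum(b for _, b in allocations) / len(allocations)
    return math.degrees(math.atan2(mean_other - 50.0, mean_self - 50.0))

def svo_category(angle):
    """Standard cutoffs: altruistic > 57.15, prosocial 22.45..57.15,
    individualistic -12.04..22.45, competitive < -12.04 (degrees)."""
    if angle > 57.15:
        return "altruistic"
    if angle > 22.45:
        return "prosocial"
    if angle > -12.04:
        return "individualistic"
    return "competitive"

# Toy usage: allocations an LLM induced with a "prosocial" value might choose.
choices = [(85, 85), (75, 90), (80, 80), (70, 95)]
angle = svo_angle(choices)
print(f"SVO angle: {angle:.1f} deg -> {svo_category(angle)}")
# Assumed alignment check: does the measured orientation match the induced one?
print("matches induced value:", svo_category(angle) == "prosocial")
```

In this toy run the mean allocations are (77.5, 87.5), giving an angle of roughly 54 degrees, which falls in the prosocial band and would therefore match the induced value under this assumed check.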
Related papers
- Measuring Human and AI Values based on Generative Psychometrics with Large Language Models [13.795641564238434]
Amid recent advances in AI, large language models (LLMs) have emerged as both tools and subjects of value measurement.
This work introduces Generative Psychometrics for Values (GPV), a data-driven value measurement paradigm grounded in text-revealed selective perceptions.
arXiv Detail & Related papers (2024-09-18T16:26:22Z)
- Do LLMs have Consistent Values? [27.58375296918161]
Large Language Model (LLM) technology is constantly improving towards human-like dialogue.
Values are a basic driving force underlying human behavior, but little research has been done to study the values exhibited in text generated by LLMs.
We ask whether LLMs exhibit the same value structure that has been demonstrated in humans, including the ranking of values, and correlation between values.
arXiv Detail & Related papers (2024-07-16T08:58:00Z)
- CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses [34.77031649891843]
We introduce CLAVE, a novel framework which integrates two complementary Large Language Models (LLMs).
This dual-model approach enables calibration with any value systems using 100 human-labeled samples per value type.
We present ValEval, a comprehensive dataset comprising 13k+ (text, value, label) tuples across diverse domains, covering three major value systems.
arXiv Detail & Related papers (2024-07-15T13:51:37Z)
- Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on the Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values [47.779186412943076]
Inspired by basic values in humanity and social science across cultures, this work proposes a novel basic value alignment paradigm and a value space spanned by basic value dimensions.
To foster future research, we apply the representative Schwartz's Theory of Basic Values as an example and construct FULCRA, a dataset consisting of 5k (LLM output, value vector) pairs.
arXiv Detail & Related papers (2023-11-15T10:29:28Z)
- CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility [62.74405775089802]
We present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs.
To this end, we have manually collected adversarial safety prompts across 10 scenarios and induced responsibility prompts from 8 domains.
Our findings suggest that while most Chinese LLMs perform well in terms of safety, there is considerable room for improvement in terms of responsibility.
arXiv Detail & Related papers (2023-07-19T01:22:40Z)
- A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry.
This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z)