ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models
- URL: http://arxiv.org/abs/2310.00378v4
- Date: Mon, 17 Jun 2024 07:58:00 GMT
- Title: ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models
- Authors: Zhaowei Zhang, Fengshuo Bai, Jun Gao, Yaodong Yang
- Abstract summary: We argue that truly understanding values in Large Language Models (LLMs) requires both "know what" and "know why".
We present a comprehensive evaluation metric, ValueDCG, to quantitatively assess the two aspects with an engineering implementation.
- Score: 10.989615390700113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personal values are a crucial factor behind human decision-making. Considering that Large Language Models (LLMs) have been shown to impact human decisions significantly, it is essential to make sure they accurately understand human values to ensure their safety. However, evaluating their grasp of these values is complex due to the intricate and adaptable nature of values. We argue that truly understanding values in LLMs requires considering both "know what" and "know why". To this end, we present a comprehensive evaluation metric, ValueDCG (Value Discriminator-Critique Gap), to quantitatively assess the two aspects with an engineering implementation. We assess four representative LLMs and provide compelling evidence that the growth rates of LLMs' "know what" and "know why" capabilities do not align with increases in parameter count, resulting in a decline in the models' capacity to understand human values as the number of parameters grows. This may further suggest that LLMs might craft plausible explanations based on the provided context without truly understanding the values involved, indicating potential risks.
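The abstract frames ValueDCG as a gap between a model's ability to discriminate value judgments ("know what") and its ability to justify them ("know why"). A minimal sketch of how such a gap could be computed is shown below; the function names, scoring callbacks, and aggregation are illustrative assumptions, not the paper's actual engineering implementation.

```python
# Hypothetical sketch of a discriminator-critique gap computation.
# Assumes each test item is scored twice: once for whether the model
# labels the value correctly ("know what") and once for whether its
# explanation is judged sound ("know why"). Both scoring callbacks are
# placeholders, not the paper's method.

from typing import Callable, Dict, List


def value_dcg(
    items: List[dict],
    discriminate: Callable[[dict], bool],       # True if the model's value label is correct
    critique_is_valid: Callable[[dict], bool],  # True if the model's explanation is judged sound
) -> Dict[str, float]:
    """Return "know what" accuracy, "know why" accuracy, and their gap."""
    know_what = sum(discriminate(item) for item in items) / len(items)
    know_why = sum(critique_is_valid(item) for item in items) / len(items)
    return {"know_what": know_what, "know_why": know_why, "gap": abs(know_what - know_why)}
```

Under this reading, a large gap indicates a model whose correct labels are not backed by sound reasoning (or vice versa), which is the kind of mismatch the abstract flags as a potential risk.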
Related papers
- Do LLMs have Consistent Values? [27.58375296918161]
Large Language Models (LLM) technology is constantly improving towards human-like dialogue.
Values are a basic driving force underlying human behavior, but little research has been done to study the values exhibited in text generated by LLMs.
We ask whether LLMs exhibit the same value structure that has been demonstrated in humans, including the ranking of values, and correlation between values.
arXiv Detail & Related papers (2024-07-16T08:58:00Z) - CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses [34.77031649891843]
We introduce CLAVE, a novel framework which integrates two complementary Large Language Models (LLMs)
This dual-model approach enables calibration with any value systems using 100 human-labeled samples per value type.
We present ValEval, a comprehensive dataset comprising 13k+ (text, value, label) tuples across diverse domains, covering three major value systems.
arXiv Detail & Related papers (2024-07-15T13:51:37Z) - Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
We study how reliance is affected by contextual features of an interaction.
We find that contextual characteristics significantly affect human reliance behavior.
Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions.
arXiv Detail & Related papers (2024-07-10T18:00:05Z) - Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z) - FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z) - Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [68.66719970507273]
Value pluralism is the view that multiple correct values may be held in tension with one another.
As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts.
We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
arXiv Detail & Related papers (2023-09-02T01:24:59Z) - CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility [62.74405775089802]
We present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs.
As a result, we have manually collected adversarial safety prompts across 10 scenarios and induced responsibility prompts from 8 domains.
Our findings suggest that while most Chinese LLMs perform well in terms of safety, there is considerable room for improvement in terms of responsibility.
arXiv Detail & Related papers (2023-07-19T01:22:40Z) - Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
The growing influence of Large Language Models (LLMs) has made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)