Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches
- URL: http://arxiv.org/abs/2404.12744v2
- Date: Fri, 10 May 2024 06:09:02 GMT
- Title: Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches
- Authors: Pablo Biedma, Xiaoyuan Yi, Linus Huang, Maosong Sun, Xing Xie,
- Abstract summary: This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
- Score: 69.73783026870998
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advancements in Large Language Models (LLMs) have revolutionized the AI field but also pose potential safety and ethical risks. Deciphering LLMs' embedded values becomes crucial for assessing and mitigating their risks. Despite extensive investigation into LLMs' values, previous studies heavily rely on human-oriented value systems in social sciences. Then, a natural question arises: Do LLMs possess unique values beyond those of humans? Delving into it, this work proposes a novel framework, ValueLex, to reconstruct LLMs' unique value system from scratch, leveraging psychological methodologies from human personality/value research. Based on Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs, synthesizing a taxonomy that culminates in a comprehensive value framework via factor analysis and semantic clustering. We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system. Based on this system, we further develop tailored projective tests to evaluate and analyze the value inclinations of LLMs across different model sizes, training methods, and data sources. Our framework fosters an interdisciplinary paradigm of understanding LLMs, paving the way for future AI alignment and regulation.
Related papers
- Measuring Human and AI Values based on Generative Psychometrics with Large Language Models [13.795641564238434]
In recent advances in AI, large language models (LLMs) have emerged as both tools and subjects of value measurement.
This work introduces Generative Psychometrics for Values (GPV), a data-driven value measurement paradigm grounded in text-revealed selective perceptions.
arXiv Detail & Related papers (2024-09-18T16:26:22Z) - CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses [34.77031649891843]
We introduce CLAVE, a novel framework which integrates two complementary Large Language Models (LLMs)
This dual-model approach enables calibration with any value systems using 100 human-labeled samples per value type.
We present ValEval, a comprehensive dataset comprising 13k+ (text,value,label) 12+s across diverse domains, covering three major value systems.
arXiv Detail & Related papers (2024-07-15T13:51:37Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z) - Value FULCRA: Mapping Large Language Models to the Multidimensional
Spectrum of Basic Human Values [47.779186412943076]
We propose a novel basic value alignment paradigm and a value space spanned by basic value dimensions.
Inspired by basic values in humanity and social science across cultures, this work proposes a novel basic value alignment paradigm and a value space spanned by basic value dimensions.
To foster future research, we apply the representative Schwartz's Theory of Basic Values as an example and construct FULCRA, a dataset consisting of 5k (LLM output, value vector) pairs.
arXiv Detail & Related papers (2023-11-15T10:29:28Z) - LUNA: A Model-Based Universal Analysis Framework for Large Language Models [19.033382204019667]
Self-attention mechanism, extremely large model scale, and autoregressive generation schema present new challenges for quality analysis.
We propose a universal analysis framework for LLMs, designed to be general andinterpretable.
In particular, we first leverage the data from desired trustworthiness perspectives to construct an abstract model.
arXiv Detail & Related papers (2023-10-22T07:26:21Z) - Denevil: Towards Deciphering and Navigating the Ethical Values of Large
Language Models via Instruction Learning [36.66806788879868]
Large Language Models (LLMs) have made unprecedented breakthroughs, yet their integration into everyday life might raise societal risks due to generated unethical content.
This work delves into ethical values utilizing Moral Foundation Theory.
arXiv Detail & Related papers (2023-10-17T07:42:40Z) - A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry.
This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z) - Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
Large Language Models (LLMs) have made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.