Value FULCRA: Mapping Large Language Models to the Multidimensional
Spectrum of Basic Human Values
- URL: http://arxiv.org/abs/2311.10766v1
- Date: Wed, 15 Nov 2023 10:29:28 GMT
- Title: Value FULCRA: Mapping Large Language Models to the Multidimensional
Spectrum of Basic Human Values
- Authors: Jing Yao, Xiaoyuan Yi, Xiting Wang, Yifan Gong and Xing Xie
- Abstract summary: Inspired by basic values in humanity and social science across cultures, this work proposes a novel basic value alignment paradigm and a value space spanned by basic value dimensions.
To foster future research, we apply the representative Schwartz's Theory of Basic Values as an example and construct FULCRA, a dataset consisting of 5k (LLM output, value vector) pairs.
- Score: 47.779186412943076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of Large Language Models (LLMs) has attracted much
attention to value alignment for their responsible development. However, how to
define values in this context remains a largely unexplored question. Existing
work mainly follows the Helpful, Honest, Harmless principle and specifies
values as risk criteria formulated within the AI community, e.g., fairness and
privacy protection, which suffer from poor clarity, adaptability, and transparency.
Inspired by basic values in humanity and social science across cultures, this
work proposes a novel basic value alignment paradigm and introduces a value
space spanned by basic value dimensions. All LLM behaviors can be mapped into
this space by identifying their underlying values, offering the potential to
address the three challenges above. To foster future research, we apply
Schwartz's Theory of Basic Values, a representative framework, as an initial instantiation and
construct FULCRA, a dataset consisting of 5k (LLM output, value vector) pairs.
Our extensive analysis of FULCRA reveals the underlying relation between basic
values and LLMs' behaviors, demonstrating that our approach not only covers
existing mainstream risks but also anticipates risks not yet identified.
Additionally, we present an initial implementation of basic value
evaluation and alignment, paving the way for future research along this line.
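To make the paper's central construct concrete, here is a minimal sketch, not the authors' implementation, of how an (LLM output, value vector) pair might be represented using Schwartz's ten basic values as the spanning dimensions; the field names, score range, and distance-based evaluation below are illustrative assumptions.

```python
from dataclasses import dataclass

import numpy as np

# Schwartz's ten basic values span the value space; each LLM output is
# mapped to a 10-dimensional vector, one coordinate per basic value.
SCHWARTZ_VALUES = [
    "self-direction", "stimulation", "hedonism", "achievement", "power",
    "security", "conformity", "tradition", "benevolence", "universalism",
]

@dataclass
class FulcraPair:
    """One (LLM output, value vector) pair, in the style of FULCRA.
    Field names and the score range are illustrative assumptions."""
    llm_output: str
    value_vector: np.ndarray  # shape (10,); e.g., -1 = violates, +1 = upholds

def value_distance(v: np.ndarray, target: np.ndarray) -> float:
    """Euclidean distance between an output's value vector and a desired
    target point in the value space -- a simple stand-in for basic value
    evaluation, not the paper's actual method."""
    return float(np.linalg.norm(v - target))

# Hypothetical example: an output that strongly upholds security and
# benevolence while mildly sacrificing power.
pair = FulcraPair(
    llm_output="I can't share that account password, but here is how to reset it safely.",
    value_vector=np.array([0.2, -0.1, 0.0, 0.1, -0.3, 0.9, 0.4, 0.0, 0.7, 0.3]),
)

target = np.zeros(10)
target[SCHWARTZ_VALUES.index("security")] = 1.0  # desired: high security
print(f"distance to target: {value_distance(pair.value_vector, target):.3f}")
```

Under a representation like this, basic value evaluation reduces to locating an output in the value space, and alignment to steering outputs toward a desired target region.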
Related papers
- CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses [34.77031649891843]
We introduce CLAVE, a novel framework that integrates two complementary Large Language Models (LLMs) to evaluate the values reflected in generated responses.
This dual-model approach enables calibration with any value system using 100 human-labeled samples per value type.
We present ValEval, a comprehensive dataset comprising 13k+ (text, value, label) tuples across diverse domains, covering three major value systems (a minimal tuple sketch appears after this list).
arXiv Detail & Related papers (2024-07-15T13:51:37Z) - ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models [14.268555410234804]
Large Language Models (LLMs) are transforming diverse fields and gaining increasing influence as human proxies.
This work introduces ValueBench, the first comprehensive psychometric benchmark for evaluating value orientations and value understanding in LLMs.
arXiv Detail & Related papers (2024-06-06T16:14:16Z) - Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on the Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the common belief that base LLMs, lacking instruction tuning, are safe from such misuse.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - Denevil: Towards Deciphering and Navigating the Ethical Values of Large
Language Models via Instruction Learning [36.66806788879868]
Large Language Models (LLMs) have made unprecedented breakthroughs, yet their integration into everyday life might raise societal risks due to generated unethical content.
This work delves into ethical values through the lens of Moral Foundations Theory.
arXiv Detail & Related papers (2023-10-17T07:42:40Z) - Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [68.66719970507273]
Value pluralism is the view that multiple correct values may be held in tension with one another.
As statistical learners, AI systems fit to averages by default, washing out potentially irreducible value conflicts.
We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations.
arXiv Detail & Related papers (2023-09-02T01:24:59Z) - CValues: Measuring the Values of Chinese Large Language Models from
Safety to Responsibility [62.74405775089802]
We present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs.
To this end, we manually collected adversarial safety prompts across 10 scenarios and induced responsibility prompts from 8 domains.
Our findings suggest that while most Chinese LLMs perform well in terms of safety, there is considerable room for improvement in terms of responsibility.
arXiv Detail & Related papers (2023-07-19T01:22:40Z) - Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
The rapid emergence of Large Language Models (LLMs) has made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.