CValues: Measuring the Values of Chinese Large Language Models from
Safety to Responsibility
- URL: http://arxiv.org/abs/2307.09705v1
- Date: Wed, 19 Jul 2023 01:22:40 GMT
- Title: CValues: Measuring the Values of Chinese Large Language Models from
Safety to Responsibility
- Authors: Guohai Xu, Jiayi Liu, Ming Yan, Haotian Xu, Jinghui Si, Zhuoran Zhou,
Peng Yi, Xing Gao, Jitao Sang, Rong Zhang, Ji Zhang, Chao Peng, Fei Huang,
Jingren Zhou
- Abstract summary: We present CValues, the first Chinese human values evaluation benchmark to measure the alignment ability of LLMs.
As a result, we have manually collected adversarial safety prompts across 10 scenarios and induced responsibility prompts from 8 domains.
Our findings suggest that while most Chinese LLMs perform well in terms of safety, there is considerable room for improvement in terms of responsibility.
- Score: 62.74405775089802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid evolution of large language models (LLMs), there is a growing
concern that they may pose risks or have negative social impacts. Therefore,
evaluation of human values alignment is becoming increasingly important.
Previous work mainly focuses on assessing the performance of LLMs on certain
knowledge and reasoning abilities, while neglecting the alignment to human
values, especially in a Chinese context. In this paper, we present CValues, the
first Chinese human values evaluation benchmark to measure the alignment
ability of LLMs in terms of both safety and responsibility criteria. As a
result, we have manually collected adversarial safety prompts across 10
scenarios and induced responsibility prompts from 8 domains by professional
experts. To provide a comprehensive values evaluation of Chinese LLMs, we not
only conduct human evaluation for reliable comparison, but also construct
multi-choice prompts for automatic evaluation. Our findings suggest that while
most Chinese LLMs perform well in terms of safety, there is considerable room
for improvement in terms of responsibility. Moreover, both the automatic and
human evaluation are important for assessing the human values alignment in
different aspects. The benchmark and code is available on ModelScope and
Github.
Related papers
- LocalValueBench: A Collaboratively Built and Extensible Benchmark for Evaluating Localized Value Alignment and Ethical Safety in Large Language Models [0.0]
The proliferation of large language models (LLMs) requires robust evaluation of their alignment with local values and ethical standards.
textscLocalValueBench is a benchmark designed to assess LLMs' adherence to Australian values.
arXiv Detail & Related papers (2024-07-27T05:55:42Z) - CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models [7.054112690519648]
CHiSafetyBench is a safety benchmark for evaluating large language models' capabilities in identifying risky content and refusing answering risky questions in Chinese contexts.
This dataset comprises two types of tasks: multiple-choice questions and question-answering, evaluating LLMs from the perspectives of risk content identification and the ability to refuse answering risky questions respectively.
Our experiments reveal the varying performance of different models across various safety domains, indicating that all models possess considerable potential for improvement in Chinese safety capabilities.
arXiv Detail & Related papers (2024-06-14T06:47:40Z) - The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models [94.31327813151208]
BiGGen Bench is a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks.
A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation.
arXiv Detail & Related papers (2024-06-09T12:30:30Z) - ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation [48.54271457765236]
Large Language Models (LLMs) can elicit unintended and even harmful content when misaligned with human values.
Current evaluation benchmarks predominantly employ expert-designed contextual scenarios to assess how well LLMs align with human values.
We propose ALI-Agent, an evaluation framework that leverages the autonomous abilities of LLM-powered agents to conduct in-depth and adaptive alignment assessments.
arXiv Detail & Related papers (2024-05-23T02:57:42Z) - OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety [37.07970624135514]
OpenEval is an evaluation testbed that benchmarks Chinese LLMs across capability, alignment and safety.
For capability assessment, we include 12 benchmark datasets to evaluate Chinese LLMs from 4 sub-dimensions: NLP tasks, disciplinary knowledge, commonsense reasoning and mathematical reasoning.
For alignment assessment, OpenEval contains 7 datasets that examines the bias, offensiveness and illegalness in the outputs yielded by Chinese LLMs.
arXiv Detail & Related papers (2024-03-18T23:21:37Z) - TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for
Human-Aligned LLMs [35.717370285231176]
Large language models (LLMs) have shown impressive capabilities across various natural language tasks.
We propose a comprehensive human evaluation framework to assess LLMs' proficiency in following instructions on diverse real-world tasks.
arXiv Detail & Related papers (2023-11-09T13:58:59Z) - Collaborative Evaluation: Exploring the Synergy of Large Language Models
and Humans for Open-ended Generation Evaluation [71.76872586182981]
Large language models (LLMs) have emerged as a scalable and cost-effective alternative to human evaluations.
We propose a Collaborative Evaluation pipeline CoEval, involving the design of a checklist of task-specific criteria and the detailed evaluation of texts.
arXiv Detail & Related papers (2023-10-30T17:04:35Z) - Heterogeneous Value Alignment Evaluation for Large Language Models [91.96728871418]
Large Language Models (LLMs) have made it crucial to align their values with those of humans.
We propose a Heterogeneous Value Alignment Evaluation (HVAE) system to assess the success of aligning LLMs with heterogeneous values.
arXiv Detail & Related papers (2023-05-26T02:34:20Z) - Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes.
To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts including 100k augmented prompts and responses by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.