SC-Safety: A Multi-round Open-ended Question Adversarial Safety
Benchmark for Large Language Models in Chinese
- URL: http://arxiv.org/abs/2310.05818v1
- Date: Mon, 9 Oct 2023 16:03:22 GMT
- Title: SC-Safety: A Multi-round Open-ended Question Adversarial Safety
Benchmark for Large Language Models in Chinese
- Authors: Liang Xu, Kangkang Zhao, Lei Zhu, Hang Xue
- Abstract summary: Large language models (LLMs) can produce harmful content that negatively affects societal perceptions.
SuperCLUE-Safety (SC-Safety) is a multi-round adversarial benchmark with 4912 open-ended questions covering more than 20 safety sub-dimensions.
- Score: 21.893992064105085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs), like ChatGPT and GPT-4, have demonstrated
remarkable abilities in natural language understanding and generation. However,
alongside their positive impact on our daily tasks, they can also produce
harmful content that negatively affects societal perceptions. To systematically
assess the safety of Chinese LLMs, we introduce SuperCLUE-Safety (SC-Safety) -
a multi-round adversarial benchmark with 4912 open-ended questions covering
more than 20 safety sub-dimensions. Adversarial human-model interactions and
conversations significantly increase the challenges compared to existing
methods. Experiments on 13 major LLMs supporting Chinese yield the following
insights: 1) Closed-source models outperform open-source ones in terms of
safety; 2) Models released from China demonstrate comparable safety levels to
LLMs like GPT-3.5-turbo; 3) Some smaller models with 6B-13B parameters can
compete effectively in terms of safety. By introducing SC-Safety, we aim to
promote collaborative efforts to create safer and more trustworthy LLMs. The
benchmark and findings provide guidance on model selection. Our benchmark can
be found at https://www.CLUEbenchmarks.com
Related papers
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors [64.9938658716425]
Existing evaluations of large language models' (LLMs) ability to recognize and reject unsafe user requests face three limitations.
First, existing methods often use coarse-grained taxonomies of unsafe topics and over-represent some fine-grained topics.
Second, linguistic characteristics and formatting of prompts are often overlooked, like different languages, dialects, and more -- which are only implicitly considered in many evaluations.
Third, existing evaluations rely on large LLMs for evaluation, which can be expensive.
arXiv Detail & Related papers (2024-06-20T17:56:07Z)
- CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models [7.054112690519648]
CHiSafetyBench is a safety benchmark for evaluating large language models' capabilities in identifying risky content and refusing to answer risky questions in Chinese contexts.
This dataset comprises two types of tasks: multiple-choice questions and question-answering, evaluating LLMs from the perspectives of risky content identification and the ability to refuse to answer risky questions, respectively.
Our experiments reveal the varying performance of different models across various safety domains, indicating that all models possess considerable potential for improvement in Chinese safety capabilities.
arXiv Detail & Related papers (2024-06-14T06:47:40Z)
- ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming [64.86326523181553]
ALERT is a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy.
It aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models.
arXiv Detail & Related papers (2024-04-06T15:01:47Z)
- SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models [107.82336341926134]
SALAD-Bench is a safety benchmark specifically designed for evaluating Large Language Models (LLMs).
It transcends conventional benchmarks through its large scale, rich diversity, intricate taxonomy spanning three levels, and versatile functionalities.
arXiv Detail & Related papers (2024-02-07T17:33:54Z)
- All Languages Matter: On the Multilingual Safety of Large Language Models [96.47607891042523]
We build the first multilingual safety benchmark for large language models (LLMs).
XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families.
We propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT.
arXiv Detail & Related papers (2023-10-02T05:23:34Z)
- SafetyBench: Evaluating the Safety of Large Language Models [54.878612385780805]
SafetyBench is a comprehensive benchmark for evaluating the safety of Large Language Models (LLMs).
It comprises 11,435 diverse multiple-choice questions spanning 7 distinct categories of safety concerns.
Our tests over 25 popular Chinese and English LLMs in both zero-shot and few-shot settings reveal a substantial performance advantage for GPT-4 over its counterparts.
arXiv Detail & Related papers (2023-09-13T15:56:50Z)
- Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes.
To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts including 100k augmented prompts and responses by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)