SC-Safety: A Multi-round Open-ended Question Adversarial Safety
Benchmark for Large Language Models in Chinese
- URL: http://arxiv.org/abs/2310.05818v1
- Date: Mon, 9 Oct 2023 16:03:22 GMT
- Title: SC-Safety: A Multi-round Open-ended Question Adversarial Safety
Benchmark for Large Language Models in Chinese
- Authors: Liang Xu, Kangkang Zhao, Lei Zhu, Hang Xue
- Abstract summary: Large language models (LLMs) can produce harmful content that negatively affects societal perceptions.
SuperCLUE-Safety (SC-Safety) is a multi-round adversarial benchmark with 4912 open-ended questions covering more than 20 safety sub-dimensions.
- Score: 21.893992064105085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs), like ChatGPT and GPT-4, have demonstrated
remarkable abilities in natural language understanding and generation. However,
alongside their positive impact on our daily tasks, they can also produce
harmful content that negatively affects societal perceptions. To systematically
assess the safety of Chinese LLMs, we introduce SuperCLUE-Safety (SC-Safety) -
a multi-round adversarial benchmark with 4912 open-ended questions covering
more than 20 safety sub-dimensions. Adversarial human-model interactions and
conversations significantly increase the challenges compared to existing
methods. Experiments on 13 major LLMs supporting Chinese yield the following
insights: 1) Closed-source models outperform open-sourced ones in terms of
safety; 2) Models released from China demonstrate comparable safety levels to
LLMs like GPT-3.5-turbo; 3) Some smaller models with 6B-13B parameters can
compete effectively in terms of safety. By introducing SC-Safety, we aim to
promote collaborative efforts to create safer and more trustworthy LLMs. The
benchmark and findings provide guidance on model selection. Our benchmark can
be found at https://www.CLUEbenchmarks.com
Related papers
- CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs [4.441767341563709]
We introduce a safety assessment benchmark, CFSafety, which integrates 5 classic safety scenarios and 5 types of instruction attacks, totaling 10 categories of safety questions.
This test set was used to evaluate the natural language generation capabilities of large language models (LLMs)
The results indicate that while GPT-4 demonstrated superior safety performance, the safety effectiveness of LLMs, including this model, still requires improvement.
arXiv Detail & Related papers (2024-10-29T03:25:20Z) - SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models [75.67623347512368]
We propose toolns, a comprehensive framework designed for conducting safety evaluations of MLLMs.
Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol.
Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs.
arXiv Detail & Related papers (2024-10-24T17:14:40Z) - ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models [13.911977148887873]
This work presents a Chinese safety benchmark (ChineseSafe) to facilitate research on the content safety of large language models.
To align with the regulations for Chinese Internet content moderation, our ChineseSafe contains 205,034 examples across 4 classes and 10 sub-classes of safety issues.
The results reveal that many LLMs exhibit vulnerability to certain types of safety issues, leading to legal risks in China.
arXiv Detail & Related papers (2024-10-24T07:25:29Z) - SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors [64.9938658716425]
Existing evaluations of large language models' (LLMs) ability to recognize and reject unsafe user requests face three limitations.
First, existing methods often use coarse-grained of unsafe topics, and are over-representing some fine-grained topics.
Second, linguistic characteristics and formatting of prompts are often overlooked, like different languages, dialects, and more -- which are only implicitly considered in many evaluations.
Third, existing evaluations rely on large LLMs for evaluation, which can be expensive.
arXiv Detail & Related papers (2024-06-20T17:56:07Z) - CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models [7.054112690519648]
CHiSafetyBench is a safety benchmark for evaluating large language models' capabilities in identifying risky content and refusing answering risky questions in Chinese contexts.
This dataset comprises two types of tasks: multiple-choice questions and question-answering, evaluating LLMs from the perspectives of risk content identification and the ability to refuse answering risky questions respectively.
Our experiments reveal the varying performance of different models across various safety domains, indicating that all models possess considerable potential for improvement in Chinese safety capabilities.
arXiv Detail & Related papers (2024-06-14T06:47:40Z) - ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming [64.86326523181553]
ALERT is a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy.
It aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models.
arXiv Detail & Related papers (2024-04-06T15:01:47Z) - All Languages Matter: On the Multilingual Safety of Large Language Models [96.47607891042523]
We build the first multilingual safety benchmark for large language models (LLMs)
XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families.
We propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT.
arXiv Detail & Related papers (2023-10-02T05:23:34Z) - SafetyBench: Evaluating the Safety of Large Language Models [54.878612385780805]
SafetyBench is a comprehensive benchmark for evaluating the safety of Large Language Models (LLMs)
It comprises 11,435 diverse multiple choice questions spanning across 7 distinct categories of safety concerns.
Our tests over 25 popular Chinese and English LLMs in both zero-shot and few-shot settings reveal a substantial performance advantage for GPT-4 over its counterparts.
arXiv Detail & Related papers (2023-09-13T15:56:50Z) - Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes.
To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts including 100k augmented prompts and responses by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.