Safety Assessment of Chinese Large Language Models
- URL: http://arxiv.org/abs/2304.10436v1
- Date: Thu, 20 Apr 2023 16:27:35 GMT
- Title: Safety Assessment of Chinese Large Language Models
- Authors: Hao Sun, Zhexin Zhang, Jiawen Deng, Jiale Cheng, Minlie Huang
- Abstract summary: Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes.
To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts including 100k augmented prompts and responses by LLMs.
- Score: 51.83369778259149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid popularity of large language models such as ChatGPT and GPT-4,
a growing amount of attention is paid to their safety concerns. These models
may generate insulting and discriminatory content, reflect incorrect social
values, and may be used for malicious purposes such as fraud and dissemination
of misleading information. Evaluating and enhancing their safety is
particularly essential for the wide application of large language models
(LLMs). To further promote the safe deployment of LLMs, we develop a Chinese
LLM safety assessment benchmark. Our benchmark explores the comprehensive
safety performance of LLMs from two perspectives: 8 kinds of typical safety
scenarios and 6 types of more challenging instruction attacks. Our benchmark is
based on a straightforward process in which it provides the test prompts and
evaluates the safety of the generated responses from the evaluated model. In
evaluation, we utilize the LLM's strong evaluation ability and develop it as a
safety evaluator by prompting. On top of this benchmark, we conduct safety
assessments and analyze 15 LLMs including the OpenAI GPT series and other
well-known Chinese LLMs, where we observe some interesting findings. For
example, we find that instruction attacks are more likely to expose safety
issues of all LLMs. Moreover, to promote the development and deployment of
safe, responsible, and ethical AI, we publicly release SafetyPrompts including
100k augmented prompts and responses by LLMs.
Related papers
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types [21.683010095703832]
We develop a novel benchmark to assess the generalization of large language model (LLM) safety across various tasks and prompt types.
This benchmark integrates both generative and discriminative evaluation tasks and includes extended data to examine the impact of prompt engineering and jailbreak on LLM safety.
Our assessment reveals that most LLMs perform worse on discriminative tasks than generative ones, and are highly susceptible to prompts, indicating poor generalization in safety alignment.
arXiv Detail & Related papers (2024-10-29T11:47:01Z) - CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs [4.441767341563709]
We introduce a safety assessment benchmark, CFSafety, which integrates 5 classic safety scenarios and 5 types of instruction attacks, totaling 10 categories of safety questions.
This test set was used to evaluate the natural language generation capabilities of large language models (LLMs)
The results indicate that while GPT-4 demonstrated superior safety performance, the safety effectiveness of LLMs, including this model, still requires improvement.
arXiv Detail & Related papers (2024-10-29T03:25:20Z) - SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models [75.67623347512368]
We propose toolns, a comprehensive framework designed for conducting safety evaluations of MLLMs.
Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol.
Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs.
arXiv Detail & Related papers (2024-10-24T17:14:40Z) - S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models [47.65210244674764]
Large Language Models have gained considerable attention for their revolutionary capabilities.
There is also growing concern on their safety implications.
We propose S-Eval, a new comprehensive, multi-dimensional and open-ended safety evaluation benchmark.
arXiv Detail & Related papers (2024-05-23T05:34:31Z) - ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming [64.86326523181553]
ALERT is a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy.
It aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models.
arXiv Detail & Related papers (2024-04-06T15:01:47Z) - The Art of Defending: A Systematic Evaluation and Analysis of LLM
Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z) - SafetyBench: Evaluating the Safety of Large Language Models [54.878612385780805]
SafetyBench is a comprehensive benchmark for evaluating the safety of Large Language Models (LLMs)
It comprises 11,435 diverse multiple choice questions spanning across 7 distinct categories of safety concerns.
Our tests over 25 popular Chinese and English LLMs in both zero-shot and few-shot settings reveal a substantial performance advantage for GPT-4 over its counterparts.
arXiv Detail & Related papers (2023-09-13T15:56:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.