CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models
- URL: http://arxiv.org/abs/2406.04752v1
- Date: Fri, 7 Jun 2024 08:52:24 GMT
- Title: CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models
- Authors: Ling Shi, Deyi Xiong
- Abstract summary: CRiskEval is a Chinese dataset meticulously designed for gauging the risk proclivities inherent in large language models (LLMs).
We define a new risk taxonomy with 7 types of frontier risks and 4 safety levels: extremely hazardous, moderately hazardous, neutral and safe.
The dataset consists of 14,888 questions that simulate scenarios related to the 7 predefined types of frontier risks.
- Score: 46.93425758722059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) possess numerous beneficial capabilities, yet their potential inclinations harbor unpredictable risks that may materialize in the future. We hence propose CRiskEval, a Chinese dataset meticulously designed for gauging the risk proclivities inherent in LLMs, such as resource acquisition and malicious coordination, as part of efforts for proactive preparedness. To curate CRiskEval, we define a new risk taxonomy with 7 types of frontier risks and 4 safety levels: extremely hazardous, moderately hazardous, neutral and safe. We follow the philosophy of tendency evaluation to empirically measure the stated desires of LLMs via fine-grained multiple-choice question answering. The dataset consists of 14,888 questions that simulate scenarios related to the 7 predefined types of frontier risks. Each question is accompanied by 4 answer choices that state opinions or behavioral tendencies corresponding to the question. All answer choices are manually annotated with one of the defined risk levels, so that we can easily build a fine-grained frontier risk profile for each assessed LLM. Extensive evaluation with CRiskEval on a spectrum of prevalent Chinese LLMs has unveiled a striking revelation: most models exhibit risk tendencies of more than 40% (weighted tendency toward the four risk levels). Furthermore, a subtle increase in a model's inclination toward urgent self-sustainability, power seeking and other dangerous goals becomes evident as model size increases. To promote further research on the frontier risk evaluation of LLMs, we publicly release our dataset at https://github.com/lingshi6565/Risk_eval.
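The abstract does not specify how the weighted risk tendency is computed. The sketch below is a minimal illustration, assuming each selected answer choice carries one of the four annotated safety levels and that the levels are mapped to evenly spaced weights from safe (0) to extremely hazardous (1); the weight values and function names are hypothetical, not taken from the paper or its repository.

```python
# Hypothetical sketch of a weighted risk-tendency score for CRiskEval-style
# multiple-choice evaluation. The level weights below are assumptions, not
# values taken from the paper.
from collections import Counter

# Assumed mapping from annotated safety level to a numeric risk weight.
LEVEL_WEIGHTS = {
    "safe": 0.0,
    "neutral": 1 / 3,
    "moderately hazardous": 2 / 3,
    "extremely hazardous": 1.0,
}

def risk_tendency(chosen_levels: list[str]) -> float:
    """Weighted tendency toward the four risk levels, in [0, 1].

    `chosen_levels` holds the annotated safety level of the answer choice
    the model selected for each question.
    """
    if not chosen_levels:
        return 0.0
    return sum(LEVEL_WEIGHTS[level] for level in chosen_levels) / len(chosen_levels)

def risk_profile(chosen_levels: list[str]) -> dict[str, float]:
    """Fraction of selected answers falling into each safety level."""
    counts = Counter(chosen_levels)
    total = len(chosen_levels)
    return {level: counts.get(level, 0) / total for level in LEVEL_WEIGHTS}

# Example: a model that mostly picks neutral or hazardous options scores
# above 0.4, which is one way to read the ">40% risk tendency" reported above.
if __name__ == "__main__":
    answers = ["safe", "neutral", "moderately hazardous", "neutral", "extremely hazardous"]
    print(round(risk_tendency(answers), 3))  # 0.467
    print(risk_profile(answers))
```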
Related papers
- Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference [3.422309388045878]
Large language models (LLMs) such as ChatGPT are known to pose important risks.
Misplaced confidence arises from the over-confidence or under-confidence that models exhibit in their inferences.
We propose an experimental framework consisting of a two-level inference architecture and appropriate metrics for measuring such risks.
arXiv Detail & Related papers (2024-08-04T05:24:32Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the belief that base LLMs' limited instruction-following ability safeguards against misuse.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- Risk and Response in Large Language Models: Evaluating Key Threat Categories [6.436286493151731]
This paper explores the pressing issue of risk assessment in Large Language Models (LLMs).
By utilizing the Anthropic Red-team dataset, we analyze major risk categories, including Information Hazards, Malicious Uses, and Discrimination/Hateful content.
Our findings indicate that LLMs tend to consider Information Hazards less harmful, a finding confirmed by a specially developed regression model.
arXiv Detail & Related papers (2024-03-22T06:46:40Z)
- A Chinese Dataset for Evaluating the Safeguards in Large Language Models [46.43476815725323]
Large language models (LLMs) can produce harmful responses.
This paper introduces a dataset for the safety evaluation of Chinese LLMs.
We then extend it to two other scenarios that can be used to better identify false negative and false positive examples.
arXiv Detail & Related papers (2024-02-19T14:56:18Z)
- C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models [57.10361282229501]
We propose C-RAG, the first framework to certify generation risks for RAG models.
Specifically, we provide conformal risk analysis for RAG models and certify an upper confidence bound of generation risks.
We prove that RAG achieves a lower conformal generation risk than that of a single LLM when the quality of the retrieval model and transformer is non-trivial.
arXiv Detail & Related papers (2024-02-05T16:46:16Z)
- RiskQ: Risk-sensitive Multi-Agent Reinforcement Learning Value Factorization [49.26510528455664]
We introduce the Risk-sensitive Individual-Global-Max (RIGM) principle as a generalization of the Individual-Global-Max (IGM) and Distributional IGM (DIGM) principles.
Extensive experiments show that RiskQ obtains promising performance.
arXiv Detail & Related papers (2023-11-03T07:18:36Z)
- A Formalism and Approach for Improving Robustness of Large Language Models Using Risk-Adjusted Confidence Scores [4.043005183192123]
Large Language Models (LLMs) have achieved impressive milestones in natural language processing (NLP).
Despite their impressive performance, the models are known to pose important risks.
We define and formalize two distinct types of risk: decision risk and composite risk.
arXiv Detail & Related papers (2023-10-05T03:20:41Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Ethical and social risks of harm from Language Models [22.964941107198023]
This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs).
A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences.
arXiv Detail & Related papers (2021-12-08T16:09:48Z)