A Survey of Safety and Trustworthiness of Large Language Models through
the Lens of Verification and Validation
- URL: http://arxiv.org/abs/2305.11391v2
- Date: Sun, 27 Aug 2023 13:12:30 GMT
- Title: A Survey of Safety and Trustworthiness of Large Language Models through
the Lens of Verification and Validation
- Authors: Xiaowei Huang, Wenjie Ruan, Wei Huang, Gaojie Jin, Yi Dong, Changshun
Wu, Saddek Bensalem, Ronghui Mu, Yi Qi, Xingyu Zhao, Kaiwen Cai, Yanghao
Zhang, Sihao Wu, Peipei Xu, Dengyu Wu, Andre Freitas, Mustafa A. Mustafa
- Abstract summary: Large Language Models (LLMs) have ignited a new wave of enthusiasm for AI, owing to their ability to engage end-users in human-level conversations.
This survey concerns their safety and trustworthiness in industrial applications.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have ignited a new wave of enthusiasm for AI, owing to their
ability to engage end-users in human-level conversations with detailed and
articulate answers across many knowledge domains. In response to their fast
adoption in many industrial applications, this survey concerns their safety and
trustworthiness. First, we review known vulnerabilities and limitations of
LLMs, categorising them into inherent issues, attacks, and unintended bugs.
Then, we consider if and how Verification and Validation (V&V) techniques,
which have been widely developed for traditional software and for deep learning
models such as convolutional neural networks as independent processes that check
implementations against their specifications, can be integrated and further
extended throughout the lifecycle of LLMs to provide rigorous analysis of the
safety and trustworthiness of LLMs and their applications. Specifically, we
consider four complementary techniques: falsification and evaluation,
verification, runtime monitoring, and regulations and ethical use. In total,
370+ references are considered to support a quick understanding of the safety
and trustworthiness issues from the perspective of V&V. While intensive
research has been conducted to identify safety and trustworthiness issues,
rigorous yet practical methods are called for to ensure that LLMs align with
safety and trustworthiness requirements.
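As a minimal illustration of the runtime-monitoring technique named in the abstract, the sketch below wraps each model output in configurable safety checks before it is released to the user. It is a hypothetical example, not an implementation from the survey: the RuntimeMonitor class, the check functions, and the example patterns are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# A check inspects a model output and reports whether it violates
# some safety requirement (returns True if the output is flagged).
Check = Callable[[str], bool]

@dataclass
class RuntimeMonitor:
    """Hypothetical runtime monitor: runs every configured check on an output."""
    checks: List[Check] = field(default_factory=list)

    def flag(self, output: str) -> bool:
        # Flag the output if any single check fires.
        return any(check(output) for check in self.checks)

def blocklist_check(patterns: List[str]) -> Check:
    """Flag outputs matching forbidden patterns (a crude stand-in for
    richer specifications discussed in V&V settings)."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return lambda output: any(p.search(output) for p in compiled)

def length_check(max_chars: int) -> Check:
    """Flag outputs that exceed an application-specific length budget."""
    return lambda output: len(output) > max_chars

if __name__ == "__main__":
    monitor = RuntimeMonitor(checks=[
        blocklist_check([r"\bdisable the safety interlock\b"]),  # illustrative pattern
        length_check(2000),
    ])
    candidate = "Here is a short, harmless answer."
    print("flagged:", monitor.flag(candidate))  # flagged: False
```

In practice such a monitor would sit between the LLM and the application, and the checks could be learned classifiers or formal specifications rather than the simple pattern and length rules assumed here.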
Related papers
- SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models [75.67623347512368]
We propose SafeBench, a comprehensive framework designed for conducting safety evaluations of MLLMs.
Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol (a generic sketch of this kind of evaluation loop follows this list).
Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs.
arXiv Detail & Related papers (2024-10-24T17:14:40Z) - LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs [80.45174785447136]
Laboratory accidents pose significant risks to human life and property.
Despite advancements in safety training, laboratory personnel may still unknowingly engage in unsafe practices.
There is growing concern about relying on large language models (LLMs) for guidance in various fields, including laboratory settings.
arXiv Detail & Related papers (2024-10-18T05:21:05Z) - Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety.
For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context.
We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z) - Current state of LLM Risks and AI Guardrails [0.0]
Large language models (LLMs) have become increasingly sophisticated, leading to widespread deployment in sensitive applications where safety and reliability are paramount.
Such deployment carries risks that necessitate the development of "guardrails" to align LLMs with desired behaviors and mitigate potential harm.
This work explores the risks associated with deploying LLMs and evaluates current approaches to implementing guardrails and model alignment techniques.
arXiv Detail & Related papers (2024-06-16T22:04:10Z) - Large Language Models for Cyber Security: A Systematic Literature Review [14.924782327303765]
We conduct a comprehensive review of the literature on the application of Large Language Models in cybersecurity (LLM4Security).
We observe that LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection.
We also identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training.
arXiv Detail & Related papers (2024-05-08T02:09:17Z) - ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming [64.86326523181553]
ALERT is a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy.
It aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models.
arXiv Detail & Related papers (2024-04-06T15:01:47Z) - An Insight into Security Code Review with LLMs: Capabilities, Obstacles and Influential Factors [9.309745288471374]
Security code review is a time-consuming and labor-intensive process.
Existing security analysis tools struggle with poor generalization, high false positive rates, and coarse detection granularity.
Large Language Models (LLMs) have been considered promising candidates for addressing those challenges.
arXiv Detail & Related papers (2024-01-29T17:13:44Z) - The Art of Defending: A Systematic Evaluation and Analysis of LLM
Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z) - Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes.
To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts, which includes 100k augmented prompts and responses generated by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)
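Several of the benchmarks listed above (e.g., SafeBench and ALERT) follow a common pattern: a curated set of harmful prompts, grouped by risk category, is sent to the model under test and the responses are scored by an automated judge. The sketch below illustrates that evaluation loop under assumed interfaces; the prompt set, the query_model and is_unsafe callables, and the category names are hypothetical, not taken from any of the papers.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical interfaces: a model under test and an automated judge.
QueryFn = Callable[[str], str]        # prompt -> model response
JudgeFn = Callable[[str, str], bool]  # (prompt, response) -> unsafe?

def evaluate_safety(prompts: Iterable[Tuple[str, str]],
                    query_model: QueryFn,
                    is_unsafe: JudgeFn) -> Dict[str, float]:
    """Run category-labelled red-team prompts through the model and
    return the unsafe-response rate per risk category."""
    unsafe, total = Counter(), Counter()
    for category, prompt in prompts:
        total[category] += 1
        if is_unsafe(prompt, query_model(prompt)):
            unsafe[category] += 1
    return {category: unsafe[category] / total[category] for category in total}

if __name__ == "__main__":
    # Toy stand-ins for illustration only.
    demo_prompts = [("self_harm", "..."), ("hate_speech", "...")]
    echo_model = lambda prompt: "I cannot help with that."
    refusal_judge = lambda prompt, response: "cannot" not in response.lower()
    print(evaluate_safety(demo_prompts, echo_model, refusal_judge))
```

Real benchmarks replace the toy judge with human annotation or a trained safety classifier, and report per-category rates across much larger prompt sets.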
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.