LongSafety: Evaluating Long-Context Safety of Large Language Models
- URL: http://arxiv.org/abs/2502.16971v1
- Date: Mon, 24 Feb 2025 08:54:39 GMT
- Title: LongSafety: Evaluating Long-Context Safety of Large Language Models
- Authors: Yida Lu, Jiale Cheng, Zhexin Zhang, Shiyao Cui, Cunxiang Wang, Xiaotao Gu, Yuxiao Dong, Jie Tang, Hongning Wang, Minlie Huang
- Abstract summary: LongSafety is the first benchmark designed to evaluate safety in open-ended long-context tasks. Our evaluation reveals significant safety vulnerabilities, with most models achieving safety rates below 55%. Our findings emphasize the unique challenges and urgency of improving long-context safety.
- Score: 95.2469116388522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Large Language Models (LLMs) continue to advance in understanding and generating long sequences, long contexts have introduced new safety concerns. However, the safety of LLMs in long-context tasks remains under-explored, leaving a significant gap in both the evaluation and the improvement of their safety. To address this, we introduce LongSafety, the first comprehensive benchmark specifically designed to evaluate LLM safety in open-ended long-context tasks. LongSafety encompasses 7 categories of safety issues and 6 user-oriented long-context tasks, with a total of 1,543 test cases, averaging 5,424 words per context. Our evaluation of 16 representative LLMs reveals significant safety vulnerabilities, with most models achieving safety rates below 55%. Our findings also indicate that strong safety performance in short-context scenarios does not necessarily correlate with safety in long-context tasks, emphasizing the unique challenges and urgency of improving long-context safety. Moreover, through extensive analysis, we identify challenging safety issues and task types for long-context models. Furthermore, we find that relevant context and extended input sequences can exacerbate safety risks in long-context scenarios, highlighting the critical need for ongoing attention to long-context safety challenges. Our code and data are available at https://github.com/thu-coai/LongSafety.
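The safety rates reported above are, in essence, the fraction of model responses judged safe, which can also be broken down by the benchmark's 7 safety categories. Below is a minimal Python sketch of that aggregation; the record schema and field names are assumptions for illustration, not the evaluation code from the linked repository.

```python
# Hypothetical sketch: aggregate per-category and overall safety rates from
# judged responses. The 'category'/'is_safe' schema is an assumption, not the
# actual LongSafety data format (see the repository linked above).
from collections import defaultdict

def safety_rate(results):
    """results: iterable of dicts with keys 'category' (str) and 'is_safe' (bool)."""
    totals, safe = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["category"]] += 1
        safe[r["category"]] += int(r["is_safe"])
    rates = {c: safe[c] / totals[c] for c in totals}       # per-category rate
    rates["overall"] = sum(safe.values()) / sum(totals.values())  # overall rate
    return rates

if __name__ == "__main__":
    demo = [
        {"category": "privacy", "is_safe": True},
        {"category": "privacy", "is_safe": False},
        {"category": "illegal_activity", "is_safe": True},
    ]
    print(safety_rate(demo))  # {'privacy': 0.5, 'illegal_activity': 1.0, 'overall': 0.667}
```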
Related papers
- LongSafetyBench: Long-Context LLMs Struggle with Safety Issues [83.75506157823517]
Long-context language models can exhibit safety concerns.
The proportion of safe responses from most mainstream long-context models is below 50%.
LongSafetyBench can serve as a valuable benchmark for evaluating the safety capabilities of long-context language models.
arXiv Detail & Related papers (2024-11-11T11:57:37Z)
- Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety.
For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context.
We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z)
- Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Model [73.8765529028288]
We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment.
To empirically investigate this problem, we developed SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations.
Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.
arXiv Detail & Related papers (2024-06-21T16:14:15Z)
- The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z)
- Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack [20.551730528019338]
Are mainstream NLP tasks adequately aligned with safety considerations?
Our study reveals significant disparities in the safety alignment of various NLP tasks.
Attacks exploiting tasks with weaker safety alignment, like summarization, can potentially compromise the integrity of tasks traditionally deemed more robust.
arXiv Detail & Related papers (2023-12-12T01:39:29Z)
- SafetyBench: Evaluating the Safety of Large Language Models [54.878612385780805]
SafetyBench is a comprehensive benchmark for evaluating the safety of Large Language Models (LLMs).
It comprises 11,435 diverse multiple-choice questions spanning 7 distinct categories of safety concerns.
Our tests of 25 popular Chinese and English LLMs in both zero-shot and few-shot settings reveal a substantial performance advantage for GPT-4 over its counterparts.
arXiv Detail & Related papers (2023-09-13T15:56:50Z)
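In contrast to LongSafety's open-ended, judge-based scoring, SafetyBench's multiple-choice format reduces evaluation to exact-match accuracy over option letters. The sketch below illustrates one plausible zero-shot scoring loop; the prompt template, option labels, and answer parsing are assumptions, not the benchmark's official evaluation harness.

```python
# Hypothetical sketch of zero-shot multiple-choice safety scoring in the style
# of SafetyBench; prompt format and answer parsing are illustrative assumptions.
import re

def build_prompt(question, options):
    # Label options (A), (B), ... and ask for a single letter.
    lines = [question] + [f"({chr(65 + i)}) {opt}" for i, opt in enumerate(options)]
    lines.append("Answer with a single option letter.")
    return "\n".join(lines)

def parse_choice(text, n_options=4):
    # Extract the first standalone option letter from the model's output.
    letters = "".join(chr(65 + i) for i in range(n_options))
    m = re.search(rf"\b([{letters}])\b", text.upper())
    return m.group(1) if m else None

def accuracy(examples, generate):
    """examples: dicts with 'question', 'options' (list of str), 'answer' (letter).
    generate: callable mapping a prompt string to the model's output text."""
    correct = 0
    for ex in examples:
        pred = parse_choice(generate(build_prompt(ex["question"], ex["options"])),
                            n_options=len(ex["options"]))
        correct += int(pred == ex["answer"])
    return correct / len(examples)
```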