Related papers: Safety in Large Reasoning Models: A Survey

Safety in Large Reasoning Models: A Survey

URL: http://arxiv.org/abs/2504.17704v1
Date: Thu, 24 Apr 2025 16:11:01 GMT
Title: Safety in Large Reasoning Models: A Survey
Authors: Cheng Wang, Yue Liu, Baolong Li, Duzhen Zhang, Zhongzhi Li, Junfeng Fang,
Abstract summary: Large Reasoning Models (LRMs) have exhibited extraordinary prowess in tasks like mathematics and coding, leveraging their advanced reasoning capabilities.<n>This paper presents a comprehensive survey of LRMs, meticulously exploring and summarizing the newly emerged safety risks, attacks, and defense strategies.
Score: 15.148492389864133
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Reasoning Models (LRMs) have exhibited extraordinary prowess in tasks like mathematics and coding, leveraging their advanced reasoning capabilities. Nevertheless, as these capabilities progress, significant concerns regarding their vulnerabilities and safety have arisen, which can pose challenges to their deployment and application in real-world settings. This paper presents a comprehensive survey of LRMs, meticulously exploring and summarizing the newly emerged safety risks, attacks, and defense strategies. By organizing these elements into a detailed taxonomy, this work aims to offer a clear and structured understanding of the current safety landscape of LRMs, facilitating future research and development to enhance the security and reliability of these powerful models.

Related papers

Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models [69.22690439422531]
Diffusion models (DMs) have been investigated in various domains due to their ability to generate high-quality data.<n>Similar to traditional deep learning systems, there also exist potential threats to DMs.<n>This survey comprehensively elucidates its framework, threats, and countermeasures.
arXiv Detail & Related papers (2025-09-25T02:51:43Z)
AURA: Affordance-Understanding and Risk-aware Alignment Technique for Large Language Models [6.059681491089391]
AURA provides comprehensive, step level evaluations across logical coherence and safety-awareness.<n>Our framework seamlessly combines introspective self-critique, fine-grained PRM assessments, and adaptive safety-aware decoding.<n>This research represents a pivotal step toward safer, more responsible, and contextually aware AI, setting a new benchmark for alignment-sensitive applications.
arXiv Detail & Related papers (2025-08-08T08:43:24Z)
A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations [127.52707312573791]
This survey provides a comprehensive analysis of LVLM safety, covering key aspects such as attacks, defenses, and evaluation methods. We introduce a unified framework that integrates these interrelated components, offering a holistic perspective on the vulnerabilities of LVLMs. We conduct a set of safety evaluations on the latest LVLM, Deepseek Janus-Pro, and provide a theoretical analysis of the results.
arXiv Detail & Related papers (2025-02-14T08:42:43Z)
Safety at Scale: A Comprehensive Survey of Large Model Safety [298.05093528230753]
We present a comprehensive taxonomy of safety threats to large models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats.<n>We identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices.
arXiv Detail & Related papers (2025-02-02T05:14:22Z)
Building Trust: Foundations of Security, Safety and Transparency in AI [0.23301643766310373]
We review the current security and safety scenarios while highlighting challenges such as tracking issues, remediation, and the apparent absence of AI model lifecycle and ownership processes. This paper aims to provide some of the foundational pieces for more standardized security, safety, and transparency in the development and operation of AI models.
arXiv Detail & Related papers (2024-11-19T06:55:57Z)
SoK: Unifying Cybersecurity and Cybersafety of Multimodal Foundation Models with an Information Theory Approach [58.93030774141753]
Multimodal foundation models (MFMs) represent a significant advancement in artificial intelligence. This paper conceptualizes cybersafety and cybersecurity in the context of multimodal learning. We present a comprehensive Systematization of Knowledge (SoK) to unify these concepts in MFMs, identifying key threats.
arXiv Detail & Related papers (2024-11-17T23:06:20Z)
Recent Advances in Attack and Defense Approaches of Large Language Models [27.271665614205034]
Large Language Models (LLMs) have revolutionized artificial intelligence and machine learning through their advanced text processing and generating capabilities.<n>Their widespread deployment has raised significant safety and reliability concerns.<n>This paper reviews current research on LLM vulnerabilities and threats, and evaluates the effectiveness of contemporary defense mechanisms.
arXiv Detail & Related papers (2024-09-05T06:31:37Z)
EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.<n>Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.<n>However, the deployment of these agents in physical environments presents significant safety challenges.<n>This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices [4.927763944523323]
Large language models (LLMs) have significantly transformed the landscape of Natural Language Processing (NLP) This research paper thoroughly investigates security and privacy concerns related to LLMs from five thematic perspectives. The paper recommends promising avenues for future research to enhance the security and risk management of LLMs.
arXiv Detail & Related papers (2024-03-19T07:10:58Z)
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety. This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z)
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models. We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models. We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.