Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance
- URL: http://arxiv.org/abs/2509.22250v1
- Date: Fri, 26 Sep 2025 12:11:29 GMT
- Title: Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance
- Authors: Wenbin Hu, Huihao Jing, Haochen Shi, Haoran Li, Yangqiu Song
- Abstract summary: Existing safety methods rely on ad-hoc taxonomies and lack rigorous, systematic protection. We develop a new benchmark for safety compliance by generating realistic LLM safety scenarios seeded with legal statutes. Our experiments demonstrate that the Compliance Reasoner achieves superior performance on the new benchmark.
- Score: 49.50518009960314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have proliferated and demonstrated remarkable capabilities, elevating the critical importance of LLM safety. However, existing safety methods rely on ad-hoc taxonomies and lack rigorous, systematic protection, failing to ensure safety for the nuanced and complex behaviors of modern LLM systems. To address this problem, we approach LLM safety from a legal compliance perspective, which we term safety compliance. In this work, we posit relevant established legal frameworks as safety standards for defining and measuring safety compliance, including the EU AI Act and GDPR, which serve as core legal frameworks for AI safety and data security in Europe. To bridge the gap between LLM safety and legal compliance, we first develop a new benchmark for safety compliance by generating realistic LLM safety scenarios seeded with legal statutes. Subsequently, we align Qwen3-8B using Group Relative Policy Optimization (GRPO) to construct a safety reasoner, Compliance Reasoner, which effectively aligns LLMs with legal standards to mitigate safety risks. Our comprehensive experiments demonstrate that the Compliance Reasoner achieves superior performance on the new benchmark, with average improvements of +10.45% for the EU AI Act and +11.85% for GDPR.
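The abstract describes aligning Qwen3-8B with GRPO on statute-seeded scenarios. Below is a minimal, hypothetical sketch of the general pattern: a rule-based compliance reward scored per sampled response and normalized group-relatively, as GRPO does. The scenario schema, the scoring rules, and all names are illustrative assumptions, not the paper's actual benchmark or reward design.

```python
# Hypothetical sketch of a GRPO-style, rule-based compliance reward.
# The scenario schema and scoring rules are illustrative assumptions,
# not the paper's actual benchmark or reward design.
from dataclasses import dataclass
from typing import List


@dataclass
class ComplianceScenario:
    prompt: str                 # realistic LLM safety scenario
    statute: str                # e.g. "EU AI Act, Article 5" or "GDPR, Article 9"
    expected_verdict: str       # "compliant" or "non-compliant"


def compliance_reward(scenario: ComplianceScenario, response: str) -> float:
    """Score one sampled response with simple, rule-based checks."""
    reward = 0.0
    if scenario.expected_verdict in response.lower():
        reward += 1.0            # correct compliance verdict
    if scenario.statute.lower() in response.lower():
        reward += 0.5            # grounds the verdict in the seeded statute
    return reward


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO computes advantages relative to the group of samples for one prompt."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5 or 1.0
    return [(r - mean) / std for r in rewards]


# Usage: sample a group of responses per scenario, score them, and feed the
# group-relative advantages to the policy-gradient update.
scenario = ComplianceScenario(
    prompt="A hiring assistant ranks applicants using inferred health data.",
    statute="GDPR, Article 9",
    expected_verdict="non-compliant",
)
group = [
    "... non-compliant under GDPR, Article 9 ...",
    "... compliant ...",
    "... non-compliant ...",
]
advantages = group_relative_advantages([compliance_reward(scenario, r) for r in group])
```

In the paper itself, scenarios are seeded with EU AI Act and GDPR statutes; the sketch only illustrates how a rule-based verdict-and-citation check could feed group-relative advantages into a policy-gradient update.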
Related papers
- Inference-Time Safety For Code LLMs Via Retrieval-Augmented Revision [3.983997834693767]
Large Language Models (LLMs) are increasingly deployed for code generation in high-stakes software development. LLMs cannot readily adapt to newly discovered vulnerabilities or changing security standards without retraining. We present a principled approach to trustworthy code generation by design that operates as an inference-time safety mechanism.
arXiv Detail & Related papers (2026-03-02T06:06:34Z) - Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety [59.01189713115365]
We evaluate the impact of explicitly specifying extensive safety codes versus demonstrating them through illustrative cases. We find that referencing explicit codes inconsistently improves harmlessness and systematically degrades helpfulness. We propose CADA, a case-augmented deliberative alignment method for LLMs utilizing reinforcement learning on self-generated safety reasoning chains.
arXiv Detail & Related papers (2026-01-12T21:08:46Z) - Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications [0.0]
This paper introduces a practical framework for evaluating application-level safety in large language models (LLMs). We illustrate how the proposed framework was applied in our internal pilot, providing a reference point for organizations seeking to scale their safety testing efforts.
arXiv Detail & Related papers (2025-07-13T22:34:20Z) - RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards [55.76285458905577]
Large Language Models (LLMs) continue to exhibit vulnerabilities despite deliberate safety alignment efforts. To safeguard against the risk of policy-violating content, system-level moderation via external guard models has emerged as a prevalent mitigation strategy. We propose RSafe, an adaptive reasoning-based safeguard that conducts guided safety reasoning to provide robust protection within the scope of specified safety policies.
arXiv Detail & Related papers (2025-06-09T13:20:04Z) - Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning [53.92712851223158]
We formulate safety and privacy issues into contextualized compliance problems following the Contextual Integrity (CI) theory. Under the CI framework, we align our model with three critical regulatory standards: GDPR, the EU AI Act, and HIPAA. We employ reinforcement learning (RL) with a rule-based reward to incentivize contextual reasoning capabilities while enhancing compliance with safety and privacy norms.
arXiv Detail & Related papers (2025-05-20T16:40:09Z) - A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents [13.225168384790257]
Large Language Models (LLMs) exhibit substantial promise in enhancing task-planning capabilities within embodied agents. We present Safe-BeAl, an integrated framework for the measurement (SafePlan-Bench) and alignment (Safe-Align) of LLM-based embodied agents' behaviors. Our empirical analysis reveals that even in the absence of adversarial inputs or malicious intent, LLM-based agents can exhibit unsafe behaviors.
arXiv Detail & Related papers (2025-04-20T15:12:14Z) - On Almost Surely Safe Alignment of Large Language Models at Inference-Time [20.5164976103514]
We introduce a novel inference-time alignment approach for LLMs that aims to generate safe responses almost surely. We augment generation with a safety state that tracks the evolution of safety constraints and dynamically penalizes unsafe generations (a minimal sketch of this penalized-decoding pattern appears after this list). We demonstrate formal safety guarantees w.r.t. the given cost model upon solving the MDP in the latent space with sufficiently large penalties.
arXiv Detail & Related papers (2025-02-03T09:59:32Z) - Global Challenge for Safe and Secure LLMs Track 1 [57.08717321907755]
This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks.
arXiv Detail & Related papers (2024-11-21T08:20:31Z) - Towards Assuring EU AI Act Compliance and Adversarial Robustness of LLMs [1.368472250332885]
Large language models are prone to misuse and vulnerable to security threats.
The European Union's Artificial Intelligence Act seeks to enforce AI robustness in certain contexts.
arXiv Detail & Related papers (2024-10-04T18:38:49Z) - S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models [46.148439517272024]
Generative large language models (LLMs) have revolutionized natural language processing with their transformative and emergent capabilities. Recent evidence indicates that LLMs can produce harmful content that violates social norms. We propose S-Eval, an automated Safety Evaluation framework with a newly defined comprehensive risk taxonomy.
arXiv Detail & Related papers (2024-05-23T05:34:31Z) - Towards Comprehensive Post Safety Alignment of Large Language Models via Safety Patching [74.62818936088065]
SafePatching is a novel framework for comprehensive post safety alignment (PSA). It achieves more comprehensive PSA than baseline methods and demonstrates its superiority in continual PSA scenarios.
arXiv Detail & Related papers (2024-05-22T16:51:07Z) - SafetyBench: Evaluating the Safety of Large Language Models [54.878612385780805]
SafetyBench is a comprehensive benchmark for evaluating the safety of Large Language Models (LLMs). It comprises 11,435 diverse multiple-choice questions spanning 7 distinct categories of safety concerns.
Our tests over 25 popular Chinese and English LLMs in both zero-shot and few-shot settings reveal a substantial performance advantage for GPT-4 over its counterparts.
arXiv Detail & Related papers (2023-09-13T15:56:50Z) - Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and be used for malicious purposes.
To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts, including 100k augmented prompts and responses generated by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)
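The entry above on almost surely safe inference-time alignment describes augmenting decoding with a safety state that accumulates cost and dynamically penalizes unsafe continuations. The sketch below illustrates that general pattern only; the cost model, penalty schedule, and all names are assumptions, not the paper's latent-space MDP construction or its formal guarantees.

```python
# Hypothetical sketch of penalized decoding with a running safety state.
# The cost model and penalty schedule are illustrative assumptions; the cited
# paper instead formulates this as an MDP in latent space with formal guarantees.
from typing import Callable, Dict, List


def penalized_decode(
    step_logprobs: Callable[[List[str]], Dict[str, float]],   # token -> log-prob, given prefix
    safety_cost: Callable[[List[str], str], float],           # incremental cost of appending a token
    max_tokens: int = 64,
    penalty: float = 5.0,
    budget: float = 1.0,
) -> List[str]:
    """Greedy decoding where accumulated safety cost penalizes risky continuations."""
    tokens: List[str] = []
    state = 0.0                      # safety state: accumulated cost so far
    for _ in range(max_tokens):
        scores = {}
        for tok, logp in step_logprobs(tokens).items():
            cost = safety_cost(tokens, tok)
            # Penalize tokens in proportion to how far they push past the budget.
            overflow = max(0.0, state + cost - budget)
            scores[tok] = logp - penalty * overflow
        best = max(scores, key=scores.get)
        state += safety_cost(tokens, best)
        tokens.append(best)
        if best == "<eos>":
            break
    return tokens


# Toy usage with a dummy next-token distribution and a dummy cost function.
out = penalized_decode(
    step_logprobs=lambda prefix: {"safe": -0.5, "risky": -0.1, "<eos>": -2.0},
    safety_cost=lambda prefix, tok: 0.8 if tok == "risky" else 0.0,
)
```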
This list is automatically generated from the titles and abstracts of the papers on this site.