Evaluating Implicit Regulatory Compliance in LLM Tool Invocation via Logic-Guided Synthesis
- URL: http://arxiv.org/abs/2601.08196v1
- Date: Tue, 13 Jan 2026 03:55:18 GMT
- Title: Evaluating Implicit Regulatory Compliance in LLM Tool Invocation via Logic-Guided Synthesis
- Authors: Da Song, Yuheng Huang, Boqi Chen, Tianshuo Cong, Randy Goebel, Lei Ma, Foutse Khomh,
- Abstract summary: We introduce LogiSafetyGen, a framework that converts unstructured regulations into Linear Temporal Logic oracles and employs logic-guided fuzzing to synthesize valid, safety-critical traces.<n>Building on this framework, we construct LogiSafetyBench, a benchmark comprising 240 human-verified tasks that require LLMs to generate Python programs that satisfy both functional objectives and latent compliance rules.<n> Evaluations of 13 state-of-the-art (SOTA) LLMs reveal that larger models, despite achieving better functional correctness, frequently prioritize task completion over safety, which results in non-compliant behavior.
- Score: 18.51135049856393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of large language models (LLMs) into autonomous agents has enabled complex tool use, yet in high-stakes domains, these systems must strictly adhere to regulatory standards beyond simple functional correctness. However, existing benchmarks often overlook implicit regulatory compliance, thus failing to evaluate whether LLMs can autonomously enforce mandatory safety constraints. To fill this gap, we introduce LogiSafetyGen, a framework that converts unstructured regulations into Linear Temporal Logic oracles and employs logic-guided fuzzing to synthesize valid, safety-critical traces. Building on this framework, we construct LogiSafetyBench, a benchmark comprising 240 human-verified tasks that require LLMs to generate Python programs that satisfy both functional objectives and latent compliance rules. Evaluations of 13 state-of-the-art (SOTA) LLMs reveal that larger models, despite achieving better functional correctness, frequently prioritize task completion over safety, which results in non-compliant behavior.
Related papers
- Improving LLM Reliability through Hybrid Abstention and Adaptive Detection [1.9495934446083012]
Large Language Models (LLMs) deployed in production environments face a fundamental safety-utility trade-off.<n>Conventional guardrails based on static rules or fixed confidence thresholds are typically context-insensitive and computationally expensive.<n>We introduce an adaptive abstention system that dynamically adjusts safety thresholds based on real-time contextual signals.
arXiv Detail & Related papers (2026-02-17T07:00:09Z) - RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories [58.32028251925354]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area.<n>We introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories.
arXiv Detail & Related papers (2026-01-30T08:29:01Z) - Towards Verifiably Safe Tool Use for LLM Agents [53.55621104327779]
Large language model (LLM)-based AI agents extend capabilities by enabling access to tools such as data sources, APIs, search engines, code sandboxes, and even other agents.<n>LLMs may invoke unintended tool interactions and introduce risks, such as leaking sensitive data or overwriting critical records.<n>Current approaches to mitigate these risks, such as model-based safeguards, enhance agents' reliability but cannot guarantee system safety.
arXiv Detail & Related papers (2026-01-12T21:31:38Z) - Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety [59.01189713115365]
We evaluate the impact of explicitly specifying extensive safety codes versus demonstrating them through illustrative cases.<n>We find that referencing explicit codes inconsistently improves harmlessness and systematically degrades helpfulness.<n>We propose CADA, a case-augmented deliberative alignment method for LLMs utilizing reinforcement learning on self-generated safety reasoning chains.
arXiv Detail & Related papers (2026-01-12T21:08:46Z) - Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking [64.97768177044355]
Large Language Models (LLMs) are increasingly deployed in real-world fact-checking systems.<n>We present FactArena, a fully automated arena-style evaluation framework.<n>Our analyses reveal significant discrepancies between static claim-verification accuracy and end-to-end fact-checking competence.
arXiv Detail & Related papers (2026-01-06T02:51:56Z) - Taxonomy-Adaptive Moderation Model with Robust Guardrails for Large Language Models [3.710103086278309]
Large Language Models (LLMs) are typically aligned for safety during the post-training phase.<n>They may still generate inappropriate outputs that could potentially pose risks to users.<n>This challenge underscores the need for robust safeguards that operate across both model inputs and outputs.
arXiv Detail & Related papers (2025-12-05T00:43:55Z) - SENTINEL: A Multi-Level Formal Framework for Safety Evaluation of LLM-based Embodied Agents [25.567593463613388]
We present Sentinel, the first framework for formally evaluating the physical safety of Large Language Model(LLM)-based embodied agents.<n>We apply Sentinel in VirtualHome and ALFRED, and formally evaluate multiple LLM-based embodied agents against diverse safety requirements.
arXiv Detail & Related papers (2025-10-14T20:53:51Z) - Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance [49.50518009960314]
Existing safety methods rely on ad-hoc taxonomy and lack a rigorous, systematic protection.<n>We develop a new benchmark for safety compliance by generating realistic LLM safety scenarios seeded with legal statutes.<n>Our experiments demonstrate that the Compliance Reasoner achieves superior performance on the new benchmark.
arXiv Detail & Related papers (2025-09-26T12:11:29Z) - Evaluating LLM Agent Adherence to Hierarchical Safety Principles: A Lightweight Benchmark for Probing Foundational Controllability Components [0.0]
This paper introduces a lightweight, interpretable benchmark to evaluate an agent's ability to uphold a high-level safety principle.<n>Our evaluation reveals two primary findings: (1) a quantifiable "cost of compliance" where safety constraints degrade task performance even when compliant solutions exist, and (2) an "illusion of compliance" where high adherence often masks task incompetence rather than choice.
arXiv Detail & Related papers (2025-06-03T01:16:34Z) - SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning [2.1331883629523634]
SagaLLM is a structured multi-agent architecture designed to address four foundational limitations of current LLM-based planning systems.<n>It bridges this gap by integrating the Saga transactional pattern with persistent memory, automated compensation, and independent validation agents.<n>It achieves significant improvements in consistency, validation accuracy, and adaptive coordination under uncertainty.
arXiv Detail & Related papers (2025-03-15T01:43:03Z) - Graphormer-Guided Task Planning: Beyond Static Rules with LLM Safety Perception [4.424170214926035]
We propose a risk-aware task planning framework that combines large language models with structured safety modeling.<n>Our approach constructs a dynamic-semantic safety graph, capturing spatial and contextual risk factors.<n>Unlike existing methods that rely on predefined safety constraints, our framework introduces a context-aware risk perception module.
arXiv Detail & Related papers (2025-03-10T02:43:54Z) - SafeSwitch: Steering Unsafe LLM Behavior via Internal Activation Signals [51.49737867797442]
Large language models (LLMs) exhibit exceptional capabilities across various tasks but also pose risks by generating harmful content.<n>We show that LLMs can similarly perform internal assessments about safety in their internal states.<n>We propose SafeSwitch, a framework that regulates unsafe outputs by utilizing the prober-based internal state monitor.
arXiv Detail & Related papers (2025-02-03T04:23:33Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs)
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.