Related papers: BarrierBench : Evaluating Large Language Models for Safety Verification in Dynamical Systems

BarrierBench : Evaluating Large Language Models for Safety Verification in Dynamical Systems

URL: http://arxiv.org/abs/2511.09363v1
Date: Thu, 13 Nov 2025 01:49:09 GMT
Title: BarrierBench : Evaluating Large Language Models for Safety Verification in Dynamical Systems
Authors: Ali Taheri, Alireza Taban, Sadegh Soudjani, Ashutosh Trivedi,
Abstract summary: We introduce an LLM-based agentic framework for barrier certificate synthesis.<n>The framework uses natural language reasoning to propose, refine, and validate candidate certificates.<n> BarrierBench is a benchmark of 100 dynamical systems spanning linear, nonlinear, discrete-time, and continuous-time settings.
Score: 4.530582224312311
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Safety verification of dynamical systems via barrier certificates is essential for ensuring correctness in autonomous applications. Synthesizing these certificates involves discovering mathematical functions with current methods suffering from poor scalability, dependence on carefully designed templates, and exhaustive or incremental function-space searches. They also demand substantial manual expertise--selecting templates, solvers, and hyperparameters, and designing sampling strategies--requiring both theoretical and practical knowledge traditionally shared through linguistic reasoning rather than formalized methods. This motivates a key question: can such expert reasoning be captured and operationalized by language models? We address this by introducing an LLM-based agentic framework for barrier certificate synthesis. The framework uses natural language reasoning to propose, refine, and validate candidate certificates, integrating LLM-driven template discovery with SMT-based verification, and supporting barrier-controller co-synthesis to ensure consistency between safety certificates and controllers. To evaluate this capability, we introduce BarrierBench, a benchmark of 100 dynamical systems spanning linear, nonlinear, discrete-time, and continuous-time settings. Our experiments assess not only the effectiveness of LLM-guided barrier synthesis but also the utility of retrieval-augmented generation and agentic coordination strategies in improving its reliability and performance. Across these tasks, the framework achieves more than 90% success in generating valid certificates. By releasing BarrierBench and the accompanying toolchain, we aim to establish a community testbed for advancing the integration of language-based reasoning with formal verification in dynamical systems. The benchmark is publicly available at https://hycodev.com/dataset/barrierbench

Related papers

Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval [60.25608870901428]
Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs)<n>We propose the task of fact-checking without retrieval, focusing on the verification of arbitrary natural language claims, independent of their source robustness.
arXiv Detail & Related papers (2026-03-05T18:42:51Z)
LUCID: Learning-Enabled Uncertainty-Aware Certification of Stochastic Dynamical Systems [0.8574682463936006]
Traditional formal verification tools fall short when faced with systems that embed both opaque black-box AI components and complex dynamics.<n>We introduce LUCID, a verification engine for certifying safety of black-box embedding dynamical systems.<n> LUCID is the first known tool capable of establishing quantified safety guarantees for such systems.
arXiv Detail & Related papers (2025-12-12T17:46:50Z)
Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.83217247686402]
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions.<n>Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance.<n>This paper decomposes LLM applications into a three-layer architecture: textbftextitSystem Shell Layer, textbftextitPrompt Orchestration Layer, and textbftextitLLM Inference Core.
arXiv Detail & Related papers (2025-08-28T13:00:28Z)
Position Paper: Programming Language Techniques for Bridging LLM Code Generation Semantic Gaps [3.61356888205659]
This paper argues that principled integration of Programming Language techniques is essential for bridging semantic gaps in large language models.<n>PL techniques can elevate LLM-generated code from statistical pattern matching to truly reliable and trustworthy levels.
arXiv Detail & Related papers (2025-07-12T04:32:15Z)
Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.<n>We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
Vulnerability Detection: From Formal Verification to Large Language Models and Hybrid Approaches: A Comprehensive Overview [3.135279672650891]
This paper focuses on state-of-the-art software testing and verification.<n>It focuses on three key approaches: classical formal methods, LLM-based analysis, and emerging hybrid techniques.<n>We analyze whether integrating formal rigor with LLM-driven insights can enhance the effectiveness and scalability of software verification.
arXiv Detail & Related papers (2025-03-13T18:22:22Z)
CELA: Cost-Efficient Language Model Alignment for CTR Prediction [70.65910069412944]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems.<n>Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs)<n>We propose textbfCost-textbfEfficient textbfLanguage Model textbfAlignment (textbfCELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z)
Enhancing Security in Federated Learning through Adaptive Consensus-Based Model Update Validation [2.28438857884398]
This paper introduces an advanced approach for fortifying Federated Learning (FL) systems against label-flipping attacks. We propose a consensus-based verification process integrated with an adaptive thresholding mechanism. Our results indicate a significant mitigation of label-flipping attacks, bolstering the FL system's resilience.
arXiv Detail & Related papers (2024-03-05T20:54:56Z)
A General Framework for Verification and Control of Dynamical Models via Certificate Synthesis [54.959571890098786]
We provide a framework to encode system specifications and define corresponding certificates. We present an automated approach to formally synthesise controllers and certificates. Our approach contributes to the broad field of safe learning for control, exploiting the flexibility of neural networks.
arXiv Detail & Related papers (2023-09-12T09:37:26Z)
Joint Differentiable Optimization and Verification for Certified Reinforcement Learning [91.93635157885055]
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties. We propose a framework that jointly conducts reinforcement learning and formal verification.
arXiv Detail & Related papers (2022-01-28T16:53:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.