LLM-based Property-based Test Generation for Guardrailing Cyber-Physical Systems
- URL: http://arxiv.org/abs/2505.23549v2
- Date: Fri, 13 Jun 2025 09:56:36 GMT
- Title: LLM-based Property-based Test Generation for Guardrailing Cyber-Physical Systems
- Authors: Khashayar Etemadi, Marjan Sirjani, Mahshid Helali Moghadam, Per Strandberg, Paul Pettersson
- Abstract summary: Cyber-physical systems (CPSs) are complex systems that integrate physical, computational, and communication subsystems. We propose an automated approach for guardrailing CPSs using property-based tests (PBTs) generated by Large Language Models (LLMs).
- Score: 4.399669126285083
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Cyber-physical systems (CPSs) are complex systems that integrate physical, computational, and communication subsystems. The heterogeneous nature of these systems makes their safety assurance challenging. In this paper, we propose a novel automated approach for guardrailing cyber-physical systems using property-based tests (PBTs) generated by Large Language Models (LLMs). Our approach employs an LLM to extract properties from the code and documentation of CPSs. Next, we use the LLM to generate PBTs that verify the extracted properties on the CPS. The generated PBTs have two uses. First, they are used to test the CPS before it is deployed, i.e., at design time. Second, these PBTs can be used after deployment, i.e., at run time, to monitor the behavior of the system and guardrail it against unsafe states. We implement our approach in ChekProp and conduct preliminary experiments to evaluate the generated PBTs in terms of their relevance (how well they match manually crafted properties), executability (how many run with minimal manual modification), and effectiveness (coverage of the input space partitions). The results of our experiments and evaluation demonstrate a promising path forward for creating guardrails for CPSs using LLM-generated property-based tests.
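To make the two uses concrete, below is a minimal, hand-written sketch of the kind of property-based test the abstract describes, written with Python's Hypothesis library. The `ThermostatController` class, its 18-26 degree C safe band, and the `within_safe_margin` predicate are illustrative assumptions, not artifacts of ChekProp or the paper's experiments.

```python
# A minimal sketch (not from the paper): a hand-written stand-in for the kind of
# property-based test the approach is described as generating for a CPS.
from hypothesis import given, strategies as st


class ThermostatController:
    """Toy bang-bang controller: keeps temperature inside a safe band (hypothetical)."""

    def __init__(self, low: float = 18.0, high: float = 26.0):
        self.low, self.high = low, high
        self.heater_on = False

    def step(self, temperature: float) -> float:
        """Advance one control cycle and return the resulting temperature."""
        if temperature < self.low:
            self.heater_on = True
        elif temperature > self.high:
            self.heater_on = False
        return temperature + (0.5 if self.heater_on else -0.3)


def within_safe_margin(ctrl: ThermostatController, temperature: float) -> bool:
    """Extracted-style property: one control cycle never moves the temperature
    more than one actuator step (0.5 degrees) outside the safe band."""
    return ctrl.low - 0.5 <= ctrl.step(temperature) <= ctrl.high + 0.5


# Design-time use: Hypothesis samples the in-band input partition automatically.
@given(temperature=st.floats(min_value=18.0, max_value=26.0))
def test_temperature_stays_near_safe_band(temperature: float) -> None:
    assert within_safe_margin(ThermostatController(), temperature)


# Run-time use: the same predicate doubles as a guardrail on live readings.
# Note: this sketch steps the controller model directly; a real monitor would
# evaluate the property on a shadow copy of the controller state.
def guardrail(ctrl: ThermostatController, sensor_reading: float) -> None:
    if not within_safe_margin(ctrl, sensor_reading):
        raise RuntimeError("Guardrail violation: controller would leave the safe band")
```

At design time, Hypothesis explores and shrinks inputs over the in-band partition; at run time, the same predicate can be checked against live sensor readings to flag unsafe transitions before they are actuated.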
Related papers
- Vulnerability Mitigation System (VMS): LLM Agent and Evaluation Framework for Autonomous Penetration Testing [0.0]
We propose a Vulnerability Mitigation System (VMS) capable of performing penetration testing without human intervention. The VMS has a two-part architecture, a planning component and a Summarizer, which enables it to generate commands and process feedback. To standardize testing, we designed two new Capture the Flag benchmarks based on the PicoCTF and OverTheWire platforms.
arXiv Detail & Related papers (2025-07-14T06:19:17Z) - Private GPTs for LLM-driven testing in software development and machine learning [0.0]
We examine the capability of private GPTs to automatically generate executable test code based on requirements. We use acceptance criteria as input, formulated as part of epics or stories, which are typically used in modern development processes.
arXiv Detail & Related papers (2025-06-06T20:05:41Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - Defending against Indirect Prompt Injection by Instruction Detection [81.98614607987793]
We propose a novel approach that takes external data as input and leverages the behavioral state of LLMs during both forward and backward propagation to detect potential indirect prompt injection (IPI) attacks. Our approach achieves a detection accuracy of 99.60% in the in-domain setting and 96.90% in the out-of-domain setting, while reducing the attack success rate to just 0.12% on the BIPIA benchmark.
arXiv Detail & Related papers (2025-05-08T13:04:45Z) - Federated Learning for Cyber Physical Systems: A Comprehensive Survey [49.54239703000928]
Federated learning (FL) has become increasingly popular in recent years. This survey scrutinizes how FL is utilized in critical CPS applications, e.g., intelligent transportation systems, cybersecurity services, smart cities, and smart healthcare solutions.
arXiv Detail & Related papers (2025-05-08T01:17:15Z) - AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - Get my drift? Catching LLM Task Drift with Activation Deltas [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users. We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set. We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z) - Large Language Models to Generate System-Level Test Programs Targeting Non-functional Properties [3.3305233186101226]
This paper proposes using Large Language Models (LLMs) to generate test programs.
We take a first look at how pre-trained LLMs perform in test program generation to optimize non-functional properties of the device under test (DUT).
arXiv Detail & Related papers (2024-03-15T08:01:02Z) - Test Case Generation and Test Oracle Support for Testing CPSs using Hybrid Models [2.6166087473624313]
Cyber-Physical Systems (CPSs) play a central role in the behavior of a wide range of autonomous physical systems.
CPSs are often specified iteratively as a sequence of models at different levels that can be tested via simulation systems.
One such model is a hybrid automaton; these are frequently used for CPS applications and have the advantage of encapsulating both continuous and discrete CPS behaviors (a minimal simulation sketch of such an automaton appears after this list).
arXiv Detail & Related papers (2023-09-14T19:08:09Z) - Can Large Language Models Write Good Property-Based Tests? [5.671039991090038]
Property-based testing (PBT) is still relatively underused in real-world software.
We investigate using modern language models to automatically synthesize PBTs using two prompting techniques.
We find that with the best model and prompting approach, a valid and sound PBT can be synthesized in 2.4 samples on average.
arXiv Detail & Related papers (2023-07-10T05:09:33Z) - Stress Testing Control Loops in Cyber-Physical Systems [2.195923771201972]
We investigate the testing of control-based CPSs, where control and software engineers develop the software collaboratively.
We define stress testing of control-based CPSs as generating tests that aim to falsify the underlying design assumptions.
We evaluate our approach on three case study systems: a drone, a continuous-current motor, and an aircraft.
arXiv Detail & Related papers (2023-02-27T16:01:38Z) - Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
arXiv Detail & Related papers (2022-05-23T10:11:50Z) - Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework [68.96770035057716]
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
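For the hybrid-models entry above ("Test Case Generation and Test Oracle Support for Testing CPSs using Hybrid Models"), the following is a minimal, illustrative hybrid automaton in Python: a two-mode thermostat whose continuous temperature dynamics are integrated with a fixed Euler step and whose discrete mode switches fire when guard conditions hold. The constants and dynamics are assumptions for illustration only, not taken from that paper.

```python
# A minimal, illustrative hybrid automaton (hypothetical, not from the cited paper):
# two discrete modes plus simple continuous temperature dynamics.
from dataclasses import dataclass


@dataclass
class ThermostatAutomaton:
    mode: str = "OFF"          # discrete state
    temperature: float = 20.0  # continuous state (degrees C)

    def flow(self, dt: float) -> None:
        """Continuous dynamics within the current mode (fixed-step Euler)."""
        rate = 1.0 if self.mode == "ON" else -0.5
        self.temperature += rate * dt

    def jump(self) -> None:
        """Discrete transitions taken when a guard condition holds."""
        if self.mode == "OFF" and self.temperature <= 18.0:
            self.mode = "ON"
        elif self.mode == "ON" and self.temperature >= 22.0:
            self.mode = "OFF"

    def run(self, steps: int, dt: float = 0.1) -> list:
        """Simulate and record a trace of (mode, temperature) pairs."""
        trace = []
        for _ in range(steps):
            self.flow(dt)
            self.jump()
            trace.append((self.mode, round(self.temperature, 2)))
        return trace


if __name__ == "__main__":
    # A short simulated trace mixing continuous evolution and mode switches.
    print(ThermostatAutomaton().run(steps=60))
```

The flow/jump split is what lets a single model capture both the continuous and the discrete behavior mentioned in that entry's abstract.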
This list is automatically generated from the titles and abstracts of the papers on this site.