Automatic Instantiation of Assurance Cases from Patterns Using Large Language Models
- URL: http://arxiv.org/abs/2410.05488v1
- Date: Mon, 7 Oct 2024 20:58:29 GMT
- Title: Automatic Instantiation of Assurance Cases from Patterns Using Large Language Models
- Authors: Oluwafemi Odu, Alvine B. Belle, Song Wang, Segla Kpodjedo, Timothy C. Lethbridge, Hadi Hemmati
- Abstract summary: Large Language Models (LLMs) can generate assurance cases that comply with specific patterns.
LLMs exhibit potential in the automatic generation of assurance cases, but their capabilities still fall short compared to human experts.
- Score: 6.314768437420443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An assurance case is a structured set of arguments supported by evidence, demonstrating that a system's non-functional requirements (e.g., safety, security, reliability) have been correctly implemented. Assurance case patterns serve as templates derived from previous successful assurance cases, aimed at facilitating the creation of new assurance cases. Despite the use of these patterns to generate assurance cases, their instantiation remains a largely manual and error-prone process that heavily relies on domain expertise. Thus, exploring techniques to support their automatic instantiation becomes crucial. This study aims to investigate the potential of Large Language Models (LLMs) in automating the generation of assurance cases that comply with specific patterns. Specifically, we formalize assurance case patterns using predicate-based rules and then utilize LLMs, i.e., GPT-4o and GPT-4 Turbo, to automatically instantiate assurance cases from these formalized patterns. Our findings suggest that LLMs can generate assurance cases that comply with the given patterns. However, this study also highlights that LLMs may struggle with understanding some nuances related to pattern-specific relationships. While LLMs exhibit potential in the automatic generation of assurance cases, their capabilities still fall short compared to human experts. Therefore, a semi-automatic approach to instantiating assurance cases may be more practical at this time.
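As a rough illustration of the workflow the abstract describes (formalizing an assurance case pattern as predicate-based rules, then prompting an LLM such as GPT-4o to instantiate it), the following minimal Python sketch may help. The rule text, the example system description, and the prompt wording are assumptions made for illustration, not the authors' actual formalization; only the OpenAI chat-completions call reflects a real API.

```python
# Minimal sketch (not the paper's implementation): encode a pattern as
# predicate-based rules, then ask GPT-4o to instantiate it for a hypothetical
# system. Rule names and the system description below are illustrative.
from openai import OpenAI

PATTERN_RULES = """\
supported(G) :- goal(G), strategy(S), decomposes(S, G).
supported(G) :- goal(G), evidence(E), supports(E, G).
% a pattern instance is acceptable when every goal(G) is supported(G)
"""

SYSTEM_DESCRIPTION = (
    "A battery-powered inspection drone whose safety requirement is that it "
    "shall not fly within 5 m of people during automated missions."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You instantiate assurance case patterns expressed as predicate rules."},
        {"role": "user",
         "content": f"Pattern rules:\n{PATTERN_RULES}\n"
                    f"System:\n{SYSTEM_DESCRIPTION}\n"
                    "Instantiate the pattern as a list of goal, strategy, and "
                    "evidence predicates, one per line (e.g. goal(g1, '...'))."},
    ],
)
print(response.choices[0].message.content)
```

Consistent with the paper's conclusion that a semi-automatic approach is more practical at present, the predicates returned by such a call would still need expert review before being rendered into a notation such as GSN.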
Related papers
- Automating Semantic Analysis of System Assurance Cases using Goal-directed ASP [1.2189422792863451]
We present our approach to enhancing Assurance 2.0 with semantic rule-based analysis capabilities.
We examine the unique semantic aspects of assurance cases, such as logical consistency, adequacy, and indefeasibility.
arXiv Detail & Related papers (2024-08-21T15:22:43Z)
- CoDefeater: Using LLMs To Find Defeaters in Assurance Cases [4.4398355848251745]
This paper proposes CoDefeater, an automated process to leverage large language models (LLMs) for finding defeaters.
Initial results on two systems show that LLMs can efficiently find known and unforeseen feasible defeaters to support safety analysts.
arXiv Detail & Related papers (2024-07-18T17:16:35Z)
- A PRISMA-Driven Bibliometric Analysis of the Scientific Literature on Assurance Case Patterns [7.930875992631788]
Assurance cases can be used to prevent system failure.
They are structured arguments for articulating and communicating the requirements of various safety-critical systems.
arXiv Detail & Related papers (2024-07-06T05:00:49Z)
- AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses.
Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies.
We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Automated Control Logic Test Case Generation using Large Language Models [13.273872261029608]
We propose a novel approach for the automatic generation of PLC test cases that queries a Large Language Model (LLM)
Experiments with ten open-source function blocks from the OSCAT automation library showed that the approach is fast, easy to use, and can yield test cases with high statement coverage for programs of low to medium complexity.
arXiv Detail & Related papers (2024-05-03T06:09:21Z)
- ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful for triggering hallucinations in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
- AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models [54.95912006700379]
We introduce AutoDAN, a novel jailbreak attack against aligned Large Language Models.
AutoDAN can automatically generate stealthy jailbreak prompts using a carefully designed hierarchical genetic algorithm.
arXiv Detail & Related papers (2023-10-03T19:44:37Z)
- Trusta: Reasoning about Assurance Cases with Formal Methods and Large Language Models [4.005483185111992]
Trustworthiness Derivation Tree Analyzer (Trusta) is a desktop application designed to automatically construct and verify TDTs.
It has a built-in Prolog interpreter in its backend, and is supported by the constraint solvers Z3 and MONA.
Trusta can extract formal constraints from natural-language text, making them easier to interpret and validate (see the solver sketch after this list).
arXiv Detail & Related papers (2023-09-22T15:42:43Z)
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z)
- Large Language Models as General Pattern Machines [64.75501424160748]
We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
arXiv Detail & Related papers (2023-07-10T17:32:13Z)
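The Trusta entry above mentions checking extracted formal constraints with solvers such as Z3. The sketch below, written in Python with the z3-solver bindings, shows the general flavor of such a check; the requirement, variable names, and numeric values are invented for illustration and do not come from Trusta itself.

```python
# Minimal sketch (assumptions, not Trusta's implementation): a natural-language
# requirement such as "braking latency stays below 50 ms even with a 20 % margin"
# has already been translated into a formal constraint; Z3 then checks whether
# the assumed design values can violate it.
from z3 import Real, Solver, sat

latency = Real("braking_latency_ms")
margin = Real("safety_margin")

s = Solver()
s.add(margin == 0.20)                  # assumed safety margin
s.add(latency == 43.0)                 # assumed measured worst-case latency
s.add(latency * (1 + margin) >= 50.0)  # can the margined latency break the bound?

if s.check() == sat:
    print("Requirement can be violated; counterexample:", s.model())
else:
    print("No violation found under the stated assumptions.")
```

Trusta reportedly pairs this kind of solver query with a Prolog backend over the trustworthiness derivation tree; the snippet covers only the solver side.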
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.