Safety Analysis in the Era of Large Language Models: A Case Study of
STPA using ChatGPT
- URL: http://arxiv.org/abs/2304.01246v3
- Date: Wed, 20 Dec 2023 09:19:15 GMT
- Title: Safety Analysis in the Era of Large Language Models: A Case Study of
STPA using ChatGPT
- Authors: Yi Qi, Xingyu Zhao, Siddartha Khastgir, Xiaowei Huang
- Abstract summary: Using ChatGPT without human intervention may be inadequate due to reliability-related issues, but with careful design, it may outperform human experts.
No statistically significant differences are found when varying the input semantic complexity or using common prompt guidelines.
- Score: 11.27440170845105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Can safety analysis make use of Large Language Models (LLMs)? A case study
explores Systems Theoretic Process Analysis (STPA) applied to Automatic
Emergency Brake (AEB) and Electricity Demand Side Management (DSM) systems
using ChatGPT. We investigate how collaboration schemes, input semantic
complexity, and prompt guidelines influence STPA results. Comparative results
show that using ChatGPT without human intervention may be inadequate due to
reliability-related issues, but with careful design, it may outperform human
experts. No statistically significant differences are found when varying the
input semantic complexity or using common prompt guidelines, which suggests the
necessity for developing domain-specific prompt engineering. We also highlight
future challenges, including concerns about LLM trustworthiness and the
necessity for standardisation and regulation in this domain.
Related papers
- A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications [0.0]
Cybersecurity breaches in digital substations pose significant challenges to the stability and reliability of power system operations.
This paper proposes a task-oriented dialogue system for anomaly detection (AD) in datasets of multicast messages.
It offers a lower potential for error and better scalability and adaptability than a process based on cybersecurity guidelines recommended by humans.
arXiv Detail & Related papers (2024-06-08T13:28:50Z)
- The AI Companion in Education: Analyzing the Pedagogical Potential of ChatGPT in Computer Science and Engineering [1.120999712480549]
This study aims to comprehensively analyze the pedagogical potential of ChatGPT in CSE education.
We employ a systematic approach, creating a diverse range of educational practice problems within CSE field.
According to our examinations, certain question types, like conceptual knowledge queries, typically do not pose significant challenges to ChatGPT.
arXiv Detail & Related papers (2024-04-23T21:42:30Z)
- On STPA for Distributed Development of Safe Autonomous Driving: An Interview Study [0.7851536646859475]
System-Theoretic Process Analysis (STPA) is a novel method applied in safety-related fields like defense and aerospace.
STPA assumes prerequisites that are not fully valid in automotive system engineering, which involves distributed system development and multiple levels of design abstraction.
This can be seen as a maintainability challenge in continuous development and deployment.
arXiv Detail & Related papers (2024-03-14T15:56:02Z)
- Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive characterization of adversarial inputs through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z)
- Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance [7.002143951776267]
The study delves into stability issues related to both the operation and robustness of the expansive AI model on which ChatGPT is based.
The results reveal that the constructed ChatGPT-based sentiment analysis system exhibits uncertainty, which is attributed to various operational factors.
arXiv Detail & Related papers (2024-01-15T03:00:39Z)
- Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z)
- Trusta: Reasoning about Assurance Cases with Formal Methods and Large Language Models [4.005483185111992]
Trustworthiness Derivation Tree Analyzer (Trusta) is a desktop application designed to automatically construct and verify TDTs.
It has a built-in Prolog interpreter in its backend, and is supported by the constraint solvers Z3 and MONA.
Trusta can extract formal constraints from text in natural languages, facilitating an easier interpretation and validation process.
arXiv Detail & Related papers (2023-09-22T15:42:43Z)
- Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization [66.08074487429477]
We investigate the stability and reliability of large language models (LLMs) as automatic evaluators for abstractive summarization.
We find that while ChatGPT and GPT-4 outperform the commonly used automatic metrics, they are not ready as human replacements.
arXiv Detail & Related papers (2023-05-22T14:58:13Z)
- Consistency Analysis of ChatGPT [65.268245109828]
This paper investigates the trustworthiness of ChatGPT and GPT-4 regarding logically consistent behaviour.
Our findings suggest that while both models appear to show an enhanced language understanding and reasoning ability, they still frequently fall short of generating logically consistent predictions.
arXiv Detail & Related papers (2023-03-11T01:19:01Z)
- Understanding the Usability Challenges of Machine Learning in High-Stakes Decision Making [67.72855777115772]
Machine learning (ML) is being applied to a diverse and ever-growing set of domains.
In many cases, domain experts -- who often have no expertise in ML or data science -- are asked to use ML predictions to make high-stakes decisions.
We investigate the ML usability challenges present in the domain of child welfare screening through a series of collaborations with child welfare screeners.
arXiv Detail & Related papers (2021-03-02T22:50:45Z)
- Trojaning Language Models for Fun and Profit [53.45727748224679]
TROJAN-LM is a new class of trojaning attacks in which maliciously crafted LMs cause host NLP systems to malfunction.
By empirically studying three state-of-the-art LMs in a range of security-critical NLP tasks, we demonstrate that TROJAN-LM possesses the following properties.
arXiv Detail & Related papers (2020-08-01T18:22:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.