Safety Analysis in the Era of Large Language Models: A Case Study of
STPA using ChatGPT
- URL: http://arxiv.org/abs/2304.01246v3
- Date: Wed, 20 Dec 2023 09:19:15 GMT
- Title: Safety Analysis in the Era of Large Language Models: A Case Study of
STPA using ChatGPT
- Authors: Yi Qi, Xingyu Zhao, Siddartha Khastgir, Xiaowei Huang
- Abstract summary: Using ChatGPT without human intervention may be inadequate due to reliability related issues, but with careful design, it may outperform human experts.
No statistically significant differences are found when varying the semantic complexity or using common prompt guidelines.
- Score: 11.27440170845105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Can safety analysis make use of Large Language Models (LLMs)? A case study
explores Systems Theoretic Process Analysis (STPA) applied to Automatic
Emergency Brake (AEB) and Electricity Demand Side Management (DSM) systems
using ChatGPT. We investigate how collaboration schemes, input semantic
complexity, and prompt guidelines influence STPA results. Comparative results
show that using ChatGPT without human intervention may be inadequate due to
reliability related issues, but with careful design, it may outperform human
experts. No statistically significant differences are found when varying the
input semantic complexity or using common prompt guidelines, which suggests the
necessity for developing domain-specific prompt engineering. We also highlight
future challenges, including concerns about LLM trustworthiness and the
necessity for standardisation and regulation in this domain.
Related papers
- Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions.
Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes.
We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance.
arXiv Detail & Related papers (2025-02-18T17:12:26Z) - Computational Safety for Generative AI: A Signal Processing Perspective [65.268245109828]
computational safety is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI.
We show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts.
We discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.
arXiv Detail & Related papers (2025-02-18T02:26:50Z) - Beyond the Surface: An NLP-based Methodology to Automatically Estimate CVE Relevance for CAPEC Attack Patterns [42.63501759921809]
We propose a methodology leveraging Natural Language Processing (NLP) to associate Common Vulnerabilities and Exposure (CAPEC) vulnerabilities with Common Attack Patternion and Classification (CAPEC) attack patterns.
Experimental evaluations demonstrate superior performance compared to state-of-the-art models.
arXiv Detail & Related papers (2025-01-13T08:39:52Z) - Leveraging Conversational Generative AI for Anomaly Detection in Digital Substations [0.0]
The research employs advanced performance metrics to conduct a comparative assessment between the proposed AD and HITL-based AD frameworks.
This approach presents a promising solution for enhancing the reliability of power system operations in the face of evolving cybersecurity challenges.
arXiv Detail & Related papers (2024-11-09T18:38:35Z) - A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications [0.0]
Cybersecurity breaches in digital substations pose significant challenges to the stability and reliability of power system operations.
This paper proposes a task-oriented dialogue system for anomaly detection (AD) in datasets of multicast messages.
It has a lower potential error and better scalability and adaptability than a process that considers the cybersecurity guidelines recommended by humans.
arXiv Detail & Related papers (2024-06-08T13:28:50Z) - On STPA for Distributed Development of Safe Autonomous Driving: An Interview Study [0.7851536646859475]
System-Theoretic Process Analysis (STPA) is a novel method applied in safety-related fields like defense and aerospace.
STPA assumes prerequisites that are not fully valid in the automotive system engineering with distributed system development and multi-abstraction design levels.
This can be seen as a maintainability challenge in continuous development and deployment.
arXiv Detail & Related papers (2024-03-14T15:56:02Z) - Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality
Assurance [7.002143951776267]
The study delves into stability issues related to both the operation and robustness of the expansive AI model on which ChatGPT is based.
The results reveal that the constructed ChatGPT-based sentiment analysis system exhibits uncertainty, which is attributed to various operational factors.
arXiv Detail & Related papers (2024-01-15T03:00:39Z) - Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z) - Consistency Analysis of ChatGPT [65.268245109828]
This paper investigates the trustworthiness of ChatGPT and GPT-4 regarding logically consistent behaviour.
Our findings suggest that while both models appear to show an enhanced language understanding and reasoning ability, they still frequently fall short of generating logically consistent predictions.
arXiv Detail & Related papers (2023-03-11T01:19:01Z) - Understanding the Usability Challenges of Machine Learning In
High-Stakes Decision Making [67.72855777115772]
Machine learning (ML) is being applied to a diverse and ever-growing set of domains.
In many cases, domain experts -- who often have no expertise in ML or data science -- are asked to use ML predictions to make high-stakes decisions.
We investigate the ML usability challenges present in the domain of child welfare screening through a series of collaborations with child welfare screeners.
arXiv Detail & Related papers (2021-03-02T22:50:45Z) - Trojaning Language Models for Fun and Profit [53.45727748224679]
TROJAN-LM is a new class of trojaning attacks in which maliciously crafted LMs trigger host NLP systems to malfunction.
By empirically studying three state-of-the-art LMs in a range of security-critical NLP tasks, we demonstrate that TROJAN-LM possesses the following properties.
arXiv Detail & Related papers (2020-08-01T18:22:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.