Evaluating the Effectiveness of GPT-4 Turbo in Creating Defeaters for
Assurance Cases
- URL: http://arxiv.org/abs/2401.17991v1
- Date: Wed, 31 Jan 2024 16:51:23 GMT
- Title: Evaluating the Effectiveness of GPT-4 Turbo in Creating Defeaters for
Assurance Cases
- Authors: Kimya Khakzad Shahandashti, Mithila Sivakumar, Mohammad Mahdi Mohajer,
Alvine B. Belle, Song Wang, Timothy C. Lethbridge
- Abstract summary: We use GPT-4 Turbo, an advanced Large Language Model (LLM) developed by OpenAI, to identify defeaters within ACs formalized using the Eliminative Argumentation (EA) notation.
Our initial evaluation gauges the model's proficiency in understanding and generating arguments within this framework.
The findings indicate that GPT-4 Turbo excels in EA notation and is capable of generating various types of defeaters.
- Score: 6.231203956284574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Assurance cases (ACs) are structured arguments that support the verification
of the correct implementation of systems' non-functional requirements, such as
safety and security, thereby preventing system failures which could lead to
catastrophic outcomes, including loss of lives. ACs facilitate the
certification of systems in accordance with industrial standards, for example,
DO-178C and ISO 26262. Identifying defeaters, i.e., arguments that refute these ACs, is
essential for improving the robustness of and confidence in ACs. To automate this
task, we introduce a novel method that leverages the capabilities of GPT-4
Turbo, an advanced Large Language Model (LLM) developed by OpenAI, to identify
defeaters within ACs formalized using the Eliminative Argumentation (EA)
notation. Our initial evaluation gauges the model's proficiency in
understanding and generating arguments within this framework. The findings
indicate that GPT-4 Turbo excels in EA notation and is capable of generating
various types of defeaters.
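To make the workflow concrete, here is a minimal sketch of how GPT-4 Turbo could be queried through the OpenAI chat completions API to propose defeaters for an assurance-case fragment written in EA notation. The AC fragment, prompt wording, and model settings below are illustrative assumptions, not the prompts or pipeline used by the authors.
```python
# Minimal sketch: asking GPT-4 Turbo to propose defeaters for an EA-notated claim.
# The AC fragment and prompt text are illustrative, not taken from the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ac_fragment = """\
C1 (Claim): The braking subsystem meets its 50 ms worst-case response-time requirement.
E1 (Evidence): Worst-case execution time analysis report WCET-042.
IR1 (Inference Rule): A WCET analysis of the deployed binary supports timing claims.
"""

response = client.chat.completions.create(
    model="gpt-4-turbo",
    temperature=0.2,
    messages=[
        {
            "role": "system",
            "content": "You are an assurance-case reviewer working in "
                       "Eliminative Argumentation (EA) notation.",
        },
        {
            "role": "user",
            "content": "List plausible defeaters for the following assurance-case "
                       "fragment, labelling each one as rebutting, undermining, or "
                       "undercutting:\n" + ac_fragment,
        },
    ],
)
print(response.choices[0].message.content)
```
A low temperature keeps the output focused on enumerating candidate defeaters; in practice the generated defeaters would still need review by an assurance-case expert before being incorporated into the argument.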
Related papers
- Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks [9.277492743469235]
We present the first systematic jailbreak evaluation of DeepSeek-series models.
We compare them with GPT-3.5 and GPT-4 using the HarmBench benchmark.
arXiv Detail & Related papers (2025-06-23T11:53:31Z)
- T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks [67.91652526657599]
We formalize the T2V jailbreak attack as a discrete optimization problem and propose a joint objective-based optimization framework, called T2V-OptJail.
We conduct large-scale experiments on several T2V models, covering both open-source models and real commercial closed-source models.
The proposed method improves on the existing state-of-the-art method by 11.4% and 10.0% in terms of attack success rate.
arXiv Detail & Related papers (2025-05-10T16:04:52Z)
- AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage.
We show that scaling agentic reasoning systems at test time substantially enhances robustness without compromising model utility.
Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z)
- An Approach to Technical AGI Safety and Security [72.83728459135101]
We develop an approach to address the risk of harms consequential enough to significantly harm humanity.
We focus on technical approaches to misuse and misalignment.
We briefly outline how these ingredients could be combined to produce safety cases for AGI systems.
arXiv Detail & Related papers (2025-04-02T15:59:31Z)
- SafetyAnalyst: Interpretable, transparent, and steerable safety moderation for AI behavior [56.10557932893919]
We present SafetyAnalyst, a novel AI safety moderation framework.
Given an AI behavior, SafetyAnalyst uses chain-of-thought reasoning to analyze its potential consequences.
It aggregates all harmful and beneficial effects into a harmfulness score using fully interpretable weight parameters.
arXiv Detail & Related papers (2024-10-22T03:38:37Z)
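As a rough illustration of the kind of interpretable weighted aggregation the SafetyAnalyst entry above describes, the sketch below combines per-effect likelihood and severity estimates into one harmfulness score. The effect fields, weight names, and sign convention are assumptions made for illustration, not the paper's actual formulation.
```python
# Hypothetical sketch of an interpretable weighted aggregation of harmful and
# beneficial effects into a single harmfulness score. Field names, weights, and
# the sign convention are assumptions, not SafetyAnalyst's actual design.
from typing import Dict, List


def harmfulness_score(
    harmful_effects: List[Dict],
    beneficial_effects: List[Dict],
    harm_weights: Dict[str, float],
    benefit_weights: Dict[str, float],
) -> float:
    """Weighted sum of harms minus weighted sum of benefits."""
    harm = sum(
        harm_weights[e["category"]] * e["likelihood"] * e["severity"]
        for e in harmful_effects
    )
    benefit = sum(
        benefit_weights[e["category"]] * e["likelihood"] * e["magnitude"]
        for e in beneficial_effects
    )
    return harm - benefit


# Example usage with made-up effects and weights:
score = harmfulness_score(
    harmful_effects=[{"category": "physical_harm", "likelihood": 0.2, "severity": 0.9}],
    beneficial_effects=[{"category": "education", "likelihood": 0.8, "magnitude": 0.3}],
    harm_weights={"physical_harm": 1.0},
    benefit_weights={"education": 0.5},
)
print(f"harmfulness score: {score:.3f}")
```
Because every weight is an explicit number attached to a named category, a score of this form stays auditable and can be re-tuned for different deployment policies.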
- Automated Proof Generation for Rust Code via Self-Evolution [69.25795662658356]
We introduce SAFE, a novel framework that overcomes the lack of human-written proofs to enable automated proof generation for Rust code.
We demonstrate superior efficiency and precision compared to GPT-4o.
This advancement leads to a significant improvement in performance, achieving a 70.50% accuracy rate in a benchmark crafted by human experts.
arXiv Detail & Related papers (2024-10-21T08:15:45Z)
- GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering [0.0]
Retrieval-Augmented Generation (RAG) has emerged as a common paradigm to use Large Language Models (LLMs) alongside private and up-to-date knowledge bases.
We address the challenges of using LLM-as-a-Judge when evaluating grounded answers generated by RAG systems.
arXiv Detail & Related papers (2024-09-10T15:39:32Z)
- AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies [80.90138009539004]
AIR-Bench 2024 is the first AI safety benchmark aligned with emerging government regulations and company policies.
It decomposes 8 government regulations and 16 company policies into a four-tiered safety taxonomy with granular risk categories in the lowest tier.
We evaluate leading language models on AIR-Bench 2024, uncovering insights into their alignment with specified safety concerns.
arXiv Detail & Related papers (2024-07-11T21:16:48Z)
- Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation [86.05704141217036]
Black-box finetuning is an emerging interface for adapting state-of-the-art language models to user needs.
We introduce covert malicious finetuning, a method to compromise model safety via finetuning while evading detection.
arXiv Detail & Related papers (2024-06-28T17:05:46Z)
- Jailbreaking as a Reward Misspecification Problem [80.52431374743998]
We propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process.
We introduce a metric ReGap to quantify the extent of reward misspecification and demonstrate its effectiveness.
We present ReMiss, a system for automated red teaming that generates adversarial prompts in a reward-misspecified space.
arXiv Detail & Related papers (2024-06-20T15:12:27Z)
- PVF (Parameter Vulnerability Factor): A Scalable Metric for Understanding AI Vulnerability Against SDCs in Model Parameters [7.652441604508354]
Parameter Vulnerability Factor (PVF) is a metric that aims to standardize the quantification of AI model vulnerability against parameter corruptions.
PVF can provide pivotal insights to AI hardware designers in balancing the tradeoff between fault protection and performance/efficiency.
We present several use cases of applying PVF to three types of tasks/models during inference: recommendation (DLRM), vision classification (CNN), and text classification (BERT).
arXiv Detail & Related papers (2024-05-02T21:23:34Z)
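The PVF entry above only names the metric, so the following is a rough, hypothetical sketch of the kind of fault-injection estimate such a metric suggests: flip a random bit in a randomly chosen parameter, rerun inference, and count how often the prediction changes. The bit-flip procedure, trial count, and use of a classifier's argmax output are assumptions for illustration and may differ from the paper's actual definition.
```python
# Hypothetical fault-injection sketch of a PVF-style estimate: the fraction of
# random single-bit parameter corruptions that change a classifier's prediction.
# This is an illustration only, not the metric's definition from the paper.
import random
import struct

import torch


def flip_random_bit(value: float) -> float:
    """Flip one random bit in the IEEE-754 float32 encoding of `value`."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    bits ^= 1 << random.randrange(32)
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits))
    return flipped


@torch.no_grad()
def estimate_pvf(model: torch.nn.Module, x: torch.Tensor, trials: int = 1000) -> float:
    """Estimate the share of single-parameter bit flips that corrupt the output."""
    model.eval()
    baseline = model(x).argmax(dim=-1)
    params = list(model.parameters())
    mismatches = 0
    for _ in range(trials):
        p = random.choice(params).view(-1)   # flat view of one parameter tensor
        idx = random.randrange(p.numel())
        original = p[idx].item()
        p[idx] = flip_random_bit(original)   # inject the fault
        mismatches += int(not torch.equal(model(x).argmax(dim=-1), baseline))
        p[idx] = original                    # restore the parameter
    return mismatches / trials
```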
- FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids [53.2306792009435]
FaultGuard is the first framework for fault type and zone classification resilient to adversarial attacks.
We propose a low-complexity fault prediction model and an online adversarial training technique to enhance robustness.
Our model outclasses the state-of-the-art for resilient fault prediction benchmarking, with an accuracy of up to 0.958.
arXiv Detail & Related papers (2024-03-26T08:51:23Z)
- GPT-4 and Safety Case Generation: An Exploratory Analysis [2.3361634876233817]
This paper explores the generation of safety cases with large language models (LLMs) and conversational interfaces (ChatGPT).
Our primary objective is to delve into the existing knowledge base of GPT-4, focusing on its understanding of the Goal Structuring Notation (GSN).
We perform four distinct experiments with GPT-4 to assess its capacity for generating safety cases within a defined system and application domain.
arXiv Detail & Related papers (2023-12-09T22:28:48Z)
- Security and Interpretability in Automotive Systems [0.0]
The lack of any sender authentication mechanism in place makes CAN (Controller Area Network) vulnerable to security threats.
This thesis demonstrates a sender authentication technique that uses power consumption measurements of the electronic control units (ECUs) and a classification model to determine the transmitting states of the ECUs.
arXiv Detail & Related papers (2022-12-23T01:33:09Z)
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision (e.g., rotation and jigsaw) benefits image tasks such as classification and recognition, it fails to provide the critical supervision signals needed to learn discriminative representations for segmentation tasks.
arXiv Detail & Related papers (2021-05-23T01:50:44Z) - Runtime Safety Assurance Using Reinforcement Learning [37.61747231296097]
This paper aims to design a meta-controller capable of identifying unsafe situations with high accuracy.
We frame the design of RTSA (runtime safety assurance) as a Markov decision process (MDP) and use reinforcement learning (RL) to solve it.
arXiv Detail & Related papers (2020-10-20T20:54:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.