CoDefeater: Using LLMs To Find Defeaters in Assurance Cases
- URL: http://arxiv.org/abs/2407.13717v1
- Date: Thu, 18 Jul 2024 17:16:35 GMT
- Title: CoDefeater: Using LLMs To Find Defeaters in Assurance Cases
- Authors: Usman Gohar, Michael C. Hunter, Robyn R. Lutz, Myra B. Cohen,
- Abstract summary: This paper proposes CoDefeater, an automated process to leverage large language models (LLMs) for finding defeaters.
Initial results on two systems show that LLMs can efficiently find known and unforeseen feasible defeaters to support safety analysts.
- Score: 4.4398355848251745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Constructing assurance cases is a widely used, and sometimes required, process toward demonstrating that safety-critical systems will operate safely in their planned environment. To mitigate the risk of errors and missing edge cases, the concept of defeaters - arguments or evidence that challenge claims in an assurance case - has been introduced. Defeaters can provide timely detection of weaknesses in the arguments, prompting further investigation and timely mitigations. However, capturing defeaters relies on expert judgment, experience, and creativity and must be done iteratively due to evolving requirements and regulations. This paper proposes CoDefeater, an automated process to leverage large language models (LLMs) for finding defeaters. Initial results on two systems show that LLMs can efficiently find known and unforeseen feasible defeaters to support safety analysts in enhancing the completeness and confidence of assurance cases.
Related papers
- A PRISMA-Driven Bibliometric Analysis of the Scientific Literature on Assurance Case Patterns [7.930875992631788]
Assurance cases can be used to prevent system failure.
They are structured arguments that allow arguing and relaying various safety-critical systems' requirements.
arXiv Detail & Related papers (2024-07-06T05:00:49Z) - SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance [48.80398992974831]
SafeAligner is a methodology implemented at the decoding stage to fortify defenses against jailbreak attacks.
We develop two specialized models: the Sentinel Model, which is trained to foster safety, and the Intruder Model, designed to generate riskier responses.
We show that SafeAligner can increase the likelihood of beneficial tokens, while reducing the occurrence of harmful ones.
arXiv Detail & Related papers (2024-06-26T07:15:44Z) - Jailbreaking as a Reward Misspecification Problem [80.52431374743998]
We propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process.
We introduce a metric ReGap to quantify the extent of reward misspecification and demonstrate its effectiveness and robustness in detecting harmful backdoor prompts.
We present ReMiss, a system for automated red teaming that generates adversarial prompts against various target aligned LLMs.
arXiv Detail & Related papers (2024-06-20T15:12:27Z) - Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations [76.19419888353586]
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations.
We present our efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms.
arXiv Detail & Related papers (2024-03-09T21:07:16Z) - Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z) - On Prompt-Driven Safeguarding for Large Language Models [172.13943777203377]
We find that in the representation space, the input queries are typically moved by safety prompts in a "higher-refusal" direction.
Inspired by these findings, we propose a method for safety prompt optimization, namely DRO.
Treating a safety prompt as continuous, trainable embeddings, DRO learns to move the queries' representations along or opposite the refusal direction, depending on their harmfulness.
arXiv Detail & Related papers (2024-01-31T17:28:24Z) - I came, I saw, I certified: some perspectives on the safety assurance of
cyber-physical systems [5.9395940943056384]
Execution failure of cyber-physical systems could result in loss of life, severe injuries, large-scale environmental damage, property destruction, and major economic loss.
It is often mandatory to develop compelling assurance cases to support that justification and allow regulatory bodies to certify such systems.
We explore challenges related to such assurance enablers and outline some potential directions that could be explored to tackle them.
arXiv Detail & Related papers (2024-01-30T00:06:16Z) - Token-Level Adversarial Prompt Detection Based on Perplexity Measures
and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z) - Large Language Model-Powered Smart Contract Vulnerability Detection: New
Perspectives [8.524720028421447]
This paper provides a systematic analysis of the opportunities, challenges, and potential solutions of harnessing Large Language Models (LLMs) such as GPT-4.
generating more answers with higher randomness largely boosts the likelihood of producing a correct answer but inevitably leads to a higher number of false positives.
We propose an adversarial framework dubbed GPTLens that breaks the conventional one-stage detection into two synergistic stages $-$ generation and discrimination.
arXiv Detail & Related papers (2023-10-02T12:37:23Z) - A Survey of Safety and Trustworthiness of Large Language Models through
the Lens of Verification and Validation [21.242078120036176]
Large Language Models (LLMs) have exploded a new heatwave of AI for their ability to engage end-users in human-level conversations.
This survey concerns their safety and trustworthiness in industrial applications.
arXiv Detail & Related papers (2023-05-19T02:41:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.