A Hazard Analysis Framework for Code Synthesis Large Language Models
- URL: http://arxiv.org/abs/2207.14157v1
- Date: Mon, 25 Jul 2022 20:44:40 GMT
- Title: A Hazard Analysis Framework for Code Synthesis Large Language Models
- Authors: Heidy Khlaaf, Pamela Mishkin, Joshua Achiam, Gretchen Krueger, Miles
Brundage
- Abstract summary: Codex, a large language model (LLM) trained on a variety of Codes, exceeds the previous state of the art in its capacity to synthesize and generate code.
This paper outlines a hazard analysis framework constructed at OpenAI to uncover hazards or safety risks that the deployment of models like Codex may impose technically, socially, politically, and economically.
- Score: 2.535935501467612
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Codex, a large language model (LLM) trained on a variety of codebases,
exceeds the previous state of the art in its capacity to synthesize and
generate code. Although Codex provides a plethora of benefits, models that may
generate code on such scale have significant limitations, alignment problems,
the potential to be misused, and the possibility to increase the rate of
progress in technical fields that may themselves have destabilizing impacts or
have misuse potential. Yet such safety impacts are not yet known or remain to
be explored. In this paper, we outline a hazard analysis framework constructed
at OpenAI to uncover hazards or safety risks that the deployment of models like
Codex may impose technically, socially, politically, and economically. The
analysis is informed by a novel evaluation framework that determines the
capacity of advanced code generation techniques against the complexity and
expressivity of specification prompts, and their capability to understand and
execute them relative to human ability.
Related papers
- Sycophancy in Large Language Models: Causes and Mitigations [0.0]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing tasks.
Their tendency to exhibit sycophantic behavior poses significant risks to their reliability and ethical deployment.
This paper provides a technical survey of sycophancy in LLMs, analyzing its causes, impacts, and potential mitigation strategies.
arXiv Detail & Related papers (2024-11-22T16:56:49Z) - HarmLevelBench: Evaluating Harm-Level Compliance and the Impact of Quantization on Model Alignment [1.8843687952462742]
This paper aims to address gaps in the current literature on jailbreaking techniques and the evaluation of LLM vulnerabilities.
Our contributions include the creation of a novel dataset designed to assess the harmfulness of model outputs across multiple harm levels.
We provide a comprehensive benchmark of state-of-the-art jailbreaking attacks, specifically targeting the Vicuna 13B v1.5 model.
arXiv Detail & Related papers (2024-11-11T10:02:49Z) - HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that many LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure codes.
arXiv Detail & Related papers (2024-09-10T12:01:43Z) - Generative AI Models: Opportunities and Risks for Industry and Authorities [1.3914994102950027]
Generative AI models are capable of performing a wide range of tasks that traditionally require creativity and human understanding.
They learn patterns from existing data during training and can subsequently generate new content.
The use of generative AI models introduces novel IT security risks that need to be considered.
arXiv Detail & Related papers (2024-06-07T08:34:30Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in this belief.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z) - Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z) - The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning [54.56905063752427]
Neuro-Symbolic AI (NeSy) holds promise to ensure the safe deployment of AI systems.
Existing pipelines that train the neural and symbolic components sequentially require extensive labelling.
New architecture, NeSyGPT, fine-tunes a vision-language foundation model to extract symbolic features from raw data.
arXiv Detail & Related papers (2024-02-02T20:33:14Z) - CodeLMSec Benchmark: Systematically Evaluating and Finding Security
Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z) - A Simple, Yet Effective Approach to Finding Biases in Code Generation [16.094062131137722]
This work shows that current code generation systems exhibit undesired biases inherited from their large language model backbones.
We propose the "block of influence" concept, which enables a modular decomposition and analysis of the coding challenges.
arXiv Detail & Related papers (2022-10-31T15:06:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.