FLAG: Finding Line Anomalies (in code) with Generative AI
- URL: http://arxiv.org/abs/2306.12643v1
- Date: Thu, 22 Jun 2023 03:04:56 GMT
- Title: FLAG: Finding Line Anomalies (in code) with Generative AI
- Authors: Baleegh Ahmad, Benjamin Tan, Ramesh Karri, Hammond Pearce
- Abstract summary: FLAG is based on the lexical capabilities of generative AI, specifically, Large Language Models (LLMs).
We use 121 benchmarks across C, Python and Verilog, with each benchmark containing a known security or functional weakness.
FLAG can identify 101 of the 121 defects and helps reduce the search space to 12-17% of source code.
- Score: 18.612900041820875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code contains security and functional bugs. The process of identifying and
localizing them is difficult and relies on human labor. In this work, we
present a novel approach (FLAG) to assist human debuggers. FLAG is based on the
lexical capabilities of generative AI, specifically, Large Language Models
(LLMs). Here, we input a code file then extract and regenerate each line within
that file for self-comparison. By comparing the original code with an
LLM-generated alternative, we can flag notable differences as anomalies for
further inspection, with features such as distance from comments and LLM
confidence also aiding this classification. This reduces the inspection search
space for the designer. Unlike other automated approaches in this area, FLAG is
language-agnostic, can work on incomplete (and even non-compiling) code and
requires no creation of security properties, functional tests or definition of
rules. In this work, we explore the features that help LLMs in this
classification and evaluate the performance of FLAG on known bugs. We use 121
benchmarks across C, Python and Verilog, with each benchmark containing a known
security or functional weakness. We conduct the experiments using two
state-of-the-art LLMs, OpenAI's code-davinci-002 and gpt-3.5-turbo, but our
approach may be used with other models. FLAG can identify 101 of the defects and
helps reduce the search space to 12-17% of source code.
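The regenerate-and-compare loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `flag_anomalies`, `toy_regenerate`, the `threshold` parameter, and the use of `difflib` string similarity are all assumptions made for illustration, and the additional features the abstract mentions (distance from comments, LLM confidence) are omitted. The `regenerate` callable stands in for an LLM call that proposes a line given the surrounding code.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

# Hypothetical type: given the lines before and after a position,
# return the line a model would write at that position.
Regenerator = Callable[[List[str], List[str]], str]


def flag_anomalies(
    lines: List[str],
    regenerate: Regenerator,
    threshold: float = 0.95,
) -> List[Tuple[int, str, str, float]]:
    """Return (line_number, original, suggestion, similarity) for each line
    whose regenerated version is less similar than `threshold`."""
    flagged = []
    for i, original in enumerate(lines):
        if not original.strip():
            continue  # blank lines carry nothing to compare
        suggestion = regenerate(lines[:i], lines[i + 1:])
        similarity = SequenceMatcher(
            None, original.strip(), suggestion.strip()
        ).ratio()
        if similarity < threshold:
            flagged.append((i + 1, original, suggestion, similarity))
    return flagged


# Toy input with an off-by-one loop bound on line 2.
source = [
    "def copy(dst, src, n):",
    "    for i in range(n + 1):",
    "        dst[i] = src[i]",
]

# Assumption: what a model would propose for each position. A real setup
# would query an LLM here instead of reading from a fixed list.
MODEL_OUTPUT = [
    "def copy(dst, src, n):",
    "    for i in range(n):",
    "        dst[i] = src[i]",
]


def toy_regenerate(prefix: List[str], suffix: List[str]) -> str:
    # The number of preceding lines identifies the position being regenerated.
    return MODEL_OUTPUT[len(prefix)]


for number, original, suggestion, similarity in flag_anomalies(source, toy_regenerate):
    print(f"line {number}: {original.strip()!r} vs {suggestion.strip()!r}")
```

In this toy run only the line with the suspicious loop bound is reported, so a reviewer would inspect one line out of three. The threshold is a free parameter of the sketch: lowering it flags fewer lines and shrinks the search space, at the cost of missing subtle defects.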
Related papers
- VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development.
We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM).
We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z)
- Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting [78.48355455324688]
We propose a novel zero-shot synthetic code detector based on the similarity between the code and its rewritten variants.
Our results demonstrate a notable enhancement over existing synthetic content detectors designed for general texts.
arXiv Detail & Related papers (2024-05-25T08:57:28Z)
- Chain of Targeted Verification Questions to Improve the Reliability of Code Generated by LLMs [10.510325069289324]
We propose a self-refinement method aimed at improving the reliability of code generated by LLMs.
Our approach is based on targeted Verification Questions (VQs) to identify potential bugs within the initial code.
Our method attempts to repair these potential bugs by re-prompting the LLM with the targeted VQs and the initial code.
arXiv Detail & Related papers (2024-05-22T19:02:50Z)
- Enabling Memory Safety of C Programs using LLMs [5.297072277460838]
Memory safety violations in low-level code, written in languages like C, continue to remain one of the major sources of software vulnerabilities.
One method of removing such violations by construction is to port C code to a safe C dialect.
Such dialects rely on programmer-supplied annotations to guarantee safety with minimal runtime overhead.
This porting is a manual process that imposes significant burden on the programmer and hence, there has been limited adoption of this technique.
arXiv Detail & Related papers (2024-04-01T13:05:54Z)
- AgentFL: Scaling LLM-based Fault Localization to Project-Level Context [11.147750199280813]
This paper presents AgentFL, a multi-agent system based on ChatGPT for automated fault localization.
By simulating the behavior of a human developer, AgentFL models the FL task as a three-step process, which involves comprehension, navigation, and confirmation.
The evaluation on the widely used Defects4J-V1.2.0 benchmark shows that AgentFL can localize 157 out of 395 bugs within Top-1.
arXiv Detail & Related papers (2024-03-25T01:58:19Z)
- InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
InfiBench is the first large-scale freeform question-answering (QA) benchmark for code to our knowledge.
It comprises 234 carefully selected high-quality Stack Overflow questions that span across 15 programming languages.
We conduct a systematic evaluation for over 100 latest code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z)
- Assured LLM-Based Software Engineering [51.003878077888686]
This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
arXiv Detail & Related papers (2024-02-06T20:38:46Z)
- Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLMs-generated codes.
We find that existing training-based or zero-shot text detectors are ineffective in detecting code.
Our method exhibits robustness against revision attacks and generalizes well to Java codes.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
- ALGO: Synthesizing Algorithmic Programs with LLM-Generated Oracle Verifiers [60.6418431624873]
Large language models (LLMs) excel at implementing code from functionality descriptions but struggle with algorithmic problems.
We propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide the generation and verify their correctness.
Experiments show that when equipped with ALGO, we achieve an 8x better one-submission pass rate over the Codex model and a 2.6x better one-submission pass rate over CodeT.
arXiv Detail & Related papers (2023-05-24T00:10:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.