CGEMs: A Metric Model for Automatic Code Generation using GPT-3
- URL: http://arxiv.org/abs/2108.10168v1
- Date: Mon, 23 Aug 2021 13:28:57 GMT
- Title: CGEMs: A Metric Model for Automatic Code Generation using GPT-3
- Authors: Aishwarya Narasimhan (1), Krishna Prasad Agara Venkatesha Rao (2),
Veena M B (1) ((1) B M S College of Engineering, (2) Sony India Software
Centre Pvt. Ltd.)
- Abstract summary: AI-generated content can be validated either with theoretical proofs or with Monte-Carlo simulation methods.
This work uses the latter approach to test/validate a statistically significant number of samples.
The metrics gathered in this work to support the evaluation of generated code are: compilation, NL description to logic conversion, number of edits needed, some commonly used static-code metrics, and NLP metrics.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Today, AI technology is showing its strengths in almost every industry and
walks of life. NLP is widely used, from text generation and text summarization
to chatbots. One such paradigm is automatic code generation. An AI could be
generating anything; hence the output space is unconstrained. A self-driving
car is driven for 100 million miles to validate its safety, but tests cannot be
written to monitor and cover an unconstrained space. One solution for
validating AI-generated content is to constrain the problem and convert it from
abstract to realistic. This can be accomplished either by validating the
unconstrained algorithm using theoretical proofs or by using Monte-Carlo
simulation methods. In this case, we use the latter approach to test/validate a
statistically significant number of samples. Validating AI-generated code is
the main motive of this work. To determine whether AI-generated code is
reliable, a metric model, CGEMs, is proposed. This is an extremely challenging
task, as programs can have different logic with different naming conventions,
but the metrics must capture the structure and logic of the program. This is
similar to the importance grammar carries in AI-based text generation, Q&A,
translation, etc. The metrics gathered in this work to support the evaluation
of generated code are as follows: compilation, NL description to logic
conversion, number of edits needed, some commonly used static-code metrics,
and NLP metrics. These metrics are applied to 80 code samples generated using
OpenAI's GPT-3. A neural network is then designed for binary classification
(acceptable/not acceptable quality of the generated code). The inputs to this
network are the values of the features obtained from the metrics. The model
achieves a classification accuracy of 76.92% and an F1 score of 55.56%.
Explainable AI (XAI) is added for model interpretability.
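One of the listed metrics is the number of edits needed to make a generated program acceptable. The abstract does not say how this count is computed. The sketch below assumes it is a Levenshtein edit distance between the GPT-3 output and a human-corrected version, measured both in characters and in lines. The function names `edit_distance` and `edits_needed` and the two feature keys are hypothetical and are not taken from CGEMs.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def edit_distance(a: Sequence[T], b: Sequence[T]) -> int:
    """Levenshtein distance: the minimum number of insertions, deletions
    and substitutions that turn sequence a into sequence b."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x by y (free if equal)
            )
        prev = curr
    return prev[len(b)]


def edits_needed(generated: str, corrected: str) -> dict:
    """Hypothetical 'number of edits needed' features for one code sample
    (assumption: CGEMs' exact definition is not given in the abstract)."""
    return {
        "char_edits": edit_distance(generated, corrected),
        "line_edits": edit_distance(generated.splitlines(), corrected.splitlines()),
    }


if __name__ == "__main__":
    # Hypothetical example: a generated function with the wrong operator
    generated = "def add(a, b):\n    return a - b\n"
    corrected = "def add(a, b):\n    return a + b\n"
    print(edits_needed(generated, corrected))  # {'char_edits': 1, 'line_edits': 1}
```

The line-level count is coarser than the character-level one. A one-token fix and a full rewrite of a line both count as a single edit. The abstract does not state which granularity, if either, the paper uses.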
Related papers
- Opening the AI black box: program synthesis via mechanistic interpretability
We present a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code.
We test our method, MIPS, on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4.
As opposed to large language models, this program synthesis technique makes no use of (and is therefore not limited by) human training data such as algorithms and code from GitHub.
(arXiv, 2024-02-07)
- GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network and a decoding network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
(arXiv, 2023-11-14)
- Leveraging Generative AI: Improving Software Metadata Classification with Generated Code-Comment Pairs
In software development, code comments play a crucial role in enhancing code comprehension and collaboration.
This research paper addresses the challenge of objectively classifying code comments as "Useful" or "Not Useful".
We propose a novel solution that harnesses contextualized embeddings, particularly BERT, to automate this classification process.
(arXiv, 2023-10-14)
- Zero-Shot Detection of Machine-Generated Codes
This work proposes a training-free approach for detecting LLM-generated code.
We find that existing training-based or zero-shot text detectors are ineffective in detecting code.
Our method exhibits robustness against revision attacks and generalizes well to Java code.
(arXiv, 2023-10-08)
- Is this Snippet Written by ChatGPT? An Empirical Study with a CodeBERT-Based Classifier
This paper presents an empirical study to investigate the feasibility of automated identification of AI-generated code snippets.
We propose a novel approach called GPTSniffer, which builds on top of CodeBERT to detect source code written by AI.
The results show that GPTSniffer can accurately classify whether code is human-written or AI-generated, and outperforms two baselines.
(arXiv, 2023-07-18)
- Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
(arXiv, 2023-03-23)
- Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators
Code generators are an emerging solution for automatically writing programs starting from descriptions in natural language.
In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks.
This work analyzes a large set of output similarity metrics on offensive code generators.
(arXiv, 2022-12-12)
- AI Model Utilization Measurements For Finding Class Encoding Patterns
This work addresses the problems of designing utilization measurements of trained artificial intelligence (AI) models.
The problems are motivated by the lack of explainability of AI models in security and safety critical applications.
(arXiv, 2022-12-12)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
(arXiv, 2022-11-01)
- Interactive Code Generation via Test-Driven User-Intent Formalization
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
(arXiv, 2022-08-11)
- Measuring Coding Challenge Competence With APPS
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
(arXiv, 2021-05-20)