From Misuse to Mastery: Enhancing Code Generation with Knowledge-Driven
AI Chaining
- URL: http://arxiv.org/abs/2309.15606v1
- Date: Wed, 27 Sep 2023 12:09:07 GMT
- Title: From Misuse to Mastery: Enhancing Code Generation with Knowledge-Driven
AI Chaining
- Authors: Xiaoxue Ren, Xinyuan Ye, Dehai Zhao, Zhenchang Xing, Xiaohu Yang
- Abstract summary: Large Language Models (LLMs) have shown promising results in automatic code generation by improving coding efficiency to a certain extent.
However, generating high-quality and reliable code remains a formidable task because of LLMs' lack of good programming practice.
We propose a novel Knowledge-driven Prompt Chaining-based code generation approach, which decomposes code generation into an AI chain with iterative check-rewrite steps.
- Score: 16.749379740049925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown promising results in automatic code
generation by improving coding efficiency to a certain extent. However,
generating high-quality and reliable code remains a formidable task because of
LLMs' lack of good programming practice, especially in exception handling. In
this paper, we first conduct an empirical study and summarise three crucial
challenges of LLMs in exception handling, i.e., incomplete exception handling,
incorrect exception handling, and abuse of try-catch. We then try prompts of
different granularities to address these challenges, finding that fine-grained
knowledge-driven prompts work best. Based on our empirical study, we propose a
novel Knowledge-driven Prompt Chaining-based code generation approach, named
KPC, which decomposes code generation into an AI chain with iterative
check-rewrite steps and chains fine-grained knowledge-driven prompts to assist
LLMs in considering exception-handling specifications. We evaluate our
KPC-based approach with 3,079 code generation tasks extracted from the official
Java API documentation. Extensive experimental results demonstrate that the
KPC-based approach has considerable potential to improve the quality of code
generated by LLMs: it handles exceptions proficiently, achieving improvements of
109.86% and 578.57% under static evaluation and removing 18 runtime bugs from
the sampled dataset under dynamic validation.
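To make the check-rewrite idea concrete, below is a minimal sketch of such a knowledge-driven AI chain in Python. It is an illustration only, not the authors' implementation: the `llm_complete` stub, the two knowledge entries, and the prompt wording are all assumptions, whereas KPC itself derives its exception-handling specifications from the official Java API documentation.

```python
# Illustrative sketch of a knowledge-driven check-rewrite chain (assumption:
# not the paper's actual pipeline). `llm_complete` stands in for any LLM call.

def llm_complete(prompt: str) -> str:
    """Hypothetical wrapper around an LLM completion endpoint."""
    raise NotImplementedError("plug in a concrete model client here")

# Fine-grained, knowledge-driven rules. KPC derives such specifications from
# the official Java API documentation; these two entries are made up.
EXCEPTION_KNOWLEDGE = [
    "FileReader(String) throws FileNotFoundException; handle or declare it.",
    "Scanner.nextInt() throws InputMismatchException if the next token is not an int.",
]

def generate(task: str) -> str:
    """Step 1: generate an initial solution for the task."""
    return llm_complete(f"Write Java code for the following task:\n{task}")

def check(code: str) -> list[str]:
    """Step 2: check the code against each fine-grained rule separately."""
    violations = []
    for rule in EXCEPTION_KNOWLEDGE:
        verdict = llm_complete(
            f"Rule: {rule}\nCode:\n{code}\n"
            "Does the code violate this rule? Answer VIOLATION or OK, then explain."
        )
        if verdict.strip().upper().startswith("VIOLATION"):
            violations.append(f"{rule}\n{verdict}")
    return violations

def rewrite(code: str, violations: list[str]) -> str:
    """Step 3: rewrite the code so that every reported violation is resolved."""
    findings = "\n".join(violations)
    return llm_complete(
        f"The following code mishandles exceptions:\n{code}\n"
        f"Findings:\n{findings}\nRewrite the code so every finding is resolved."
    )

def kpc_style_chain(task: str, max_rounds: int = 3) -> str:
    """Generate, then iterate check -> rewrite until no violations remain."""
    code = generate(task)
    for _ in range(max_rounds):
        violations = check(code)
        if not violations:
            break
        code = rewrite(code, violations)
    return code
```

The point of the decomposition is that each step sees a small, focused prompt (one specification at a time) rather than one monolithic instruction, which is the granularity the empirical study found to work best.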
Related papers
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses.
Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies.
We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z)
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [85.51252685938564]
Uncertainty quantification (UQ) is becoming increasingly recognized as a critical component of applications that rely on machine learning (ML).
As with other ML models, large language models (LLMs) are prone to make incorrect predictions, "hallucinate" by fabricating claims, or simply generate low-quality output for a given input.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines, and provides an environment for controllable and consistent evaluation of novel techniques.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- Chain of Targeted Verification Questions to Improve the Reliability of Code Generated by LLMs [10.510325069289324]
We propose a self-refinement method aimed at improving the reliability of code generated by LLMs.
Our approach is based on targeted Verification Questions (VQs) to identify potential bugs within the initial code.
Our method attempts to repair these potential bugs by re-prompting the LLM with the targeted VQs and the initial code.
arXiv Detail & Related papers (2024-05-22T19:02:50Z)
- MapCoder: Multi-Agent Code Generation for Competitive Problem Solving [3.3856216159724983]
We introduce a new approach to code generation tasks leveraging multi-agent prompting.
Our framework, MapCoder, consists of four LLM agents specifically designed to emulate the stages of program synthesis.
Our method consistently delivers superior performance across various programming languages.
arXiv Detail & Related papers (2024-05-18T22:10:15Z)
- Knowledge-Aware Code Generation with Large Language Models [34.806454393643236]
Large Language Models (LLMs) perform well on basic programming problems.
However, they encounter challenges when dealing with complex tasks involving the use of diverse algorithmic and data structure skills.
We develop a Knowledge Library tailored for Python programming contest problems and introduce the concept of Knowledge-Aware Code Generation.
arXiv Detail & Related papers (2024-01-29T08:01:22Z)
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [69.99031792995348]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code (a toy illustration of the idea appears after this list).
We find that code prompting exhibits a high-performance boost for multiple LLMs.
Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
- Leveraging Print Debugging to Improve Code Generation in Large Language Models [63.63160583432348]
Large language models (LLMs) have made significant progress in code generation tasks.
But their performance in tackling programming problems with complex data structures and algorithms remains suboptimal.
We propose an in-context learning approach that guides LLMs to debug by using a "print debug" method.
arXiv Detail & Related papers (2024-01-10T18:37:59Z)
- Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach [12.214585409361126]
Large language model (LLM)-based code generation is a complex and powerful black-box model.
We propose a novel causal graph-based representation of the prompt and the generated code.
We illustrate the insights that our framework can provide by studying over 3 popular LLMs with over 12 prompt adjustment strategies.
arXiv Detail & Related papers (2023-10-10T14:56:26Z)
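As a toy illustration of the code-prompting idea referenced above (our own example, not one taken from that paper): a natural-language conditional-reasoning problem such as "an applicant qualifies for the discount if they are over 65, or if they are a student and the order exceeds $50" becomes a set of explicit branches once rewritten as code, which is what the prompting chain asks the model to reason over.

```python
# Toy illustration of "code prompting": the natural-language rule above
# rewritten as executable code with explicit branches (hypothetical example).

def qualifies_for_discount(age: int, is_student: bool, order_total: float) -> bool:
    if age > 65:
        return True
    if is_student and order_total > 50:
        return True
    return False

# A few checks that mirror the natural-language rule.
assert qualifies_for_discount(age=70, is_student=False, order_total=10) is True
assert qualifies_for_discount(age=30, is_student=True, order_total=80) is True
assert qualifies_for_discount(age=30, is_student=False, order_total=80) is False
```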