From Misuse to Mastery: Enhancing Code Generation with Knowledge-Driven
AI Chaining
- URL: http://arxiv.org/abs/2309.15606v1
- Date: Wed, 27 Sep 2023 12:09:07 GMT
- Title: From Misuse to Mastery: Enhancing Code Generation with Knowledge-Driven
AI Chaining
- Authors: Xiaoxue Ren, Xinyuan Ye, Dehai Zhao, Zhenchang Xing, Xiaohu Yang
- Abstract summary: Large Language Models (LLMs) have shown promising results in automatic code generation by improving coding efficiency to a certain extent.
However, generating high-quality and reliable code remains a formidable task because of LLMs' lack of good programming practice.
We propose a novel Knowledge-driven Prompt Chaining-based code generation approach, which decomposes code generation into an AI chain with iterative check-rewrite steps.
- Score: 16.749379740049925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown promising results in automatic code
generation by improving coding efficiency to a certain extent. However,
generating high-quality and reliable code remains a formidable task because of
LLMs' lack of good programming practice, especially in exception handling. In
this paper, we first conduct an empirical study and summarise three crucial
challenges of LLMs in exception handling, i.e., incomplete exception handling,
incorrect exception handling and abuse of try-catch. We then try prompts with
different granularities to address such challenges, finding that fine-grained
knowledge-driven prompts work best. Based on our empirical study, we propose a
novel Knowledge-driven Prompt Chaining-based code generation approach, named
KPC, which decomposes code generation into an AI chain with iterative
check-rewrite steps and chains fine-grained knowledge-driven prompts to assist
LLMs in considering exception-handling specifications. We evaluate our
KPC-based approach with 3,079 code generation tasks extracted from the Java
official API documentation. Extensive experimental results demonstrate that the
KPC-based approach has considerable potential to ameliorate the quality of code
generated by LLMs. It achieves this through proficiently managing exceptions
and obtaining remarkable enhancements of 109.86% and 578.57% with static
evaluation methods, as well as a reduction of 18 runtime bugs in the sampled
dataset with dynamic validation.
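To make the three failure modes concrete, here is a small hypothetical Java example in the spirit of the empirical study; the file-reading task and class names are illustrative assumptions, not items from the paper's dataset. The naive version shows the kinds of problems the study names: a single broad try-catch wrapped around the whole body, the exception silently swallowed, and the documented FileNotFoundException never treated specifically. The rewrite follows the exception-handling behaviour documented for FileReader and BufferedReader in the Java API documentation.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class ReadFirstLine {

    // Typical problematic output: one broad try-catch wrapped around the whole
    // body, the exception silently swallowed, and the documented
    // FileNotFoundException never handled specifically.
    static String naive(String path) {
        try {
            BufferedReader reader = new BufferedReader(new FileReader(path));
            return reader.readLine();
        } catch (Exception e) {
            return null; // the caller cannot tell a missing file from an empty one
        }
    }

    // Rewrite guided by the documented behaviour of FileReader and BufferedReader:
    // keep the try block minimal, handle the documented FileNotFoundException
    // explicitly, let other IOExceptions propagate to the caller, and release
    // the resource with try-with-resources.
    static String checked(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        } catch (FileNotFoundException e) {
            throw new IOException("Cannot read missing file: " + path, e);
        }
    }
}
```

The AI chain with iterative check-rewrite steps can be pictured roughly as below. This is only a minimal sketch under assumed interfaces: the Llm client and the knowledgeFor lookup are placeholders introduced here for illustration, whereas the actual KPC approach chains fine-grained knowledge-driven prompts derived from exception-handling specifications in the Java official API documentation.

```java
import java.util.List;

public class KpcSketch {

    /** Placeholder for any text-completion model client (assumed, not from the paper). */
    interface Llm {
        String complete(String prompt);
    }

    /** Hypothetical lookup of fine-grained exception-handling rules for the APIs used in the code. */
    static List<String> knowledgeFor(String code) {
        return List.of("FileReader(String) throws FileNotFoundException when the file does not exist.");
    }

    static String generateWithKpc(Llm llm, String task, int maxRounds) {
        String code = llm.complete("Write Java code for the task:\n" + task);
        for (int round = 0; round < maxRounds; round++) {
            boolean changed = false;
            for (String rule : knowledgeFor(code)) {
                // Check step: ask whether the code satisfies one fine-grained rule.
                String verdict = llm.complete(
                        "Does the following code respect this rule?\nRule: " + rule + "\nCode:\n" + code);
                if (verdict.toLowerCase().contains("no")) {
                    // Rewrite step: repair the code with respect to that single rule.
                    code = llm.complete(
                            "Rewrite the code so that it satisfies the rule.\nRule: " + rule + "\nCode:\n" + code);
                    changed = true;
                }
            }
            if (!changed) {
                break; // all checked rules satisfied
            }
        }
        return code;
    }
}
```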
Related papers
- Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach [54.03528377384397]
In real world software development, improper or missing exception handling can severely impact the robustness and reliability of code.
We explore the use of large language models (LLMs) to improve exception handling in code.
We propose Seeker, a multi-agent framework inspired by expert developer strategies for exception handling.
arXiv Detail & Related papers (2024-10-09T14:45:45Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
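One loose way to picture "integrating CoT and PoT solutions for verification" is to sample both styles of solution and keep an answer only when they agree; the paper's actual verifier is more elaborate, and the Llm interface and answer-format convention below are purely illustrative assumptions.

```java
import java.util.Optional;

public class CotPotCrossCheck {

    /** Placeholder text-completion client (assumed). */
    interface Llm {
        String complete(String prompt);
    }

    /** Assumes the model is instructed to end its output with "ANSWER: <value>". */
    static String extractAnswer(String output) {
        int i = output.lastIndexOf("ANSWER:");
        return i >= 0 ? output.substring(i + "ANSWER:".length()).trim() : "";
    }

    static Optional<String> solve(Llm llm, String problem) {
        // Chain-of-Thought: free-form step-by-step reasoning in natural language.
        String cot = llm.complete(
                "Solve step by step and finish with 'ANSWER: <value>'.\n" + problem);
        // Program-of-Thought: reasoning expressed as a program; here we simply ask
        // the model to also report the value the program would print.
        String pot = llm.complete(
                "Write a short program that computes the result, then state the value it "
              + "prints as 'ANSWER: <value>'.\n" + problem);
        String a = extractAnswer(cot);
        String b = extractAnswer(pot);
        // Keep the answer only when the two solution styles agree (a crude form of verification).
        return (!a.isEmpty() && a.equals(b)) ? Optional.of(a) : Optional.empty();
    }
}
```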
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- CodeSift: An LLM-Based Reference-Less Framework for Automatic Code Validation [3.22798929957223]
Large language models (LLMs) have greatly facilitated code generation, but ensuring the functional correctness of generated code remains a challenge.
Traditional validation methods are often time-consuming, error-prone, and impractical for large volumes of code.
We introduce CodeSift, a novel framework that leverages LLMs as the first-line filter of code validation without the need for execution, reference code, or human feedback.
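A reference-less, execution-free first-line filter can be approximated by asking a model to judge whether a candidate snippet matches its task description; the prompt wording and the Llm interface below are assumptions for illustration, not CodeSift's actual prompts.

```java
public class ReferenceLessFilter {

    /** Placeholder text-completion client (assumed). */
    interface Llm {
        String complete(String prompt);
    }

    /**
     * First-line filter: no execution, no reference solution, no human feedback.
     * Returns true if the model judges the code functionally consistent with the task.
     */
    static boolean passesFilter(Llm llm, String taskDescription, String candidateCode) {
        String verdict = llm.complete(
                "Task:\n" + taskDescription + "\n\n"
              + "Candidate code:\n" + candidateCode + "\n\n"
              + "Does the code implement the task correctly? Reply with PASS or FAIL "
              + "and one sentence of justification.");
        return verdict.trim().toUpperCase().startsWith("PASS");
    }
}
```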
arXiv Detail & Related papers (2024-08-28T08:32:21Z)
- Understanding Defects in Generated Codes by Language Models [0.669087470775851]
This study categorizes and analyzes 367 identified defects from code snippets generated by Large Language Models.
Error categories indicate key areas where LLMs frequently fail, underscoring the need for targeted improvements.
This paper implemented five prompt engineering techniques, including Scratchpad Prompting, Program of Thoughts Prompting, Chain-of-Thought Prompting, and Structured Chain-of-Thought Prompting.
arXiv Detail & Related papers (2024-08-23T21:10:09Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated compared to canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
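The "critique and correct based on compiler feedback" part of such a loop can be sketched with the standard javax.tools compiler API; the repair prompt and the Llm interface are assumptions, and the paper's method additionally conditions on its bug taxonomy and self-critique rather than on compiler diagnostics alone.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompilerFeedbackLoop {

    /** Placeholder text-completion client (assumed). */
    interface Llm {
        String complete(String prompt);
    }

    /** Compiles the source of a class named Solution and returns the compiler diagnostics ("" if clean). */
    static String compile(String source) throws Exception {
        Path dir = Files.createTempDirectory("llm-code");
        Path file = dir.resolve("Solution.java");
        Files.writeString(file, source, StandardCharsets.UTF_8);
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // requires a JDK, not a bare JRE
        ByteArrayOutputStream err = new ByteArrayOutputStream();
        int status = compiler.run(null, null, err, file.toString());
        return status == 0 ? "" : err.toString(StandardCharsets.UTF_8);
    }

    /** Training-free repair loop: generate, compile, and feed diagnostics back for correction. */
    static String generate(Llm llm, String task, int maxRounds) throws Exception {
        String code = llm.complete("Write a Java class named Solution for this task:\n" + task);
        for (int round = 0; round < maxRounds; round++) {
            String diagnostics = compile(code);
            if (diagnostics.isEmpty()) {
                break; // compiles cleanly; deeper semantic critique would go here
            }
            code = llm.complete(
                    "The following code does not compile.\nCompiler output:\n" + diagnostics
                  + "\nCode:\n" + code + "\nReturn a corrected version of the class.");
        }
        return code;
    }
}
```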
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses.
Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies.
We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z)
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification (UQ) is a critical component of machine learning (ML) applications.
We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines.
We conduct a large-scale empirical investigation of UQ and normalization techniques across nine tasks, and identify the most promising approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- Chain of Targeted Verification Questions to Improve the Reliability of Code Generated by LLMs [10.510325069289324]
We propose a self-refinement method aimed at improving the reliability of code generated by LLMs.
Our approach is based on targeted Verification Questions (VQs) to identify potential bugs within the initial code.
Our method attempts to repair these potential bugs by re-prompting the LLM with the targeted VQs and the initial code.
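A bare-bones version of the VQ idea: derive questions about likely defect locations in the initial code, then re-prompt with each question and the current code. The question-generation prompt and the Llm interface below are illustrative assumptions; the paper derives its verification questions in a more targeted way.

```java
import java.util.Arrays;
import java.util.List;

public class VerificationQuestions {

    /** Placeholder text-completion client (assumed). */
    interface Llm {
        String complete(String prompt);
    }

    /** Ask the model for targeted questions about plausible defects in the code (one per line). */
    static List<String> deriveQuestions(Llm llm, String code) {
        String raw = llm.complete(
                "List short verification questions, one per line, about likely bugs "
              + "(null handling, boundary conditions, error paths) in this code:\n" + code);
        return Arrays.asList(raw.split("\\R+"));
    }

    /** Re-prompt with each question and the current code, asking for a fix when a bug is exposed. */
    static String refine(Llm llm, String code) {
        for (String question : deriveQuestions(llm, code)) {
            code = llm.complete(
                    "Verification question: " + question + "\n"
                  + "If the answer exposes a bug in the code below, return a repaired version; "
                  + "otherwise return the code unchanged.\n" + code);
        }
        return code;
    }
}
```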
arXiv Detail & Related papers (2024-05-22T19:02:50Z)
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting exhibits a high-performance boost for multiple LLMs.
Our analysis of GPT-3.5 reveals that the code formatting of the input problem is essential for performance improvement.
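Code prompting here means re-expressing a natural-language conditional-reasoning problem as code before asking for the answer. The two prompt templates and the Llm interface below are assumptions sketching that chain, not the paper's exact prompts.

```java
public class CodePrompting {

    /** Placeholder text-completion client (assumed). */
    interface Llm {
        String complete(String prompt);
    }

    /** Chain of two prompts: first translate the problem into code, then answer from that code. */
    static String answer(Llm llm, String naturalLanguageProblem) {
        // Step 1: transform the problem into code, e.g. conditions as variables and rules as if-statements.
        String codeForm = llm.complete(
                "Rewrite this problem as code: represent each stated condition as a variable "
              + "and each rule as an if-statement. Do not solve it yet.\n" + naturalLanguageProblem);
        // Step 2: answer the question using the code-formatted problem as the prompt.
        return llm.complete(
                "Using the following code representation of the problem, answer the original "
              + "question and explain briefly.\n" + codeForm);
    }
}
```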
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
- Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach [12.214585409361126]
Large language model (LLM)-based code generation relies on a complex and powerful black-box model.
We propose a novel causal graph-based representation of the prompt and the generated code.
We illustrate the insights that our framework can provide by studying over 3 popular LLMs with over 12 prompt adjustment strategies.
arXiv Detail & Related papers (2023-10-10T14:56:26Z)