Knowledge-Aware Code Generation with Large Language Models
- URL: http://arxiv.org/abs/2401.15940v3
- Date: Thu, 1 Feb 2024 06:34:29 GMT
- Title: Knowledge-Aware Code Generation with Large Language Models
- Authors: Tao Huang, Zhihong Sun, Zhi Jin, Ge Li, Chen Lyu
- Abstract summary: Large Language Models (LLMs) perform well on basic programming problems.
However, they encounter challenges when dealing with complex tasks involving the use of diverse algorithmic and data structure skills.
We develop a Knowledge Library tailored for Python programming contest problems and introduce the concept of Knowledge-Aware Code Generation.
- Score: 34.806454393643236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) perform well on basic programming problems.
However, they encounter challenges when dealing with complex tasks involving
the use of diverse algorithmic and data structure skills, particularly
programming competition-level problems. Notably, ChatGPT exhibits proficient
performance on problems it has encountered during its pre-training phase, but
this performance deteriorates when faced with novel problems. Consequently,
enhancing the ability of LLMs to address unfamiliar problems has emerged as a
pivotal research focus. The problem-solving process of LLMs mirrors human
programmers' approach to a certain extent. When confronted with new programming
tasks, human programmers engage in task planning and code writing with the
previously acquired knowledge about algorithms and data structures. Despite
having learned such knowledge, LLMs struggle to effectively apply it when faced
with specific new problems. To address this issue, we constructed a novel
dataset, CodeF, which contains a portion of programming problems that ChatGPT
has not previously encountered. Furthermore, we developed a Knowledge Library
tailored for Python programming contest problems and introduced the concept of
Knowledge-Aware Code Generation (KareCoder). KareCoder bolsters the models'
understanding and problem-solving capabilities by integrating prompts and
knowledge from the library into the LLMs' code generation reasoning process,
yielding gains especially on the Pass@1 metric. Upon testing on the CodeF and
APPS datasets,
KareCoder demonstrated outstanding performance in handling novel problems
previously unencountered by LLMs. In contrast with the code directly generated
by ChatGPT, KareCoder achieved a relative improvement of 23.3% on the Pass@1
metric on the CodeF post2021-9 dataset. Additionally, it performs well compared
to other methods when dealing with problems that LLMs have previously
encountered.
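As a rough illustration of the ideas in the abstract, the sketch below shows (i) how a prompt might be assembled from knowledge-library entries before code generation and (ii) the standard unbiased pass@k estimator behind the Pass@1 metric. The library entries, field names, and prompt wording are assumptions for illustration, not the authors' implementation.

```python
import math

# Hypothetical knowledge-library entries mapping an algorithmic topic to a short
# description (illustrative only; the paper's actual Knowledge Library format may differ).
KNOWLEDGE_LIBRARY = {
    "two_pointers": "Move two indices toward each other to scan candidate pairs in O(n).",
    "prefix_sum": "Precompute cumulative sums so any range sum is answered in O(1).",
}

def build_knowledge_aware_prompt(problem: str, topics: list) -> str:
    """Prepend relevant knowledge entries to the problem statement before asking
    the LLM to plan and then write code (assumed prompting scheme)."""
    knowledge = "\n".join(
        f"- {t}: {KNOWLEDGE_LIBRARY[t]}" for t in topics if t in KNOWLEDGE_LIBRARY
    )
    return (
        "Relevant algorithmic knowledge:\n"
        f"{knowledge}\n\n"
        "Using the knowledge above, first outline a plan, then write Python code for:\n"
        f"{problem}"
    )

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n sampled solutions, c of which pass all tests."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 10 sampled solutions per problem, 3 passing all tests -> Pass@1 = 0.3
print(pass_at_k(n=10, c=3, k=1))
```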
Related papers
- Crystal: Illuminating LLM Abilities on Language and Code [58.5467653736537]
We propose a pretraining strategy to enhance the integration of natural language and coding capabilities.
The resulting model, Crystal, demonstrates remarkable capabilities in both domains.
arXiv Detail & Related papers (2024-11-06T10:28:46Z) - What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs in incorrect code comprising three categories and 12 sub-categories, and analyze the root causes of common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
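A minimal sketch of such a training-free critique-and-repair loop, assuming a generic llm(prompt) completion function and using Python's own byte-compiler as the feedback signal; the prompt wording and loop structure are illustrative, not the paper's exact method.

```python
import py_compile
import tempfile

def compile_feedback(code: str):
    """Return a compiler error message, or None if the code byte-compiles."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        py_compile.compile(path, doraise=True)
        return None
    except py_compile.PyCompileError as exc:
        return str(exc)

def critique_and_repair(llm, task: str, max_rounds: int = 3) -> str:
    """Ask the model to critique and fix its own code until it compiles (sketch)."""
    code = llm(f"Write Python code for the task:\n{task}")
    for _ in range(max_rounds):
        error = compile_feedback(code)
        if error is None:
            break
        code = llm(
            "The following code fails to compile.\n"
            f"Compiler feedback:\n{error}\n"
            f"Code:\n{code}\n"
            "Critique the bug, then return a corrected version."
        )
    return code
```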
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns? [57.80779199039929]
Large Language Models (LLMs) have demonstrated remarkable performance in solving math problems.
This paper introduces a novel benchmark, BeyondX, designed to address these limitations by incorporating problems with multiple unknowns.
Empirical study on BeyondX reveals that the performance of existing LLMs, even those fine-tuned specifically on math tasks, significantly decreases as the number of unknowns increases.
arXiv Detail & Related papers (2024-07-06T17:01:04Z) - Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agent [2.8391355909797644]
Large language models (LLMs) have significantly improved their ability to perform tasks in the field of code generation.
There is still a gap between LLMs being capable coders and being top-tier software engineers.
arXiv Detail & Related papers (2024-05-31T22:06:18Z) - PECC: Problem Extraction and Coding Challenges [3.287942619833188]
We introduce PECC, a novel benchmark derived from Advent Of Code (AoC) challenges and Project Euler.
Unlike conventional benchmarks, PECC requires LLMs to interpret narrative-embedded problems, extract requirements, and generate code.
Results show varying model performance between narrative and neutral problems, with specific challenges in the Euler math-based subset.
arXiv Detail & Related papers (2024-04-29T15:02:14Z) - Let's Ask AI About Their Programs: Exploring ChatGPT's Answers To Program Comprehension Questions [2.377308748205625]
We explore the capability of the state-of-the-art LLMs in answering QLCs that are generated from code that the LLMs have created.
Our results show that although the state-of-the-art LLMs can create programs and trace program execution when prompted, they easily succumb to similar errors that have previously been recorded for novice programmers.
arXiv Detail & Related papers (2024-04-17T20:37:00Z) - Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions [47.83142414018448]
We focus on two popular reasoning tasks: arithmetic reasoning and code generation.
We introduce (i) a general ontology of perturbations for math and coding questions, (ii) a semi-automatic method to apply these perturbations, and (iii) two datasets.
We show a significant performance drop across all the models against perturbed questions.
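To make the perturbation idea above concrete, here is a toy sketch of one perturbation type: replacing the operands of an arithmetic question and recomputing the gold answer so a perturbed instance can be re-tested. The perturbation name and structure are assumptions for illustration, not the paper's ontology.

```python
import random
import re

def perturb_operands(question: str, answer_fn):
    """Illustrative perturbation: swap every number in an arithmetic question
    for a fresh value and recompute the gold answer."""
    new_numbers = [random.randint(2, 99) for _ in re.findall(r"\d+", question)]
    values = iter(new_numbers)
    perturbed = re.sub(r"\d+", lambda _: str(next(values)), question)
    return perturbed, answer_fn(*new_numbers)

# Toy usage: the question keeps its structure but gets new operands, so a model
# cannot rely on having memorized the original instance.
q, gold = perturb_operands(
    "Alice has 3 apples and buys 5 more. How many apples does she have?",
    answer_fn=lambda a, b: a + b,
)
print(q, gold)
```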
arXiv Detail & Related papers (2024-01-17T18:13:07Z) - Competition-Level Problems are Effective LLM Evaluators [121.15880285283116]
This paper aims to evaluate the reasoning capacities of large language models (LLMs) in solving recent programming problems in Codeforces.
We first provide a comprehensive evaluation of GPT-4's perceived zero-shot performance on this task, considering various aspects such as problems' release time, difficulties, and types of errors encountered.
Surprisingly, the perceived performance of GPT-4 has experienced a cliff-like decline on problems released after September 2021, consistently across all difficulties and problem types.
arXiv Detail & Related papers (2023-12-04T18:58:57Z) - Benchmarking and Explaining Large Language Model-based Code Generation:
A Causality-Centric Approach [12.214585409361126]
Large language model (LLM)-based code generation is a complex and powerful black-box model.
We propose a novel causal graph-based representation of the prompt and the generated code.
We illustrate the insights that our framework can provide by studying over 3 popular LLMs with over 12 prompt adjustment strategies.
arXiv Detail & Related papers (2023-10-10T14:56:26Z) - Exploring the Robustness of Large Language Models for Solving
Programming Problems [15.80687717725775]
We conduct experiments to understand the robustness of several popular large language models (LLMs) for source code generation.
Our results show that CodeGen and Codex are sensitive to superficial modifications of problem descriptions, which significantly impact code generation performance.
The state-of-the-art (SOTA) models, such as InstructGPT and ChatGPT, show higher robustness to superficial modifications and have an outstanding capability for solving programming problems.
arXiv Detail & Related papers (2023-06-26T10:48:50Z)