CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing
- URL: http://arxiv.org/abs/2403.13583v2
- Date: Mon, 1 Jul 2024 09:59:47 GMT
- Title: CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing
- Authors: Xinyi He, Jiaru Zou, Yun Lin, Mengyu Zhou, Shi Han, Zejian Yuan, Dongmei Zhang
- Abstract summary: Large Language Models have revolutionized code generation ability by converting natural language descriptions into executable code.
The CoCoST framework enhances complex code generation by searching online for additional information with planned queries and by using correctness testing to refine the code.
CoCoST is validated through rigorous experiments on the DS-1000 and ClassEval datasets.
- Score: 51.00909683314142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models have revolutionized code generation ability by converting natural language descriptions into executable code. However, generating complex code within real-world scenarios remains challenging due to intricate structures, subtle bugs, understanding of advanced data types, and lack of supplementary contents. To address these challenges, we introduce the CoCoST framework, which enhances complex code generation by online searching for more information with planned queries and correctness testing for code refinement. Moreover, CoCoST serializes the complex inputs and outputs to improve comprehension and generates test cases to ensure the adaptability for real-world applications. CoCoST is validated through rigorous experiments on the DS-1000 and ClassEval datasets. Experimental results show that CoCoST substantially improves the quality of complex code generation, highlighting its potential to enhance the practicality of LLMs in generating complex code.
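The correctness-testing part of the abstract (generate code, run it against test cases, feed serialized inputs and outputs back for refinement) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `serialize`, `run_tests`, `refine`, and `stub_generate` are hypothetical, the generator is a hard-coded stub standing in for an LLM call, and the online-search step is omitted entirely.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature for a code generator: (task, feedback) -> source code.
Generator = Callable[[str, str], str]


def serialize(value: object, max_len: int = 200) -> str:
    """Render a value as text: its type name plus a truncated repr."""
    text = repr(value)
    if len(text) > max_len:
        text = text[:max_len] + "..."
    return f"{type(value).__name__}: {text}"


def run_tests(source: str, func_name: str,
              cases: List[Tuple[tuple, object]]) -> Optional[str]:
    """Run candidate code on the cases; return feedback on the first failure, else None."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # fine for a sketch; untrusted code needs a sandbox
        func = namespace[func_name]
        for args, expected in cases:
            actual = func(*args)
            if actual != expected:
                return (f"input {serialize(args)} gave {serialize(actual)}, "
                        f"expected {serialize(expected)}")
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def refine(task: str, func_name: str, cases: List[Tuple[tuple, object]],
           generate: Generator, max_rounds: int = 3) -> Optional[str]:
    """Regenerate code with test feedback until the cases pass or rounds run out."""
    feedback = ""
    for _ in range(max_rounds):
        source = generate(task, feedback)
        result = run_tests(source, func_name, cases)
        if result is None:
            return source
        feedback = result
    return None


def stub_generate(task: str, feedback: str) -> str:
    """Stand-in for an LLM: buggy on the first call, fixed once feedback arrives."""
    if not feedback:
        return "def total(xs):\n    return sum(xs) + 1\n"
    return "def total(xs):\n    return sum(xs)\n"


cases = [(([1, 2, 3],), 6)]
final_source = refine("sum a list", "total", cases, stub_generate)
```

Here the first candidate fails the single test case, the serialized failure message is passed back to the generator, and the second candidate passes. A real system would replace `stub_generate` with a model call and would need to isolate the executed code.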
Related papers
- NoviCode: Generating Programs from Natural Language Utterances by Novices [59.71218039095155]
We present NoviCode, a novel NL Programming task which takes as input an API and a natural language description by a novice non-programmer.
We show that NoviCode is indeed a challenging task in the code synthesis domain, and that generating complex code from non-technical instructions goes beyond the current Text-to-Code paradigm.
arXiv Detail & Related papers (2024-07-15T11:26:03Z)
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
- Validating LLM-Generated Programs with Metamorphic Prompt Testing [8.785973653167112]
Large Language Models (LLMs) are increasingly integrated into the software development lifecycle.
This paper proposes a novel solution called metamorphic prompt testing to address these challenges.
Our evaluation on HumanEval shows that metamorphic prompt testing is able to detect 75 percent of the erroneous programs generated by GPT-4, with a false positive rate of 8.6 percent.
arXiv Detail & Related papers (2024-06-11T00:40:17Z)
- MapCoder: Multi-Agent Code Generation for Competitive Problem Solving [3.3856216159724983]
We introduce a new approach to code generation tasks leveraging multi-agent prompting.
Our framework, MapCoder, consists of four LLM agents specifically designed to emulate the stages of program synthesis.
Our method consistently delivers superior performance across various programming languages.
arXiv Detail & Related papers (2024-05-18T22:10:15Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only on executed code, masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- CodeComplex: A Time-Complexity Dataset for Bilingual Source Codes [6.169110187130671]
We introduce CodeComplex, a novel source code dataset where each code is manually annotated with a corresponding worst-case time complexity.
To the best of our knowledge, CodeComplex stands as the most extensive code dataset tailored for predicting complexity.
We present the outcomes of our experiments employing various baseline models, leveraging state-of-the-art neural models in code comprehension.
arXiv Detail & Related papers (2024-01-16T06:54:44Z)
- Can Large Language Models Understand Real-World Complex Instructions? [54.86632921036983]
Large language models (LLMs) can understand human instructions, but struggle with complex instructions.
Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions.
We propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically.
arXiv Detail & Related papers (2023-09-17T04:18:39Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose the complexity-impacted reasoning score (CIRS) to measure the correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- COCO: Testing Code Generation Systems via Concretized Instructions [33.13427092832396]
COCO is a technique to test the robustness of code generation systems.
It exploits the usage scenario of code generation systems to make the original programming instruction more concrete.
We evaluated COCO on eight advanced code generation systems, including commercial tools such as Copilot and ChatGPT.
arXiv Detail & Related papers (2023-08-25T11:49:27Z)