Related papers: VersiCode: Towards Version-controllable Code Generation

URL: http://arxiv.org/abs/2406.07411v2
Date: Wed, 16 Oct 2024 10:56:24 GMT
Title: VersiCode: Towards Version-controllable Code Generation
Authors: Tongtong Wu, Weigang Wu, Xingyu Wang, Kang Xu, Suyu Ma, Bo Jiang, Ping Yang, Zhenchang Xing, Yuan-Fang Li, Gholamreza Haffari
Abstract summary: Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development. We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM). We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
Score: 58.82709231906735
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development, marked by frequent library updates. This gap significantly limits LLMs' deployment in realistic settings. In this paper, we propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM). In conjunction, we introduce VersiCode, a comprehensive Python dataset specifically designed to evaluate LLMs on these two tasks, together with a novel evaluation metric, Critical Diff Check (CDC@1), which assesses code generation against evolving API requirements. We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge, even for GPT-4o and other strong frontier models. We believe the novel tasks, dataset, and metric open up a new, important research direction that will further enhance LLMs' real-world applicability. The code and resources can be found at https://github.com/wutong8023/VersiCode.

Related papers

IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs. The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z)
Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization [4.7966941517322725]
Large Language Models (LLMs) have become widely used across diverse NLP tasks and domains. LLMs show promise for tasks like Register-Transfer Level (RTL) code generation and summarization. We propose Chain-of-Descriptions (CoDes) to enhance the performance of LLMs for VHDL code generation and summarization tasks.
arXiv Detail & Related papers (2025-07-16T15:05:30Z)
ReCode: Updating Code API Knowledge with Reinforcement Learning [45.077641074621816]
Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. We propose ReCode, a novel framework that mimics human programmer adaptation to API changes. Our experiments demonstrate that ReCode substantially boosts LLMs' code generation performance in dynamic API scenarios.
arXiv Detail & Related papers (2025-06-25T14:41:13Z)
CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation [69.684886175768]
Large language models (LLMs) have shown promising performance in automated code generation. In this paper, we propose CodeRAG, a retrieval-augmented code generation framework. Experiments show that CodeRAG achieves significant improvements compared to no RAG scenarios.
arXiv Detail & Related papers (2025-04-14T09:51:23Z)
Robust Learning of Diverse Code Edits [10.565439872488328]
Software engineering activities frequently involve edits to existing code. Code language models (LMs) lack the ability to handle diverse types of code-edit requirements.
arXiv Detail & Related papers (2025-03-05T16:39:04Z)
Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar [15.421030528350212]
We build a code-obfuscation-based benchmark, OBFUSEVAL, to evaluate large language models. We use a three-level strategy to obfuscate descriptions, code, and context dependencies. The results show that after obfuscation, the average decrease ratio of test pass rate can reach up to 62.5%.
arXiv Detail & Related papers (2024-12-11T05:31:39Z)
A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation [0.0]
27 recent papers have been reviewed and split into two groups. The first group consists of new methods for bug detection and repair, which include locating semantic errors. The second group dwells on code generation, providing an overview of both general-purpose LLMs fine-tuned for programming and task-specific models. It also presents methods to improve code generation, such as identifier-aware training, fine-tuning at the instruction level, and incorporating semantic code structures.
arXiv Detail & Related papers (2024-11-12T06:47:54Z)
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited. We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated as compared to canonical solutions. We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types. We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
A Survey on Large Language Models for Code Generation [9.555952109820392]
Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks. This survey aims to bridge the gap between academia and practical development by providing a comprehensive and up-to-date literature review.
arXiv Detail & Related papers (2024-06-01T17:48:15Z)
Automating Patch Set Generation from Code Review Comments Using Large Language Models [2.045040820541428]
We provide code contexts to five popular Large Language Models (LLMs). We obtain the suggested code-changes (patch sets) derived from real-world code-review comments. The performance of each model is meticulously assessed by comparing their generated patch sets against the historical data of human-generated patch-sets.
arXiv Detail & Related papers (2024-04-10T02:46:08Z)
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
InfiBench is the first large-scale freeform question-answering (QA) benchmark for code to our knowledge. It comprises 234 carefully selected high-quality Stack Overflow questions that span across 15 programming languages. We conduct a systematic evaluation for over 100 latest code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z)
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components. CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks. FGO only optimizes the model by masking the unexecuted code segments to provide Fine-Grained Optimization. Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code [6.491009626125319]
We introduce CodeLL, a lifelong learning dataset focused on code changes. Our dataset aims to comprehensively capture code changes across the entire release history of open-source software repositories. CodeLL enables researchers to study the behaviour of LMs in lifelong fine-tuning settings for learning code changes.
arXiv Detail & Related papers (2023-12-20T01:20:24Z)
CodeT5+: Open Code Large Language Models for Code Understanding and Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.