Related papers: When LLM-based Code Generation Meets the Software Development Process

When LLM-based Code Generation Meets the Software Development Process

URL: http://arxiv.org/abs/2403.15852v1
Date: Sat, 23 Mar 2024 14:04:48 GMT
Title: When LLM-based Code Generation Meets the Software Development Process
Authors: Feng Lin, Dong Jae Kim, Tse-Husn, Chen,
Abstract summary: This paper introduces LCG, a code generation framework inspired by established software engineering practices. LLM agents emulate various software process models, namely LCGWaterfall, LCGTDD, and LCGScrum. We evaluate LCG across four code generation benchmarks: HumanEval, HumanEval-ET, MBPP, and MBPP-ET.
Score: 50.82665351100067
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Software process models play a pivotal role in fostering collaboration and communication within software teams, enabling them to tackle intricate development tasks effectively. This paper introduces LCG, a code generation framework inspired by established software engineering practices. LCG leverages multiple Large Language Model (LLM) agents to emulate various software process models, namely LCGWaterfall, LCGTDD, and LCGScrum. Each model assigns LLM agents specific roles such as requirement engineer, architect, developer, tester, and scrum master, mirroring typical development activities and communication patterns. Through collaborative efforts utilizing chain-of-thought and prompt composition techniques, the agents continuously refine themselves to enhance code quality. Utilizing GPT3.5 as the underlying LLM and baseline (GPT), we evaluate LCG across four code generation benchmarks: HumanEval, HumanEval-ET, MBPP, and MBPP-ET. Results indicate LCGScrum outperforms other models, achieving Pass@1 scores of 75.2, 65.5, 82.5, and 56.7 in HumanEval, HumanEval-ET, MBPP, and MBPP-ET, respectively - an average 15% improvement over GPT. Analysis reveals distinct impacts of development activities on generated code, with design and code reviews contributing to enhanced exception handling, while design, testing, and code reviews mitigate code smells. Furthermore, temperature values exhibit negligible influence on Pass@1 across all models. However, variations in Pass@1 are notable for different GPT3.5 model versions, ranging from 5 to over 60 in HumanEval, highlighting the stability of LCG across model versions. This stability underscores the importance of adopting software process models to bolster the quality and consistency of LLM-generated code.

Related papers

ACT: Bridging the Gap in Code Translation through Synthetic Data Generation & Adaptive Training [1.4709455282157278]
Auto-Train for Code Translation (ACT) aims to improve code translation capabilities by enabling in-house finetuning of open-source Large Language Models (LLMs)<n>ACT's automated pipeline significantly boosts the performance of these models, narrowing the gap between open-source accessibility and the high performance of closed-source solutions.<n>Our results demonstrate that ACT consistently enhances the effectiveness of open-source models, offering businesses and developers a secure and reliable alternative.
arXiv Detail & Related papers (2025-07-22T11:35:35Z)
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation [68.19756761027351]
Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models.<n>We investigate their denoising processes and reinforcement learning methods.<n>Our work provides deeper insight into the machinery of dLLM generation and offers an effective, diffusion-native RL training framework.
arXiv Detail & Related papers (2025-06-25T17:35:47Z)
Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach [6.289275189295223]
We investigate the relationship between code complexity and the success of Large Language Models generated code.<n>We propose an iterative feedback method, where LLMs are prompted to generate correct code based on complexity metrics from previous failed outputs.<n>Experiment results show that our approach makes notable improvements, particularly with a smaller LLM.
arXiv Detail & Related papers (2025-05-29T19:06:14Z)
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning [35.429904556288996]
We introduce GenPRM, a generative process reward model that performs explicit Chain-of-Thought (CoT) reasoning with code verification. Experimental results show that GenPRM significantly outperforms prior PRMs with only 23K training data from MATH dataset.
arXiv Detail & Related papers (2025-04-01T15:21:05Z)
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation [71.31634636156384]
We introduce ComfyGPT, the first self-optimizing multi-agent system designed to generate ComfyUI based on task descriptions automatically. ComfyGPT comprises four specialized agents: ReformatAgent, FlowAgent, RefineAgent, and ExecuteAgent. FlowDataset is a large-scale dataset containing 13,571 workflow-description pairs, and FlowBench is a benchmark for evaluating workflow generation systems.
arXiv Detail & Related papers (2025-03-22T06:48:50Z)
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance [65.01483640267885]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. We introduce UnitCoder, a systematic pipeline leveraging model-generated unit tests to guide and validate the code generation process. Our work presents a scalable approach that leverages model-generated unit tests to guide the synthesis of high-quality code data from pre-training corpora.
arXiv Detail & Related papers (2025-02-17T05:37:02Z)
A Progressive Transformer for Unifying Binary Code Embedding and Knowledge Transfer [15.689556592544667]
We introduce ProTST, a novel transformer-based methodology for binary code embedding. ProTST employs a hierarchical training process based on a unique tree-like structure. Results show that ProTST yields an average validation score (F1, MRR, and Recall@1) improvement of 14.8% compared to traditional two-stage training.
arXiv Detail & Related papers (2024-12-15T13:04:29Z)
Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets. The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method. The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and. Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting. LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
Think-on-Process: Dynamic Process Generation for Collaborative Development of Multi-Agent System [13.65717444483291]
ToP (Think-on-Process) is a dynamic process generation framework for software development. Our framework significantly enhances the dynamic process generation capability of the GPT-3.5 and GPT-4.
arXiv Detail & Related papers (2024-09-10T15:02:34Z)
GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI [64.57616646552869]
This paper explores collaborative AI systems that use to enhance performance to integrate models, data sources, and pipelines to solve complex and diverse tasks. We introduce GenAgent, an LLM-based framework that automatically generates complex, offering greater flexibility and scalability compared to monolithic models. The results demonstrate that GenAgent outperforms baseline approaches in both run-level and task-level evaluations.
arXiv Detail & Related papers (2024-09-02T17:44:10Z)
ITERTL: An Iterative Framework for Fine-tuning LLMs for RTL Code Generation [9.409062607311528]
Large language models (LLMs) have demonstrated excellent performance, inspiring researchers to explore their use in automating register transfer level (RTL) code generation. Existing approaches to fine-tune LLMs for RTL generation typically are conducted on fixed datasets. We introduce an iterative training paradigm named ITERTL to mitigate these issues. Our model outperforms GPT4 and state-of-the-art (SOTA) open-source models, achieving remarkable 53.8% pass@1 rate on VerilogEval-human benchmark.
arXiv Detail & Related papers (2024-06-28T01:44:57Z)
LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation [13.800675921118348]
We propose a novel interactive workflow TiCoder for guided intent clarification. We present an empirical evaluation of the effectiveness of the workflow to improve code generation accuracy. We observe an average absolute improvement of 45.97% in the pass@1 code generation accuracy for both datasets and across all LLMs within 5 user interactions.
arXiv Detail & Related papers (2024-04-15T19:16:32Z)
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation [11.155351560550853]
This paper introduces Multi-Agent Assistant Code Generation (AgentCoder) AgentCoder is a novel solution comprising a multi-agent framework with specialized agents: the programmer agent, the test designer agent, and the test executor agent. Our experiments on 9 code generation models and 12 enhancement approaches showcase AgentCoder's superior performance over existing code generation models.
arXiv Detail & Related papers (2023-12-20T13:22:41Z)
LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system. We build a novel data-cleaning pipeline that uses these principles to transform existing programs. We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
Execution-based Code Generation using Deep Reinforcement Learning [8.085533911328577]
PPOCoder is a new framework for code generation that combines pre-trained PL models with Proximal Policy Optimization. PPOCoder seamlessly integrates external code-specific knowledge into the model optimization process. It's important to note that PPOCoder is a task-agnostic and model-agnostic framework that can be used across different code generation tasks and PLs.
arXiv Detail & Related papers (2023-01-31T18:02:26Z)
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning. During inference, we introduce a new generation procedure with a critical sampling strategy. For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.