A Study on the Improvement of Code Generation Quality Using Large Language Models Leveraging Product Documentation
- URL: http://arxiv.org/abs/2503.17837v1
- Date: Sat, 22 Mar 2025 18:42:05 GMT
- Title: A Study on the Improvement of Code Generation Quality Using Large Language Models Leveraging Product Documentation
- Authors: Takuro Morimoto, Harumi Haraguchi
- Abstract summary: This study proposes a method for automatically generating E2E test code from product documentation. Tests generated from product documentation had high compilation success and functional coverage, outperforming those based on requirement specs and user stories.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research on using Large Language Models (LLMs) in system development is expanding, especially in automated code and test generation. While E2E testing is vital for ensuring application quality, most test generation research has focused on unit tests, with limited work on E2E test code. This study proposes a method for automatically generating E2E test code from product documentation such as manuals, FAQs, and tutorials, using LLMs with tailored prompts. The two-step process interprets documentation intent and produces executable test code. Experiments on a web app with six key features (e.g., authentication, profile, discussion) showed that tests generated from product documentation had high compilation success and functional coverage, outperforming those based on requirement specs and user stories. These findings highlight the potential of product documentation to improve E2E test quality and, by extension, software quality.
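A minimal sketch of such a two-step pipeline follows; the function names, prompt wording, and Playwright target are illustrative assumptions, not the paper's exact prompts or framework:

```python
# Hedged sketch of a two-step documentation-to-E2E-test pipeline.
# `call_llm` is a hypothetical stand-in for any chat-completion API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up an LLM provider here")

def interpret_documentation(doc_text: str) -> str:
    """Step 1: extract the user-facing scenarios the documentation describes."""
    prompt = ("Read the following product documentation (manual/FAQ/tutorial) "
              "and list the user scenarios it describes, one per line:\n\n"
              + doc_text)
    return call_llm(prompt)

def generate_e2e_tests(scenarios: str, base_url: str) -> str:
    """Step 2: turn the extracted scenarios into executable E2E test code."""
    prompt = (f"Write Playwright tests in Python against {base_url} that "
              f"cover each of these scenarios. Output only code:\n\n{scenarios}")
    return call_llm(prompt)

if __name__ == "__main__":
    doc = open("manual.md").read()  # product documentation as input
    tests = generate_e2e_tests(interpret_documentation(doc),
                               "http://localhost:3000")
    open("test_e2e_generated.py", "w").write(tests)
```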
Related papers
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning [57.09163579304332]
We introduce PaperCoder, a framework that transforms machine learning papers into functional code repositories.
PaperCoder operates in three stages: planning, in which it designs the system architecture with diagrams, identifies file dependencies, and generates configuration files; analysis; and code generation.
We then evaluate PaperCoder on generating code implementations from machine learning papers based on both model-based and human evaluations.
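A rough illustration of that staged design (the helper names, prompts, and file-marker convention are assumptions, not PaperCoder's actual artifacts):

```python
# Hedged sketch of a plan -> analyze -> generate pipeline in the spirit of
# PaperCoder; `call_llm` is a hypothetical chat-completion wrapper.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def plan(paper_text: str) -> str:
    # Planning: architecture, file list, dependencies, config files.
    return call_llm("From this paper, propose a repository layout, file "
                    "dependencies, and config files:\n" + paper_text)

def analyze(paper_text: str, plan_text: str) -> str:
    # Analysis: pin down implementation details for each planned file.
    return call_llm(f"Given plan:\n{plan_text}\nExtract implementation "
                    f"details for each file from:\n{paper_text}")

def generate(plan_text: str, analysis_text: str) -> dict[str, str]:
    # Generation: emit code file by file, then split it back into paths.
    reply = call_llm(f"Plan:\n{plan_text}\nDetails:\n{analysis_text}\n"
                     "Emit each file as '# FILE: <path>' followed by its code.")
    files: dict[str, str] = {}
    for block in reply.split("# FILE: ")[1:]:
        path, _, body = block.partition("\n")
        files[path.strip()] = body
    return files
```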
arXiv Detail & Related papers (2025-04-24T01:57:01Z) - LlamaRestTest: Effective REST API Testing with Small Language Models [50.058600784556816]
We present LlamaRestTest, a novel approach that employs two custom LLMs to generate realistic test inputs. LlamaRestTest surpasses state-of-the-art tools in code coverage and error detection, even with RESTGPT-enhanced specifications.
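The core idea, sketched below under assumptions: ask a (small) model for realistic parameter values given an OpenAPI schema fragment, then exercise the endpoint. The prompt and helper names are illustrative, not LlamaRestTest's actual design:

```python
# Hedged sketch of LLM-proposed REST inputs; 5xx responses flag candidate bugs.
import json
import requests

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def propose_inputs(param_schema: dict) -> dict:
    reply = call_llm("Suggest a realistic JSON value for each parameter "
                     "described by this OpenAPI schema. Reply with JSON only:\n"
                     + json.dumps(param_schema))
    return json.loads(reply)

def probe(base_url: str, path: str, param_schema: dict) -> int:
    resp = requests.get(base_url + path, params=propose_inputs(param_schema))
    return resp.status_code  # server errors are candidate failures
```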
arXiv Detail & Related papers (2025-01-15T05:51:20Z) - Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch. Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests. Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
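A schematic of the generate/test/revise loop such an environment implies; the commands, file names, and stop heuristic are placeholders, not Commit0's actual harness:

```python
# Hedged sketch of a feedback loop over static analysis and unit tests.
import subprocess

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def run(cmd: list[str]) -> str:
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout + proc.stderr

def build_library(spec: str, max_rounds: int = 5) -> str:
    code = call_llm("Implement this library specification:\n" + spec)
    for _ in range(max_rounds):
        open("lib.py", "w").write(code)
        feedback = run(["python", "-m", "pyflakes", "lib.py"])       # static analysis
        feedback += run(["python", "-m", "pytest", "tests/", "-x"])  # unit tests
        if "failed" not in feedback and "error" not in feedback.lower():
            break  # crude stop heuristic for the sketch
        code = call_llm(f"Spec:\n{spec}\nCurrent code:\n{code}\n"
                        f"Feedback:\n{feedback}\nRevise the code.")
    return code
```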
arXiv Detail & Related papers (2024-12-02T18:11:30Z) - ASTER: Natural and Multi-language Unit Test Generation with LLMs [6.259245181881262]
We describe a generic pipeline that incorporates static analysis to guide LLMs in generating compilable and high-coverage test cases.
We conduct an empirical study to assess the quality of the generated tests in terms of code coverage and test naturalness.
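One way static analysis can guide generation, loosely in ASTER's spirit (ASTER targets Java and Python; the prompt and helpers here are assumptions): collect the signatures the model should cover and put them in the prompt.

```python
# Hedged sketch: Python's ast module supplies signatures to steer the LLM.
import ast

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def public_functions(source: str) -> list[str]:
    """Collect signatures of public top-level functions via static analysis."""
    sigs = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.append(f"{node.name}({args})")
    return sigs

def generate_tests(source: str, module: str) -> str:
    return call_llm(
        f"Write pytest tests for module '{module}'. Cover these functions, "
        f"including edge cases: {public_functions(source)}\nSource:\n{source}"
    )
```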
arXiv Detail & Related papers (2024-09-04T21:46:18Z) - A System for Automated Unit Test Generation Using Large Language Models and Assessment of Generated Test Suites [1.4563527353943984]
Large Language Models (LLMs) have been applied to various aspects of software development.
We present AgoneTest: an automated system for generating test suites for Java projects.
arXiv Detail & Related papers (2024-08-14T23:02:16Z) - Feature-Driven End-To-End Test Generation [5.7340627516257525]
AutoE2E is a novel approach to automate the generation of semantically meaningful feature-driven E2E test cases for web applications. E2EBench is a new benchmark for automatically assessing the feature coverage of E2E test suites.
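A minimal sketch of feature-driven generation under assumptions (the prompts, the one-test-per-feature shape, and the Playwright target are illustrative, not AutoE2E's pipeline):

```python
# Hedged sketch: infer candidate features from a page, then one test each.
def call_llm(prompt: str) -> str:
    raise NotImplementedError

def infer_features(page_html: str) -> list[str]:
    reply = call_llm("List the user-facing features this page supports, "
                     "one per line:\n" + page_html)
    return [line.strip() for line in reply.splitlines() if line.strip()]

def tests_per_feature(page_html: str, url: str) -> dict[str, str]:
    return {
        feature: call_llm(f"Write a Playwright (Python) test for the feature "
                          f"'{feature}' at {url}. Page HTML:\n{page_html}")
        for feature in infer_features(page_html)
    }
```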
arXiv Detail & Related papers (2024-08-04T01:16:04Z) - CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation. We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks. We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
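The retrieval-augmented setup reduces to: fetch relevant documents, prepend them to the prompt. A sketch with a deliberately naive lexical-overlap retriever standing in for the real retrievers the benchmark studies:

```python
# Hedged sketch of retrieval-augmented code generation.
def call_llm(prompt: str) -> str:
    raise NotImplementedError

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    # Naive scoring: shared lowercase tokens between query and document.
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return ranked[:k]

def rag_generate(task: str, docs: list[str]) -> str:
    context = "\n---\n".join(retrieve(task, docs))
    return call_llm(f"Reference material:\n{context}\n\nTask:\n{task}\n"
                    "Write the code.")
```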
arXiv Detail & Related papers (2024-06-20T16:59:52Z) - VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development.
We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM).
We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
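As a rough illustration of the VSCC setting (the prompt wording is an assumption, not the benchmark's protocol), the target library version is pinned explicitly so the model avoids APIs from other releases:

```python
# Hedged sketch of version-specific code completion.
def call_llm(prompt: str) -> str:
    raise NotImplementedError

def complete_for_version(snippet: str, library: str, version: str) -> str:
    return call_llm(
        f"Complete this code using only APIs available in {library}=={version};"
        f" do not use functions introduced in later releases:\n{snippet}"
    )
```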
arXiv Detail & Related papers (2024-06-11T16:15:06Z) - Large Language Models to Generate System-Level Test Programs Targeting Non-functional Properties [3.3305233186101226]
This paper proposes using Large Language Models (LLMs) to generate test programs.
We take a first look at how pre-trained LLMs perform in generating test programs that optimize non-functional properties of the device under test (DUT).
arXiv Detail & Related papers (2024-03-15T08:01:02Z) - Test-Driven Development for Code Generation [0.850206009406913]
Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements.
This paper investigates if and how Test-Driven Development (TDD) can be incorporated into AI-assisted code-generation processes.
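One natural way to fold TDD into generation, sketched under assumptions (run command, file names, and retry budget are placeholders): put the tests in the prompt and feed failures back until the suite passes.

```python
# Hedged sketch of test-driven LLM code generation.
import subprocess

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def tdd_generate(problem: str, test_code: str, max_rounds: int = 4) -> str:
    prompt = f"Problem:\n{problem}\nMake these tests pass:\n{test_code}"
    code = call_llm(prompt)
    for _ in range(max_rounds):
        open("solution.py", "w").write(code)
        open("test_solution.py", "w").write(test_code)
        result = subprocess.run(["python", "-m", "pytest", "test_solution.py"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return code  # all tests green
        code = call_llm(prompt + f"\nPrevious attempt:\n{code}\n"
                        f"Test output:\n{result.stdout}\nFix the code.")
    return code
```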
arXiv Detail & Related papers (2024-02-21T04:10:12Z) - FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios [87.12753459582116]
A wider range of tasks now face an increasing risk of containing factual errors when handled by generative models.
We propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models.
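A sketch of the claim-extract, query, gather-evidence, verify shape of such a framework; the prompts are assumptions and the search function is a placeholder for the external tools FacTool plugs in:

```python
# Hedged sketch of a tool-augmented factuality-checking pipeline.
def call_llm(prompt: str) -> str:
    raise NotImplementedError

def search(query: str) -> str:
    raise NotImplementedError("plug in a search engine or knowledge base")

def check_factuality(generated_text: str) -> list[tuple[str, str]]:
    claims = call_llm("List the atomic factual claims in:\n"
                      + generated_text).splitlines()
    verdicts = []
    for claim in filter(None, map(str.strip, claims)):
        query = call_llm("Write a search query to verify: " + claim)
        evidence = search(query)
        verdict = call_llm(f"Claim: {claim}\nEvidence: {evidence}\n"
                           "Answer SUPPORTED or REFUTED with one reason.")
        verdicts.append((claim, verdict))
    return verdicts
```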
arXiv Detail & Related papers (2023-07-25T14:20:51Z) - CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z) - Generate rather than Retrieve: Large Language Models are Strong Context Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
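The generate-then-read pattern is compact enough to sketch directly (prompts and document count are illustrative assumptions):

```python
# Hedged sketch of generate-then-read (GenRead)-style answering.
def call_llm(prompt: str) -> str:
    raise NotImplementedError

def gen_read(question: str, n_docs: int = 3) -> str:
    # First generate contextual documents, then answer conditioned on them.
    docs = [call_llm("Write a short background passage relevant to: "
                     + question) for _ in range(n_docs)]
    context = "\n---\n".join(docs)
    return call_llm(f"Using only this context:\n{context}\n"
                    f"Answer the question: {question}")
```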
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.