Fugu-MT Paper Translation (Abstract): Scaling Test-Driven Code Generation from Functions to Classes: An Empirical Study

Paper overview: Scaling Test-Driven Code Generation from Functions to Classes: An Empirical Study

arxiv url: http://arxiv.org/abs/2602.03557v1
Date: Tue, 03 Feb 2026 14:04:05 GMT
Status: Translation complete
Last updated in system: 2026-02-04 18:37:15.496639
Title: Scaling Test-Driven Code Generation from Functions to Classes: An Empirical Study
Title (reference translation): An Empirical Study on Scaling Up Test-Driven Code Generation from Functions to Classes
Authors: Yunhao Liang, Ruixuan Ying, Shiwen Ni, Zhe Cui
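Abstract summary: Test-driven development (TDD) has been adopted to improve Large Language Model (LLM)-based code generation. This paper scales test-driven code generation from functions to classes using an iterative TDD framework. The framework improves class-level correctness by 12 to 26 absolute points and achieves up to 71% fully correct classes.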
Reference score (independently computed attention score): 15.939308390535722
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Test-driven development (TDD) has been adopted to improve Large Language Model (LLM)-based code generation by using tests as executable specifications. However, existing TDD-style code generation studies are largely limited to function-level tasks, leaving class-level synthesis where multiple methods interact through shared state and call dependencies underexplored. In this paper, we scale test-driven code generation from functions to classes via an iterative TDD framework. Our approach first analyzes intra-class method dependencies to derive a feasible generation schedule, and then incrementally implements each method under method-level public tests with reflection-style execution feedback and bounded repair iterations. To support test-driven generation and rigorous class-level evaluation, we construct ClassEval-TDD, a cleaned and standardized variant of ClassEval with consistent specifications, deterministic test environments, and complete method-level public tests. We conduct an empirical study across eight LLMs and compare against the strongest direct-generation baseline (the best of holistic, incremental, and compositional strategies). Our class-level TDD framework consistently improves class-level correctness by 12 to 26 absolute points and achieves up to 71% fully correct classes, while requiring only a small number of repairs on average. These results demonstrate that test-driven generation can effectively scale beyond isolated functions and substantially improve class-level code generation reliability. All code and data are available at https://anonymous.4open.science/r/ClassEval-TDD-C4C9/
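Abstract (reference translation): Test-driven development (TDD) has been adopted to improve Large Language Model (LLM)-based code generation by using tests as executable specifications. However, existing studies of TDD-style code generation are mostly limited to function-level tasks, leaving class-level synthesis, in which multiple methods interact through shared state and call dependencies, underexplored. This paper scales test-driven code generation from functions to classes using an iterative TDD framework. The proposed approach first analyzes method dependencies within a class to derive a feasible generation schedule, and then implements each method incrementally under method-level public tests, with reflection-style execution feedback and a bounded number of repair iterations. To support test-driven generation and rigorous class-level evaluation, the authors construct ClassEval-TDD, a cleaned and standardized variant of ClassEval with consistent specifications, deterministic test environments, and complete method-level public tests. They conduct an empirical study across eight LLMs and compare against the strongest direct-generation baseline (the best of the holistic, incremental, and compositional strategies). The class-level TDD framework consistently improves class-level correctness by 12 to 26 absolute points and achieves up to 71% fully correct classes, while requiring only a small number of repairs on average. These results show that test-driven generation can scale effectively beyond isolated functions and can substantially improve the reliability of class-level code generation. All code and data are available at https://anonymous.4open.science/r/ClassEval-TDD-C4C9/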
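
The two-stage procedure described in the abstract (derive a generation schedule from intra-class method dependencies, then implement each method under its public tests with bounded repairs) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the schedule is computed, so a topological sort is assumed here, and the names `generation_schedule`, `implement_with_repairs`, `generate`, `run_public_tests`, and the default `max_repairs=3` are all hypothetical placeholders.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Optional, Set, Tuple


def generation_schedule(deps: Dict[str, Set[str]]) -> List[str]:
    """Order methods so that each one comes after the methods it depends on.

    `deps` maps a method name to the set of methods it calls or relies on.
    Assumption: a topological sort is one way to obtain a feasible schedule;
    the paper may use a different analysis.
    """
    # static_order() raises graphlib.CycleError if the dependencies are cyclic.
    return list(TopologicalSorter(deps).static_order())


def implement_with_repairs(
    method: str,
    generate: Callable[[str, Optional[str]], str],
    run_public_tests: Callable[[str, str], Tuple[bool, str]],
    max_repairs: int = 3,  # hypothetical bound, not taken from the paper
) -> Tuple[str, bool, int]:
    """Generate one method, retrying with test feedback a bounded number of times.

    `generate(method, feedback)` stands in for an LLM call and returns code.
    `run_public_tests(method, code)` returns (passed, feedback_text).
    Returns (final_code, passed, repairs_used).
    """
    feedback: Optional[str] = None
    code = ""
    for attempt in range(max_repairs + 1):  # first try plus up to max_repairs repairs
        code = generate(method, feedback)
        passed, feedback = run_public_tests(method, code)
        if passed:
            return code, True, attempt
    return code, False, max_repairs
```

A driver would call `generation_schedule` once per class and then `implement_with_repairs` for each method in the returned order, passing the already implemented methods to the generator as context.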

Related papers