Fugu-MT paper translation (abstract): TENET: Leveraging Tests Beyond Validation for Code Generation

Paper overview: TENET: Leveraging Tests Beyond Validation for Code Generation

arxiv url: http://arxiv.org/abs/2509.24148v2
Date: Tue, 30 Sep 2025 04:05:32 GMT
Status: translation complete
Last updated in system: 2025-10-01 12:20:10.405792
Title: TENET: Leveraging Tests Beyond Validation for Code Generation
Title (reference translation): TENET: Leveraging Tests Beyond Validation for Code Generation
Authors: Yiran Hu, Nan Jiang, Shanchao Liang, Yi Wu, Lin Tan
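Abstract summary: Test-Driven Development (TDD) is a widely adopted software engineering practice that requires developers to create and execute tests alongside code implementation. This paper introduces TENET, an agent that generates functions in complex real-world repositories under the TDD setting. TENET achieves 69.08% and 81.77% Pass@1 on the RepoCod and RepoEval benchmarks, respectively.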
Reference score (independently computed attention measure): 15.74797688806215
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Test-Driven Development (TDD) is a widely adopted software engineering practice that requires developers to create and execute tests alongside code implementation, ensuring that software behavior is continuously validated and refined. In the era of vibe coding, where developers increasingly delegate code writing to large language models (LLMs) by specifying high-level intentions, TDD becomes even more crucial, as test cases serve as executable specifications that explicitly define and verify intended functionality beyond what natural-language descriptions and code context can convey. While vibe coding under TDD is promising, there are three main challenges: (1) selecting a small yet effective test suite to improve the generation accuracy and control the execution workload, (2) retrieving context such as relevant code effectively, and (3) systematically using test feedback for effective code refinement. To address these challenges, we introduce TENET, an LLM agent for generating functions in complex real-world repositories under the TDD setting. TENET features three components: (1) a novel test harness mechanism that selects a concise test suite to maximize diversity of target usage scenarios; (2) a tailored agent toolset that performs efficient retrieval of relevant code with interactive debugging; and (3) a reflection-based refinement workflow that iteratively analyzes failures, replenishes context, and applies code refinement. TENET achieves 69.08% and 81.77% Pass@1 on RepoCod and RepoEval benchmarks, outperforming the best agentic baselines by 9.49 and 2.17 percentage points, respectively. In addition, this is the first study of test-driven code generation with repository-level context, examining how different aspects of test suites affect the performance of LLM agents under the TDD setting.
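Abstract (reference translation): Test-Driven Development (TDD) is a widely adopted software engineering practice that requires developers to create and execute tests alongside code implementation. In the era of vibe coding, where developers delegate code writing to large language models (LLMs) by specifying high-level intentions, TDD becomes even more important, because test cases serve as executable specifications that explicitly define and verify intended functionality beyond what natural-language descriptions and code context convey. Vibe coding under TDD is promising, but there are three main challenges: (1) selecting a small yet effective test suite to improve generation accuracy and control the execution workload, (2) effectively retrieving context such as relevant code, and (3) systematically using test feedback for effective code refinement. To address these challenges, we introduce TENET, an LLM agent that generates functions in complex real-world repositories under the TDD setting. TENET has three components: (1) a novel test harness mechanism that selects a concise test suite to maximize the diversity of target usage scenarios; (2) a tailored agent toolset that performs efficient retrieval of relevant code with interactive debugging; and (3) a reflection-based refinement workflow that iteratively analyzes failures, replenishes context, and applies code refinement. TENET achieves 69.08% and 81.77% Pass@1 on the RepoCod and RepoEval benchmarks, respectively. In addition, this is the first study of test-driven code generation with repository-level context, examining how different aspects of test suites affect the performance of LLM agents under the TDD setting.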
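
Note on the reported numbers: the abstract gives Pass@1 scores and the margin over the best agentic baseline in percentage points. The short Python sketch below shows how such figures can be read. It assumes the commonly used unbiased pass@k estimator (which reduces to the fraction of correct samples when k = 1); the abstract does not say which estimator the authors used, and the function names `pass_at_k` and `implied_baseline` are hypothetical helpers introduced here for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: samples generated, c: samples that pass all tests, k: budget.
    Assumption: this is the widely used estimator, not necessarily
    the exact procedure used in the TENET paper.
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def implied_baseline(score: float, margin_pp: float) -> float:
    """Baseline score implied by 'outperforms by margin_pp percentage points'."""
    return round(score - margin_pp, 2)


# Pass@1 and margins as reported in the abstract (in percent).
reported = {"RepoCod": (69.08, 9.49), "RepoEval": (81.77, 2.17)}

# Derived by subtraction; these baseline values are not stated in the abstract.
implied = {name: implied_baseline(s, m) for name, (s, m) in reported.items()}

if __name__ == "__main__":
    for name, value in implied.items():
        print(f"{name}: best agentic baseline is about {value:.2f}% Pass@1")
```

Under these assumptions the best agentic baselines come out at roughly 59.59% (RepoCod) and 79.60% (RepoEval). A percentage-point margin is an absolute difference between two scores, not a relative improvement.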

Related papers