Fugu-MT paper translation (abstract): From Runnable to Shippable: Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements

Paper overview: From Runnable to Shippable: Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements

arxiv url: http://arxiv.org/abs/2605.17242v1
Date: Sun, 17 May 2026 03:48:41 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:47.797196
Title: From Runnable to Shippable: Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements
Title (reference translation): From Runnable to Shippable - Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements
Authors: Yuxuan Wan, Tingshuo Liang, Jiakai Xu, Jingyu Xiao, Yintong Huo, Michael R Lyu
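Abstract summary: TDDev is a framework that automates, in three stages, the closed loop of deploying a generated web application, exercising it in a browser, and turning failures into repair signals. The authors conduct the first controlled empirical study of test-driven development strategies for web application generation. TDDev reduces manual developer intervention to zero, shifting the workload from continuous prompt engineering to autonomous, feedback-driven refinement.
Reference score (independently computed attention score): 34.560333810255464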
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Coding agents can generate web applications from natural-language descriptions, yet a recent benchmark study shows that generated applications fail to meet functional requirements in over 70% of cases. The core difficulty is that web correctness cannot be assessed from source files or terminal output: the application must be deployed, exercised through simulated browser interactions, and failures must be translated into actionable repair signals -- steps that current agents cannot perform without human mediation. We present TDDev, a framework that automates this closed loop through three stages: (1) converting high-level requirements into structured acceptance tests before any code is written, (2) deploying the application and validating it through browser-based interaction simulation, and (3) translating browser-observed failures into structured repair reports for the coding agent. Enabled by TDDev, we conduct the first controlled empirical study of Test-driven development (TDD) strategies for web application generation, comparing four development protocols across two coding agents, two backbone models, and two benchmarks. TDD infrastructure consistently improves generation quality by 34--48 percentage points over a no-TDD baseline. The central finding is that the optimal protocol depends on the model's generation style: models that build applications holistically benefit most from agentic enforcement, while models that extend code conservatively benefit from incremental enforcement. Mismatching protocol to generation style eliminates the TDD benefit entirely while multiplying token cost up to 25-fold. A user study confirms that TDDev reduces manual developer intervention to zero, shifting the workload from continuous prompt engineering to autonomous, feedback-driven refinement.
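Abstract (reference translation): Coding agents can generate web applications from natural-language descriptions, but a recent benchmark shows that the generated applications fail to meet functional requirements in more than 70% of cases. Web correctness cannot be assessed from source files or terminal output: the application must be deployed and exercised through simulated browser interactions, and failures must be translated into actionable repair signals. We present TDDev, a framework that automates this closed loop through three stages: (1) converting high-level requirements into structured acceptance tests before any code is written, (2) deploying the application and validating it through browser-based interaction simulation, and (3) translating browser-observed failures into structured repair reports for the coding agent. Enabled by TDDev, we conduct the first controlled empirical study of test-driven development (TDD) strategies for web application generation, comparing four development protocols across two coding agents, two backbone models, and two benchmarks. TDD infrastructure improves generation quality by 34 to 48 percentage points over a no-TDD baseline. The central finding is that the optimal protocol depends on the model's generation style: models that build applications holistically benefit most from agentic enforcement, while models that extend code conservatively benefit from incremental enforcement. Mismatching protocol to generation style multiplies token cost up to 25-fold and eliminates the TDD benefit entirely. A user study shows that TDDev reduces manual developer intervention to zero, shifting the workload from continuous prompt engineering to autonomous, feedback-driven refinement.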
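The three-stage closed loop described in the abstract can be illustrated with a minimal sketch. This is not TDDev's implementation: every name here (`AcceptanceTest`, `RepairReport`, `derive_tests`, `validate`, `run_loop`, `toy_agent`) is hypothetical, the "application" is reduced to a set of feature names, and the browser interaction is replaced by a plain membership check. It only shows the control flow: tests are derived before any code exists, failures become structured reports, and the reports drive the next generation round.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

App = set  # toy stand-in: a deployed app is just the set of features it offers


@dataclass(frozen=True)
class AcceptanceTest:
    requirement: str             # requirement this test was derived from
    check: Callable[[App], bool]  # stand-in for a simulated browser interaction


@dataclass(frozen=True)
class RepairReport:
    requirement: str  # which requirement failed
    observed: str     # what the (simulated) browser session observed


def derive_tests(requirements: list[str]) -> list[AcceptanceTest]:
    """Stage 1 (assumed form): one acceptance test per requirement, before any code."""
    return [AcceptanceTest(r, lambda app, r=r: r in app) for r in requirements]


def validate(app: App, tests: list[AcceptanceTest]) -> list[RepairReport]:
    """Stages 2 and 3 (assumed form): exercise the app, report each failure."""
    return [
        RepairReport(t.requirement, f"feature '{t.requirement}' was not reachable")
        for t in tests
        if not t.check(app)
    ]


def run_loop(requirements, agent, max_rounds=5):
    """Generate, validate, and repair until all tests pass or rounds run out."""
    tests = derive_tests(requirements)
    app = agent(set(), [])          # initial generation, no reports yet
    reports = validate(app, tests)
    rounds = 0
    while reports and rounds < max_rounds:
        app = agent(app, reports)   # repair guided by structured reports
        reports = validate(app, tests)
        rounds += 1
    return app, reports, rounds


def toy_agent(app: App, reports: list[RepairReport]) -> App:
    """Hypothetical coding agent: fixes at most one reported failure per call."""
    if reports:
        return set(app) | {reports[0].requirement}
    return set(app)


if __name__ == "__main__":
    app, reports, rounds = run_loop(["login", "search", "checkout"], toy_agent)
    print(sorted(app), len(reports), rounds)
```

With the toy agent repairing one failure per round, three requirements take three repair rounds; lowering `max_rounds` leaves unresolved reports, which mirrors how a bounded feedback budget can end with a partially correct application.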


Related papers