Fugu-MT Paper Translation (Summary): Constraint Decay: The Fragility of LLM Agents in Backend Code Generation

Paper summary: Constraint Decay: The Fragility of LLM Agents in Backend Code Generation

arxiv url: http://arxiv.org/abs/2605.06445v1
Date: Thu, 07 May 2026 15:44:40 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.953151
Title: Constraint Decay: The Fragility of LLM Agents in Backend Code Generation
Title (reference translation): Constraint Decay: The Fragility of LLM Agents in Backend Code Generation
Authors: Francesco Dente, Dario Satriani, Paolo Papotti
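Abstract summary: Large language model (LLM) agents show strong performance in autonomous code generation under loose specifications. Production-grade software, however, must strictly adhere to structural constraints such as architectural patterns, databases, and object-relational mappings. This paper evaluates how well agents handle structural constraints in backend generation.
Reference score (in-house attention metric): 9.659020624935687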
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large Language Model (LLM) agents demonstrate strong performance in autonomous code generation under loose specifications. However, production-grade software requires strict adherence to structural constraints, such as architectural patterns, databases, and object-relational mappings. Existing benchmarks often overlook these non-functional requirements, rewarding functionally correct but structurally arbitrary solutions. We present a systematic study evaluating how well agents handle structural constraints in multi-file backend generation. By fixing a unified API contract across 80 greenfield generation tasks and 20 feature-implementation tasks spanning eight web frameworks, we isolate the effect of structural complexity using a dual evaluation with end-to-end behavioral tests and static verifiers. Our findings reveal a phenomenon of constraint decay: as structural requirements accumulate, agent performance exhibits a substantial decline. Capable configurations lose 30 points on average in assertion pass rates from baseline to fully specified tasks, while some weaker configurations approach zero. Framework sensitivity analysis exposes significant performance disparities: agents succeed in minimal, explicit frameworks (e.g., Flask) but perform substantially worse on average in convention-heavy environments (e.g., FastAPI, Django). Finally, error analysis identifies data-layer defects (e.g., incorrect query composition and ORM runtime violations) as the leading root causes. This work highlights that jointly satisfying functional and structural requirements remains a key open challenge for coding agents.
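Abstract (reference translation): Large language model (LLM) agents show strong performance in autonomous code generation under loose specifications. However, production-grade software must strictly adhere to structural constraints such as architectural patterns, databases, and object-relational mappings. Existing benchmarks often overlook these non-functional requirements and reward solutions that are functionally correct but structurally arbitrary. We evaluate how well agents handle structural constraints in multi-file backend generation. By fixing a unified API contract across 80 greenfield generation tasks and 20 feature-implementation tasks spanning eight web frameworks, we isolate the effect of structural complexity using a dual evaluation with end-to-end behavioral tests and static verifiers. As structural requirements accumulate, agent performance declines markedly. Capable configurations lose 30 points on average in assertion pass rate from baseline to fully specified tasks, while some weaker configurations approach zero. Agents succeed in minimal, explicit frameworks (e.g., Flask) but perform substantially worse on average in convention-heavy environments (e.g., FastAPI, Django). Finally, error analysis identifies data-layer defects (e.g., incorrect query composition and ORM runtime violations) as the leading root causes. This work highlights that jointly satisfying functional and structural requirements remains a key challenge for coding agents.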


Related papers