Fugu-MT Paper Translation (Summary): Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation

Paper summary: Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation

arxiv url: http://arxiv.org/abs/2604.19826v1
Date: Mon, 20 Apr 2026 14:47:46 GMT
Status: Translation complete
Last updated in system: 2026-04-23 15:36:10.570261
Title: Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
Title (reference translation): Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
Authors: Éric Jacopin
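Abstract summary: How developers structure test code, whether inline with the implementation or in separate blocks, has traditionally been a matter of testing philosophy. Using SEGA, a three-dimensional evaluation framework measuring Determinism, Preservation, and Correctness, we investigate whether this choice affects AI code generation quality.
Reference score (site-computed attention score): 0.7310043452300737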
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI coding assistants increasingly generate code alongside tests. How developers structure test code, whether inline with the implementation or in separate blocks, has traditionally been a matter of testing philosophy. We investigate whether this choice affects AI code generation quality. We conduct a large-scale empirical study (830+ generated files, 12 models, 3 providers) using SEGA, a three-dimensional evaluation framework measuring Determinism, Preservation, and Correctness. Comparing inline test syntax (Python doctests) against separated test syntax (Rust #[test] blocks) on a d-ary heap implementation, we find that: (1) inline tests yield near-perfect preservation (100%) and correctness (92-100%) across all models; (2) separated tests expose stark model-tier gaps (0-100% correctness) and independence between preservation and correctness; (3) model behavior evolves across generations, and notably one model breaks the test suppression pattern of its three predecessors; (4) mechanistic analysis on 7 open-source architectures (6 transformers and a gated-linear Recurrent Neural Network (RNN)) reveals inline test markers receive 2.8-4.4$\times$ stronger attention in 5/7 models, with causal validation via knockout and steering experiments on the 4 code-specialized transformers and RWKV-6; the co-location mechanism extends to a non-transformer architecture, suggesting the design recommendation is robust to future architectural shifts. In the Foundation Model era, test syntax structure is a software design concern: co-locating tests with implementation code produces measurably better AI-generated code. This arxiv long version includes appendices that further qualify the effect as bounded by both model capability and programming language.
Abstract (reference translation): AI coding assistants increasingly generate code alongside tests. How developers structure test code, whether inline with the implementation or in separate blocks, has traditionally been a matter of testing philosophy. We examine whether this choice affects the quality of AI code generation. We conduct a large-scale empirical study (830+ generated files, 12 models, 3 providers) using SEGA, a three-dimensional evaluation framework that measures Determinism, Preservation, and Correctness. Comparing inline test syntax (Python doctests) with separated test syntax (Rust #[test] blocks) on a d-ary heap implementation, we find that: (1) inline tests yield near-perfect preservation (100%) and correctness (92-100%) across all models; (2) separated tests expose stark model-tier gaps (0-100% correctness) and show that preservation and correctness are independent; (3) model behavior evolves across generations, and notably one model breaks the test suppression pattern of its three predecessors; (4) mechanistic analysis of 7 open-source architectures (6 transformers and a gated-linear recurrent neural network (RNN)) shows that inline test markers receive 2.8-4.4$\times$ stronger attention in 5/7 models, with causal validation through knockout and steering experiments on the 4 code-specialized transformers and RWKV-6; the co-location mechanism extends to a non-transformer architecture, suggesting the design recommendation is robust to future architectural shifts. In the Foundation Model era, test syntax structure is a software design concern: co-locating tests with implementation code produces measurably better AI-generated code. This arXiv long version includes appendices that further qualify the effect as bounded by both model capability and programming language.
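To make "inline test syntax" concrete, the following is a minimal sketch of what a d-ary heap with co-located Python doctests could look like. It is not the paper's benchmark code: the class name `DaryHeap`, its methods, and the specific test cases are assumptions chosen only for illustration. The point is that each test sits in the docstring directly beside the code it exercises, whereas a separated style (such as a Rust `#[test]` block) places the tests in their own section, away from the implementation.

```python
"""Illustrative d-ary min-heap with co-located doctests (hypothetical example)."""


class DaryHeap:
    """A d-ary min-heap stored in a flat list.

    >>> h = DaryHeap(d=3)
    >>> for x in [5, 1, 4, 2, 3]:
    ...     h.push(x)
    >>> h.pop()
    1
    >>> [h.pop() for _ in range(len(h))]
    [2, 3, 4, 5]
    """

    def __init__(self, d=2):
        if d < 2:
            raise ValueError("d must be at least 2")
        self.d = d          # number of children per node
        self.items = []     # heap contents in level order

    def __len__(self):
        return len(self.items)

    def push(self, value):
        """Append value, then sift it up toward the root.

        >>> h = DaryHeap(d=4)
        >>> h.push(7); h.push(3)
        >>> h.items[0]
        3
        """
        self.items.append(value)
        i = len(self.items) - 1
        while i > 0:
            parent = (i - 1) // self.d
            if self.items[parent] <= self.items[i]:
                break
            self.items[parent], self.items[i] = self.items[i], self.items[parent]
            i = parent

    def pop(self):
        """Remove and return the smallest value.

        >>> DaryHeap().pop()
        Traceback (most recent call last):
            ...
        IndexError: pop from empty heap
        """
        if not self.items:
            raise IndexError("pop from empty heap")
        top = self.items[0]
        last = self.items.pop()
        if self.items:
            # Move the last leaf to the root and restore heap order.
            self.items[0] = last
            self._sift_down(0)
        return top

    def _sift_down(self, i):
        # Swap node i with its smallest child until heap order holds.
        n = len(self.items)
        while True:
            first = self.d * i + 1
            if first >= n:
                break
            last = min(first + self.d, n)
            smallest = min(range(first, last), key=self.items.__getitem__)
            if self.items[i] <= self.items[smallest]:
                break
            self.items[i], self.items[smallest] = self.items[smallest], self.items[i]
            i = smallest
```

If this were saved as a file such as `dary_heap.py` (a hypothetical name), the inline tests could be run with `python -m doctest dary_heap.py`, which prints nothing when every example passes.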

Related papers