Fugu-MT paper translation (abstract): PlanCompiler: A Deterministic Compilation Architecture for Structured Multi-Step LLM Pipelines

Paper abstract: PlanCompiler: A Deterministic Compilation Architecture for Structured Multi-Step LLM Pipelines

arxiv url: http://arxiv.org/abs/2604.13092v1
Date: Wed, 08 Apr 2026 00:54:41 GMT
Status: Translation complete
Last updated in system: 2026-04-16 20:38:32.189324
Title: PlanCompiler: A Deterministic Compilation Architecture for Structured Multi-Step LLM Pipelines
Authors: Pranav Harikumar
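Abstract summary: PlanCompiler is a compilation architecture for structured pipelines that separates planning from execution through a typed node registry. It produces a typed plan over a fixed registry of primitives, validates that plan against explicit structural and type constraints, and compiles only validated plans into executable Python.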
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) remain brittle in multi-step structured workflows, where errors compound across sequential transformations, validation stages, and stateful operations such as SQL persistence. We present PlanCompiler, a compilation architecture for structured LLM pipelines that separates planning from execution through a typed node registry, static graph validation, and deterministic compilation. Instead of relying on autoregressive chaining at runtime, the system first produces a typed JSON plan over a fixed registry of primitives, validates that plan against explicit structural and type constraints, and compiles only validated plans into executable Python. We evaluate the approach on a 300-task benchmark covering increasing workflow depth, SQL roundtrip persistence, and schema-themed stress tests. In this setting, PlanCompiler achieves 100% first-pass success on Sets A and B, 88% on Set C, 96% on Set D, 88% on schema-trap tasks, and 84% on SQL roundtrip tasks, outperforming direct free-form code-generation baselines from GPT-4.1 and Claude Sonnet on five of six benchmark sets and achieving 278/300 successes overall versus 202/300 and 187/300 for the two baselines, respectively. Across the full suite, planning cost is approximately $0.356, compared with $2.140 for GPT-4.1 and $18.391 for Claude, while maintaining competitive end-to-end latency. These results suggest that, for registry-constrained structured data workflows, deterministic compilation can improve first-pass reliability and cost efficiency relative to free-form code generation. Residual failures are concentrated in two narrow classes: late output-contract errors on aggregation tasks and early type mismatches at the SQLite persistence boundary, clarifying both the benefits and the current limits of the approach.
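The plan, validate, compile flow described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the registry contents, the JSON plan layout (`nodes`, `id`, `op`, `input`), the type names, and the functions `validate` and `compile_plan` are all assumptions made for illustration, and node ids are assumed to be valid Python identifiers.

```python
import json

# Hypothetical fixed registry: op name -> (required input type, output type).
# A required input type of None marks a source node that takes no input.
REGISTRY = {
    "load_csv": (None, "table"),
    "filter_rows": ("table", "table"),
    "aggregate": ("table", "summary"),
}

def validate(plan):
    """Return a list of structural and type errors; an empty list means valid."""
    errors = []
    seen = {}  # node id -> output type of nodes declared so far
    for node in plan["nodes"]:
        node_id, op = node["id"], node["op"]
        if op not in REGISTRY:
            errors.append(f"{node_id}: unknown op {op!r}")
            continue
        expected_in, out_type = REGISTRY[op]
        src = node.get("input")
        if expected_in is None:
            if src is not None:
                errors.append(f"{node_id}: source node takes no input")
        elif src not in seen:
            errors.append(f"{node_id}: input {src!r} is not an earlier node")
        elif seen[src] != expected_in:
            errors.append(f"{node_id}: expected {expected_in}, got {seen[src]}")
        seen[node_id] = out_type
    return errors

def compile_plan(plan):
    """Emit Python source for a plan, refusing any plan that fails validation."""
    errors = validate(plan)
    if errors:
        raise ValueError("; ".join(errors))
    lines = []
    for node in plan["nodes"]:
        arg = node.get("input") or ""
        lines.append(f"{node['id']} = {node['op']}({arg})")
    # The same plan always yields the same source: no model call happens here.
    return "\n".join(lines)

# A well-typed plan, as a planner model might emit it in JSON.
plan = json.loads("""
{"nodes": [
  {"id": "n1", "op": "load_csv"},
  {"id": "n2", "op": "filter_rows", "input": "n1"},
  {"id": "n3", "op": "aggregate", "input": "n2"}
]}
""")
source = compile_plan(plan)

# An ill-typed plan: a "summary" value is fed to an op that needs a "table".
bad_plan = {"nodes": [
    {"id": "n1", "op": "load_csv"},
    {"id": "n2", "op": "aggregate", "input": "n1"},
    {"id": "n3", "op": "filter_rows", "input": "n2"},
]}
bad_errors = validate(bad_plan)
```

In this sketch the type error in `bad_plan` is caught statically, before any code is generated or run, which is the property the abstract attributes to validating plans ahead of compilation.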

Related papers