Fugu-MT Paper Translation (Abstract): From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python

Paper overview: From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python

arxiv url: http://arxiv.org/abs/2604.11518v1
Date: Mon, 13 Apr 2026 14:21:44 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:16.597526
Title: From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python
Title (reference translation): From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python
Authors: Jinhua Wang, Biswa Sengupta
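Abstract summary: This paper proposes a methodology for LLM-assisted continuous code translation in which a large language model translates a Rust codebase into Python. The authors demonstrate that the Python port resolves 59/80 SWE-bench Verified tasks (73.8%) versus Rust's 56/80 (70.0%), which is near-parity on real-world agentic tasks. The evaluation shows that for LLM-based agents where API latency dominates, Python's expressiveness yields a 15.9x code reduction.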
Reference score (independently computed attention score): 2.7324157162184157
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cross-language migration of large software systems is a persistent engineering challenge, particularly when the source codebase evolves rapidly. We present a methodology for LLM-assisted continuous code translation in which a large language model translates a production Rust codebase (648K LOC, 65 crates) into Python (41K LOC, 28 modules), with public agent benchmarks as the objective function driving iterative refinement. Our subject system is Codex CLI, a production AI coding agent. We demonstrate that: (1) the Python port resolves 59/80 SWE-bench Verified tasks (73.8%) versus Rust's 56/80 (70.0%), and achieves 42.5% on Terminal-Bench versus Rust's 47.5%, confirming near-parity on real-world agentic tasks; (2) benchmark-driven debugging, revealing API protocol mismatches, environment pollution, a silent WebSocket failure mode, and an API 400 crash, is more effective than static testing alone; (3) the architecture supports continuous upstream synchronisation via an LLM-assisted diff-translate-test loop; and (4) the Python port has evolved into a capability superset with 30 feature-flagged extensions (multi-agent orchestration, semantic memory, guardian safety, cost tracking) absent from Rust, while preserving strict parity mode for comparison. Our evaluation shows that for LLM-based agents where API latency dominates, Python's expressiveness yields a 15.9x code reduction with negligible performance cost, while the benchmark-as-objective-function methodology provides a principled framework for growing a cross-language port from parity into an extended platform.
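The headline figures in the abstract can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and the inputs are the rounded numbers quoted in the abstract (59/80, 56/80, 648K LOC, 41K LOC). It assumes the LOC counts were rounded to the nearest thousand, which is not stated in the source.

```python
# Figures copied from the abstract; all names here are illustrative.
SWE_BENCH_TASKS = 80
python_resolved, rust_resolved = 59, 56
rust_kloc, python_kloc = 648, 41  # thousands of lines, as quoted


def resolve_rate(resolved: int, total: int) -> float:
    """Return the share of resolved tasks as a percentage."""
    return 100.0 * resolved / total


python_rate = resolve_rate(python_resolved, SWE_BENCH_TASKS)
rust_rate = resolve_rate(rust_resolved, SWE_BENCH_TASKS)
gap_points = python_rate - rust_rate
gap_tasks = python_resolved - rust_resolved

# Code-size ratio from the rounded counts, plus the range implied if
# each count was rounded to the nearest 1K (an assumption).
ratio = rust_kloc / python_kloc
ratio_low = (rust_kloc - 0.5) / (python_kloc + 0.5)
ratio_high = (rust_kloc + 0.5) / (python_kloc - 0.5)

print(f"Python {python_rate:.2f}% vs Rust {rust_rate:.2f}% "
      f"(gap {gap_points:.2f} points, {gap_tasks} tasks)")
print(f"LOC ratio {ratio:.2f}x (range {ratio_low:.2f}x to {ratio_high:.2f}x)")
```

The rounded LOC counts give a ratio of about 15.8x, and the reported 15.9x falls inside the range that rounding allows, so the figure is presumably computed from unrounded line counts. The 3.75-point gap on SWE-bench Verified corresponds to three tasks out of 80.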
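Abstract (reference translation): Cross-language migration of large software systems is a persistent engineering challenge, particularly when the source codebase evolves rapidly. This paper presents a methodology for LLM-assisted continuous code translation in which a large language model translates a Rust codebase (648K LOC, 65 crates) into Python (41K LOC, 28 modules). The subject system is Codex CLI, a production AI coding agent. The Python port resolves 59/80 SWE-bench Verified tasks (73.8%) versus Rust's 56/80 (70.0%) and achieves 42.5% on Terminal-Bench versus Rust's 47.5%, confirming near-parity on real-world agentic tasks. Benchmark-driven debugging, which revealed API protocol mismatches, environment pollution, a silent WebSocket failure mode, and an API 400 crash, is more effective than static testing alone. The evaluation shows that for LLM-based agents where API latency dominates, Python's expressiveness yields a 15.9x code reduction with negligible performance cost.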
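The idea of using a benchmark as the objective function for iterative refinement can be sketched as a simple accept-if-better loop. This is a minimal toy under stated assumptions, not the authors' implementation: `refine_port`, `propose_fix`, and `run_benchmark` are hypothetical names, the "port" is a plain string, and the failure labels are stand-ins borrowed from the bug classes the abstract mentions. In a real setting `propose_fix` would call an LLM and `run_benchmark` would run an agent benchmark.

```python
from typing import Callable, List, Tuple

Benchmark = Callable[[str], Tuple[float, List[str]]]
Fixer = Callable[[str, List[str]], str]


def refine_port(port: str, propose_fix: Fixer, run_benchmark: Benchmark,
                target_score: float, max_rounds: int = 5) -> Tuple[str, float]:
    """Keep a candidate port only when its benchmark score improves."""
    best_port = port
    best_score, failures = run_benchmark(best_port)
    for _ in range(max_rounds):
        if best_score >= target_score or not failures:
            break
        candidate = propose_fix(best_port, failures)
        score, candidate_failures = run_benchmark(candidate)
        if score > best_score:  # the benchmark acts as the objective
            best_port, best_score, failures = candidate, score, candidate_failures
    return best_port, best_score


# Toy stand-ins: a "port" is a string that may still contain bug markers.
KNOWN_BUGS = ["protocol_mismatch", "env_pollution", "silent_websocket_failure"]


def toy_benchmark(port: str) -> Tuple[float, List[str]]:
    failures = [bug for bug in KNOWN_BUGS if bug in port]
    return 1.0 - len(failures) / len(KNOWN_BUGS), failures


def toy_fix(port: str, failures: List[str]) -> str:
    return port.replace(failures[0], "fixed")  # repair one failure per round


final_port, final_score = refine_port(" ".join(KNOWN_BUGS), toy_fix,
                                      toy_benchmark, target_score=1.0)
print(final_port, final_score)
```

Accepting a candidate only on a strict score improvement keeps the loop from regressing, at the cost of possibly rejecting fixes whose benefit the benchmark cannot measure.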

Related papers