Fugu-MT Paper Translation (Abstract): JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines

Paper overview: JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines

arxiv url: http://arxiv.org/abs/2606.19830v2
Date: Sun, 21 Jun 2026 05:48:00 GMT
Status: Translation complete
Last updated in system: 2026-06-24 16:10:14.867458
Title: JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines
Authors: Jianwen Sun, Chuanhao Li, Zizhen Li, Yukang Feng, Fanrui Zhang, Yifei Huang, Yu Dai, Kaipeng Zhang
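Abstract summary: We present JamSet and JamBench, the first project-level game code framework dataset and benchmark built on a professional game engine. Our key insight is that Game Jam competitions, community events where developers build complete games under tight time constraints, yield thousands of open-source projects suitable for this purpose.
Reference score (independently calculated attention level): 38.658731944117875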
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current AI-driven game development has made substantial progress in asset generation, gameplay design, and web-based game coding, yet project-level code engineering on professional game engines remains largely unexplored due to the absence of large-scale datasets and deterministic evaluation methods. We present JamSet and JamBench, the first project-level game code framework dataset and benchmark built on a professional game engine. Our key insight is that Game Jam competitions, community events where developers build complete games under tight time constraints, yield thousands of open-source projects suitable for this purpose. Building on the Godot engine's text-based format and headless execution mode, we design a deterministic verification pipeline from file integrity to runtime behavior collection, distilling 8,133 verified projects from over 240,000 repositories. Of these, 300 manually verified projects form JamBench; the rest constitute JamSet. JamBench defines theme-driven generation and code completion tasks, evaluated through a pipeline combining compilation pass rates, Structural Completeness Score (SCS), and Behavioral Alignment Score (BAS). Evaluation of 9 frontier models reveals a capability cliff as project scale increases, with runtime pass rates dropping from 80.4% on small projects to 5.7% on large ones (Task2a). Code Agents improve compilation rates yet yield no gains in runtime behavioral quality, indicating that the bottleneck lies in architectural design rather than syntactic correctness. Experiments validate JamSet as effective training data. All data and code are publicly available.
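The abstract describes a verification pipeline that starts from file integrity and relies on Godot's text-based format. As a rough illustration of what such a first stage could look like, the sketch below scans a project's text scene and resource files for `res://` references and reports any that are missing on disk. This is not the authors' pipeline: the function name `missing_resources`, the regular expression, and the choice of `.tscn`/`.tres` files are assumptions made for this example only.

```python
import re
import tempfile
from pathlib import Path

# Assumption: referenced files appear as path="res://..." attributes
# in Godot text scene (.tscn) and resource (.tres) files.
RES_PATH = re.compile(r'path="res://([^"]+)"')


def missing_resources(project_dir):
    """Return sorted res:// paths that are referenced but absent on disk."""
    root = Path(project_dir)
    if not (root / "project.godot").is_file():
        raise FileNotFoundError("project.godot not found; not a Godot project root")
    missing = set()
    for pattern in ("*.tscn", "*.tres"):
        for scene in root.rglob(pattern):
            text = scene.read_text(encoding="utf-8", errors="replace")
            for rel in RES_PATH.findall(text):
                if not (root / rel).exists():
                    missing.add("res://" + rel)
    return sorted(missing)


if __name__ == "__main__":
    # Build a tiny hypothetical project to exercise the check.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "project.godot").write_text("; demo project\n", encoding="utf-8")
        (root / "player.gd").write_text("extends Node\n", encoding="utf-8")
        (root / "main.tscn").write_text(
            '[gd_scene format=3]\n'
            '[ext_resource type="Script" path="res://player.gd" id="1"]\n'
            '[ext_resource type="Texture2D" path="res://icon.png" id="2"]\n',
            encoding="utf-8",
        )
        print(missing_resources(root))
```

A check like this is deterministic because it depends only on the files in the repository. Later stages that the abstract mentions, such as collecting runtime behavior in headless mode, would need the engine itself and are not covered here.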
