Fugu-MT Paper Translation (Abstract): ReCUBE: Evaluating Repository-Level Context Utilization in Code Generation

Paper abstract: ReCUBE: Evaluating Repository-Level Context Utilization in Code Generation

arxiv url: http://arxiv.org/abs/2603.25770v1
Date: Thu, 26 Mar 2026 08:04:15 GMT
Status: Translation complete
Last updated in system: 2026-03-30 21:49:48.202158
Title: ReCUBE: Evaluating Repository-Level Context Utilization in Code Generation
Title (reference translation): ReCUBE: Evaluating Repository-Level Context Utilization in Code Generation
Authors: Jiseung Hong, Benjamin G. Ascoli, Jinho D. Choi
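Abstract summary: Large language models (LLMs) have emerged as capable coding assistants that operate over large codebases through either agentic exploration or full-context generation. ReCUBE is a benchmark in which LLMs reconstruct a masked file within a real-world repository, using all remaining source files, dependency specifications, and documentation as their only source of context. The paper also proposes the Caller-Centric Exploration (CCE) toolkit, a set of dependency graph-based tools that can be integrated into agentic frameworks.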
Reference score (independently computed attention score): 7.907933839674293
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have recently emerged as capable coding assistants that operate over large codebases through either agentic exploration or full-context generation. Existing benchmarks capture a broad range of coding capabilities, such as resolving GitHub issues, but none of them directly isolate and measure how effectively LLMs leverage repository-level context during code generation. To address this, we introduce ReCUBE, a benchmark in which LLMs reconstruct a masked file within a real-world repository, using all remaining source files, dependency specifications, and documentation as their only source of context. ReCUBE evaluates reconstructed code with usage-aware test cases that simulate both internal module logic and external cross-file integration, reflecting real-world software usage patterns. We further propose the Caller-Centric Exploration (CCE) toolkit, a set of dependency graph-based tools that can be integrated into agentic frameworks to guide agents toward the most relevant caller files during repository exploration. Experiments across eight models in four settings show that repository-level context utilization remains highly challenging even for state-of-the-art models, with GPT-5 achieving only 37.57% strict pass rate in the full-context setting. Agents augmented with our CCE toolkit consistently outperform all baselines across all evaluated models, with improvements of up to 7.56% in strict pass rate. We release our benchmark, code, and evaluation framework as open source for the NLP research community.
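Abstract (reference translation): Large language models (LLMs) have recently emerged as capable coding assistants that operate over large codebases through either agentic exploration or full-context generation. Existing benchmarks capture a broad range of coding capabilities, such as resolving GitHub issues, but none directly isolates and measures how effectively LLMs leverage repository-level context during code generation. To address this, the authors introduce ReCUBE, a benchmark in which LLMs reconstruct a masked file within a real-world repository, using all remaining source files, dependency specifications, and documentation as their only source of context. ReCUBE evaluates the reconstructed code with usage-aware test cases that simulate both internal module logic and external cross-file integration, reflecting real-world software usage patterns. The authors further propose the Caller-Centric Exploration (CCE) toolkit, a set of dependency graph-based tools that can be integrated into agentic frameworks to guide agents toward the most relevant caller files during repository exploration. Experiments across eight models in four settings show that repository-level context utilization remains highly challenging even for state-of-the-art models, with GPT-5 achieving only a 37.57% strict pass rate in the full-context setting. Agents augmented with the CCE toolkit consistently outperform all baselines across all evaluated models, with improvements of up to 7.56% in strict pass rate. The benchmark, code, and evaluation framework are released as open source for the NLP research community.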
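Illustrative sketch: The abstract describes the CCE toolkit only as dependency graph-based tools that steer an agent toward the caller files of a masked file. The snippet below is a minimal sketch of that general idea, not the authors' implementation: it assumes a Python repository held in memory as a path-to-source mapping, and the names `imported_modules`, `build_caller_index`, and `callers_of` are hypothetical. It treats any file that imports a module as a caller of it, which is a simplification of a real dependency graph.

```python
import ast
from collections import defaultdict


def imported_modules(source: str) -> set[str]:
    """Return dotted module names that a Python source string imports absolutely.

    Relative imports (e.g. `from . import x`) are skipped in this sketch.
    """
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module)
    return modules


def build_caller_index(repo: dict[str, str]) -> dict[str, set[str]]:
    """Map each imported module name to the set of file paths that import it.

    `repo` is a hypothetical in-memory repository: file path -> source text.
    """
    callers = defaultdict(set)
    for path, source in repo.items():
        for module in imported_modules(source):
            callers[module].add(path)
    return callers


def callers_of(masked_module: str, repo: dict[str, str]) -> list[str]:
    """List the files that import the masked module, sorted for determinism."""
    return sorted(build_caller_index(repo).get(masked_module, set()))


# Toy repository in which app/storage.py has been masked (removed).
repo = {
    "app/main.py": "from app.storage import save\nimport app.utils\n",
    "app/cli.py": "import app.storage\n",
    "app/utils.py": "import os\n",
}
print(callers_of("app.storage", repo))  # ['app/cli.py', 'app/main.py']
```

In a setting like the one the abstract describes, an agent could read the listed caller files first, since they show how the masked file's functions are actually used.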

Related papers