Fugu-MT Paper Translation (Abstract): CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation

Paper overview: CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation

arxiv url: http://arxiv.org/abs/2504.10046v1
Date: Mon, 14 Apr 2025 09:51:23 GMT
Status: Translation complete
Last updated in system: 2025-04-22 19:21:35.472132
Title: CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation
Title (reference translation): CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation
Authors: Jia Li, Xianjie Shi, Kechi Zhang, Lei Li, Ge Li, Zhengwei Tao, Jia Li, Fang Liu, Chongyang Tao, Zhi Jin
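Abstract summary: Large language models (LLMs) have shown promising performance in automated code generation. This paper proposes CodeRAG, a retrieval-augmented code generation framework. Experiments show that CodeRAG achieves significant improvements compared to no-RAG scenarios.
Reference score (independently computed attention score): 69.684886175768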
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have shown promising performance in automated code generation, excelling especially at simple tasks such as generating standalone code. Unlike these simple tasks, real-world code generation usually depends on a specific programming environment (e.g., a code repository). Such an environment contains complex dependencies and domain knowledge that LLMs need when generating target code snippets. In this paper, we propose CodeRAG, a retrieval-augmented code generation (RAG) framework that comprehensively retrieves supportive code for real-world code generation. Beginning with the requirement, CodeRAG first constructs a requirement graph for the current repository and retrieves the sub-requirement and similar-requirement nodes of the target requirement on that graph. Meanwhile, it models the repository as a DS-code graph. CodeRAG then maps the relevant requirement nodes to their corresponding code nodes and treats these code nodes as anchors for LLM reasoning on the DS-code graph. Finally, CodeRAG introduces a code-oriented agentic reasoning process that allows LLMs to reason over the graph and comprehensively retrieve the supportive code they need to generate correct programs. Experiments show that CodeRAG achieves significant improvements over no-RAG scenarios (i.e., increasing Pass@1 by 40.90 and 37.79 on GPT-4o and Gemini-Pro, respectively, on DevEval). Further tests on reasoning LLMs (i.e., QwQ-32B) confirm CodeRAG's adaptability and efficacy across various types of LLMs. In addition, CodeRAG outperforms commercial programming products such as Copilot and Cursor. We further investigate the performance of our framework on different dependency types and observe that CodeRAG is superior on examples where the target code invokes predefined cross-file code snippets. These results demonstrate CodeRAG's potential for solving real-world repo-level coding challenges.
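The retrieval flow described in the abstract (requirement graph, mapping to code nodes, expansion from anchors over a code graph) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: every name, the toy graphs, and the data layout are hypothetical, and a fixed-depth breadth-first walk stands in for the LLM-driven agentic reasoning step, whose details the abstract does not give.

```python
from collections import deque

# Hypothetical toy data; none of these names come from the paper.
# Requirement graph: requirement -> its sub- or similar-requirement nodes.
requirement_graph = {
    "export report as CSV": ["format table rows", "write text file"],
    "format table rows": [],
    "write text file": [],
}

# Mapping from requirement nodes to the code nodes that implement them.
requirement_to_code = {
    "format table rows": "report.format_rows",
    "write text file": "io_utils.write_text",
}

# Code graph: code node -> code nodes it depends on (e.g. calls or imports).
code_graph = {
    "report.format_rows": ["report.escape_cell"],
    "report.escape_cell": [],
    "io_utils.write_text": ["io_utils.ensure_dir"],
    "io_utils.ensure_dir": [],
}


def retrieve_supportive_code(target, max_hops=1):
    """Return the anchors of `target` plus code nodes within max_hops of them."""
    # Step 1: sub- and similar-requirement nodes of the target requirement.
    related = requirement_graph.get(target, [])
    # Step 2: map those requirement nodes to code nodes (the anchors).
    anchors = [requirement_to_code[r] for r in related if r in requirement_to_code]
    # Step 3: expand from the anchors over the code graph.
    # Assumption: a bounded breadth-first walk replaces the agentic reasoning.
    seen = list(anchors)
    queue = deque((anchor, 0) for anchor in anchors)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for dep in code_graph.get(node, []):
            if dep not in seen:
                seen.append(dep)
                queue.append((dep, depth + 1))
    return seen


print(retrieve_supportive_code("export report as CSV"))
```

In a real system the anchors and their neighbours would be passed to the LLM as context, and the model itself would decide which edges to follow; the hop limit here only keeps the toy example bounded.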
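Abstract (reference translation): Large language models (LLMs) have shown promising performance in automated code generation and are particularly good at simple tasks such as generating standalone code. Unlike simple tasks, real-world code generation usually depends on a specific programming environment (for example, a code repository). That environment contains complex dependencies and domain knowledge, which LLMs need when generating the target code snippet. This paper proposes CodeRAG, a retrieval-augmented code generation (RAG) framework that comprehensively retrieves supportive code for real-world code generation. Starting from the requirement, CodeRAG first builds a requirement graph for the current repository and retrieves the sub-requirement and similar-requirement nodes of the target requirement on that graph. In parallel, it models the repository as a DS-code graph. CodeRAG maps the relevant requirement nodes to their corresponding code nodes and treats them as anchors for LLM reasoning on the DS-code graph. Finally, CodeRAG introduces a code-oriented agentic reasoning process that lets LLMs reason and comprehensively retrieve the supportive code they need to generate correct programs. Experiments show that CodeRAG improves substantially over no-RAG scenarios (Pass@1 gains of 40.90 on GPT-4o and 37.79 on Gemini-Pro on DevEval). Further tests on reasoning LLMs (QwQ-32B) confirm CodeRAG's adaptability and effectiveness across various types of LLMs. In addition, CodeRAG outperforms commercial programming products such as Copilot and Cursor. The authors also examine the framework's performance on different dependency types and observe that CodeRAG is especially strong on examples where the target code invokes predefined cross-file code snippets. These results show CodeRAG's potential for solving real-world repository-level coding challenges.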
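The gains in the abstract are reported as Pass@1. The abstract does not say how the metric is computed, so the sketch below assumes the widely used unbiased pass@k estimator from the code-generation evaluation literature: with n samples per task, of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). The function name and the sample counts are illustrative only.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one task from n samples, c of which are correct."""
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts: 10 samples generated, 3 of them pass the tests.
score = pass_at_k(n=10, c=3, k=1)
print(round(score, 2))
```

For k = 1 the estimator reduces to c / n, the fraction of generated samples that pass; a benchmark-level score is the mean of this value over all tasks.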


Related papers