Fugu-MT Paper Translation (Abstract): Bridging Code Graphs and Large Language Models for Better Code Understanding

Paper overview: Bridging Code Graphs and Large Language Models for Better Code Understanding

arxiv url: http://arxiv.org/abs/2512.07666v1
Date: Mon, 08 Dec 2025 16:00:29 GMT
Status: translation complete
Last updated in system: 2025-12-09 22:03:54.955269
Title: Bridging Code Graphs and Large Language Models for Better Code Understanding
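Title (reference translation): Bridging Code Graphs and Large Language Models for Code Understanding
Authors: Zeqi Chen, Zhaoyang Chu, Yi Gui, Feng Guo, Yao Wan, Chuan Shi
Abstract summary: Large language models (LLMs) have demonstrated remarkable performance in code intelligence tasks such as code generation, summarization, and translation. This paper proposes CGBridge, a plug-and-play method that enhances LLMs with code graph information through an external, trainable bridge module. Experiments show that CGBridge achieves notable improvements over both the original model and the graph-augmented prompting method.
Reference score (independently computed attention metric): 16.874601227080294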
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance in code intelligence tasks such as code generation, summarization, and translation. However, their reliance on linearized token sequences limits their ability to understand the structural semantics of programs. While prior studies have explored graph-augmented prompting and structure-aware pretraining, they either suffer from prompt length constraints or require task-specific architectural changes that are incompatible with large-scale instruction-following LLMs. To address these limitations, this paper proposes CGBridge, a novel plug-and-play method that enhances LLMs with Code Graph information through an external, trainable Bridge module. CGBridge first pre-trains a code graph encoder via self-supervised learning on a large-scale dataset of 270K code graphs to learn structural code semantics. It then trains an external module to bridge the modality gap among code, graph, and text by aligning their semantics through cross-modal attention mechanisms. Finally, the bridge module generates structure-informed prompts, which are injected into a frozen LLM, and is fine-tuned for downstream code intelligence tasks. Experiments show that CGBridge achieves notable improvements over both the original model and the graph-augmented prompting method. Specifically, it yields a 16.19% and 9.12% relative gain in LLM-as-a-Judge on code summarization, and a 9.84% and 38.87% relative gain in Execution Accuracy on code translation. Moreover, CGBridge achieves over 4x faster inference than LoRA-tuned models, demonstrating both effectiveness and efficiency in structure-aware code understanding.
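Abstract (reference translation): Large language models (LLMs) have shown remarkable performance in code intelligence tasks such as code generation, summarization, and translation. However, their dependence on linearized token sequences limits their ability to understand the structural semantics of programs. Prior work has explored graph-augmented prompting and structure-aware pretraining, but these approaches either run into prompt length constraints or require task-specific architectural changes that are incompatible with large-scale instruction-following LLMs. To address these limitations, the paper proposes CGBridge, a new plug-and-play method that enhances LLMs with code graph information through an external, trainable bridge module. CGBridge first pre-trains a code graph encoder through self-supervised learning on a large-scale dataset of 270K code graphs in order to learn structural code semantics. It then trains an external module to bridge the modality gap among code, graph, and text by aligning their semantics through cross-modal attention mechanisms. Finally, the bridge module generates structure-informed prompts, which are injected into a frozen LLM, and is fine-tuned for downstream code intelligence tasks. Experiments show that CGBridge improves notably over both the original model and the graph-augmented prompting method. Specifically, it yields relative gains of 16.19% and 9.12% in LLM-as-a-Judge on code summarization, and relative gains of 9.84% and 38.87% in Execution Accuracy on code translation. In addition, CGBridge achieves more than 4x faster inference than LoRA-tuned models, demonstrating both the effectiveness and the efficiency of structure-aware code understanding.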
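
Illustrative sketch (not from the paper): the abstract describes a bridge module that uses cross-modal attention to turn code graph information into structure-informed prompts for a frozen LLM. The toy Python below shows one way such a step could look, under several assumptions that the abstract does not confirm: a fixed set of query vectors attends over graph node embeddings with scaled dot-product attention, and the resulting vectors are prepended to the token embeddings as soft prompts. All names (`cross_attention`, `build_llm_input`, `query_vectors`) and all dimensions are hypothetical, and random vectors stand in for a trained graph encoder and LLM embedding layer.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, nodes):
    """Each query attends over all node embeddings (scaled dot-product).

    Returns one output vector per query, each a weighted average of the
    node embeddings. No learned projections are applied in this toy version.
    """
    dim = len(nodes[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ni for qi, ni in zip(q, node)) / math.sqrt(dim)
            for node in nodes
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * node[d] for w, node in zip(weights, nodes)) for d in range(dim)]
        )
    return outputs


def build_llm_input(soft_prompts, token_embeddings):
    """Assumed injection scheme: prepend soft prompts to the token embeddings."""
    return soft_prompts + token_embeddings


random.seed(0)
DIM = 8  # hypothetical embedding width

# Random stand-ins for graph-encoder outputs, bridge queries, and token embeddings.
node_embeddings = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
query_vectors = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
token_embeddings = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(4)]

soft_prompts = cross_attention(query_vectors, node_embeddings)
llm_input = build_llm_input(soft_prompts, token_embeddings)
print(len(soft_prompts), len(llm_input))
```

In this sketch the number of soft prompts equals the number of query vectors regardless of graph size, which is one plausible way a bridge could sidestep the prompt length constraints that the abstract attributes to graph-augmented prompting. Only the bridge would be trained in such a setup; the LLM weights stay frozen.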

