Fugu-MT Paper Translation (Abstract): To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation

Paper overview: To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation

arxiv url: http://arxiv.org/abs/2603.15159v1
Date: Mon, 16 Mar 2026 11:53:39 GMT
Status: Translation complete
Last updated in system: 2026-03-17 18:28:58.191148
Title: To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation
Title (reference translation): Seeing Is Not Mastering: Teaching LLMs to Use Private Libraries for Code Generation
Authors: Yitong Zhang, Chengze Li, Ruize Chen, Guowei Yang, Xiaoran Jia, Yijie Ren, Jia Li,
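Abstract summary: PriCoder is an approach that teaches large language models to invoke private-library APIs through automatically synthesized data. PriCoder substantially improves private-library-oriented code generation, yielding gains of over 20% in pass@1 in many settings.
Reference score (attention score computed independently by this site): 10.540200819270359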
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have shown strong potential for code generation, yet they remain limited in private-library-oriented code generation, where the goal is to generate code using APIs from private libraries. Existing approaches mainly rely on retrieving private-library API documentation and injecting relevant knowledge into the context at inference time. However, our study shows that this is insufficient: even given accurate required knowledge, LLMs still struggle to invoke private-library APIs effectively. To address this limitation, we propose PriCoder, an approach that teaches LLMs to invoke private-library APIs through automatically synthesized data. Specifically, PriCoder models private-library data synthesis as the construction of a graph, and alternates between two graph operators: (1) Progressive Graph Evolution, which improves data diversity by progressively synthesizing more diverse training samples from basic ones, and (2) Multidimensional Graph Pruning, which improves data quality through a rigorous filtering pipeline. To support rigorous evaluation, we construct two new benchmarks based on recently released libraries that are unfamiliar to the tested models. Experiments on three mainstream LLMs show that PriCoder substantially improves private-library-oriented code generation, yielding gains of over 20% in pass@1 in many settings, while causing negligible impact on general code generation capability. Our code and benchmarks are publicly available at https://github.com/contact-eniacode/PriCoder.
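Abstract (reference translation): Large language models (LLMs) show strong potential for code generation, but they remain limited in private-library-oriented code generation, where the goal is to generate code using APIs from private libraries. Existing approaches mainly rely on retrieving private-library API documentation and injecting the relevant knowledge into the context at inference time. However, our study shows that this is insufficient: even when given the accurate, required knowledge, LLMs still struggle to invoke private-library APIs effectively. To address this limitation, we propose PriCoder, an approach that teaches LLMs to invoke private-library APIs through automatically synthesized data. Specifically, PriCoder models private-library data synthesis as the construction of a graph and alternates between two graph operators: (1) Progressive Graph Evolution, which improves data diversity by progressively synthesizing more diverse training samples from basic ones, and (2) Multidimensional Graph Pruning, which improves data quality through a rigorous filtering pipeline. To support rigorous evaluation, we construct two new benchmarks based on recently released libraries that are unfamiliar to the tested models. Experiments on three mainstream LLMs show that PriCoder substantially improves private-library-oriented code generation, yielding gains of over 20% in pass@1 in many settings, while having a negligible impact on general code generation capability. Our code and benchmarks are publicly available at https://github.com/contact-eniacode/PriCoder.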
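Note on the metric: the abstract reports gains in pass@1. The snippet below is a minimal sketch of how pass@k is commonly computed, assuming the widely used unbiased estimator 1 - C(n-c, k) / C(n, k) for n generated samples of which c pass the tests. The abstract does not say which estimator or sampling setup the authors use, and the function names and example numbers here are hypothetical, not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them pass."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random size-k
    # subset of the n samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem (n, c) counts, for illustration only.
baseline = [(10, 2), (10, 0), (10, 5)]
tuned = [(10, 5), (10, 2), (10, 7)]

gain = mean_pass_at_k(tuned, 1) - mean_pass_at_k(baseline, 1)
print(f"pass@1 gain: {gain:.3f}")
```

For k = 1 the estimator reduces to c / n, so pass@1 is simply the fraction of generated samples that pass, averaged over problems.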


Related papers