Fugu-MT Paper Translation (Summary): Automated Customization of LLMs for Enterprise Code Repositories Using Semantic Scopes

Paper summary: Automated Customization of LLMs for Enterprise Code Repositories Using Semantic Scopes

arxiv url: http://arxiv.org/abs/2602.05780v1
Date: Thu, 05 Feb 2026 15:38:54 GMT
Status: Translation complete
Last updated in system: 2026-02-06 18:49:09.008881
Title: Automated Customization of LLMs for Enterprise Code Repositories Using Semantic Scopes
Title (reference translation): Automated Customization of LLMs in Enterprise Code Repositories Using Semantic Scopes
Authors: Ulrich Finkler, Irene Manotas, Wei Zhang, Geert Janssen, Octavian Popescu, Shyam Ramji,
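Abstract summary: This paper proposes an automated method for customizing LLMs based on semantic scopes in code. The mechanism for ingesting repository data and formulating training data pairs with semantic scopes helps the model learn the underlying patterns specific to the repository. Code completions from moderately sized customized models can be significantly better than those from uncustomized models of much larger capacity.
Reference score (attention score computed by the site): 3.2942861117920916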
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Code completion (CC) is a task frequently used by developers when working in collaboration with LLM-based programming assistants. Despite the increased performance of LLMs on public benchmarks, out of the box LLMs still have a hard time generating code that aligns with a private code repository not previously seen by the model's training data. Customizing code LLMs to a private repository provides a way to improve the model performance. In this paper we present our approach for automated LLM customization based on semantic scopes in the code. We evaluate LLMs on real industry cases with two private enterprise code repositories with two customization strategies: Retrieval-Augmented Generation (RAG) and supervised Fine-Tuning (FT). Our mechanism for ingesting the repository's data and formulating the training data pairs with semantic scopes helps models to learn the underlying patterns specific to the repository, providing more precise code to developers and helping to boost their productivity. The code completions of moderately sized customized models can be significantly better than those of uncustomized models of much larger capacity. We also include an analysis of customization on two public benchmarks and present opportunities for future work.
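Abstract (reference translation): Code completion (CC) is a task developers use frequently when working with LLM-based programming assistants. Although LLM performance on public benchmarks has improved, LLMs still struggle to generate code consistent with private code repositories that were not part of the model's training data. Customizing code LLMs to a private repository can improve model performance. This paper proposes an automated LLM customization method based on semantic scopes in code. LLMs are evaluated on real industry cases using two private enterprise code repositories and two customization strategies: Retrieval-Augmented Generation (RAG) and supervised Fine-Tuning (FT). Our mechanism for ingesting repository data and formulating training data pairs with semantic scopes helps models learn repository-specific patterns, gives developers more precise code, and helps boost their productivity. The code completions of moderately sized customized models can be significantly better than those of uncustomized models of much larger capacity. We also include an analysis of customization on two public benchmarks and discuss opportunities for future work.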
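The abstract does not specify how semantic scopes are turned into training data pairs. As a rough illustration only, the sketch below assumes that a "semantic scope" is a Python function body, and that each pair uses the file content up to the start of the body as the prompt and the body itself as the completion target. The names `TrainingPair` and `make_pairs` are hypothetical and do not come from the paper; the authors' actual scoping and pairing rules may differ.

```python
import ast
from dataclasses import dataclass


@dataclass
class TrainingPair:
    """Hypothetical (prompt, completion) pair for code-completion fine-tuning."""
    scope_name: str
    prompt: str
    completion: str


def make_pairs(source: str) -> list[TrainingPair]:
    """Build one training pair per function scope in `source`.

    Assumption (not from the paper): a semantic scope is a function or
    method body. The prompt is every line of the file before the first
    body statement; the completion is the body through the function's
    last line. Line numbers from `ast` are 1-indexed and inclusive.
    """
    lines = source.splitlines(keepends=True)
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        body_start = node.body[0].lineno
        if body_start == node.lineno:
            continue  # one-line function: body shares the `def` line
        prompt = "".join(lines[: body_start - 1])
        completion = "".join(lines[body_start - 1 : node.end_lineno])
        pairs.append(TrainingPair(node.name, prompt, completion))
    return pairs


if __name__ == "__main__":
    src = "import os\n\ndef add(a, b):\n    return a + b\n\ndef one(): return 1\n"
    for pair in make_pairs(src):
        print(pair.scope_name, repr(pair.prompt), repr(pair.completion))
```

Splitting at the first body statement rather than at the `def` line keeps multi-line signatures inside the prompt; one-line functions are skipped because nothing would remain to complete.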
