Automated Customization of LLMs for Enterprise Code Repositories Using Semantic Scopes
- URL: http://arxiv.org/abs/2602.05780v1
- Date: Thu, 05 Feb 2026 15:38:54 GMT
- Title: Automated Customization of LLMs for Enterprise Code Repositories Using Semantic Scopes
- Authors: Ulrich Finkler, Irene Manotas, Wei Zhang, Geert Janssen, Octavian Popescu, Shyam Ramji,
- Abstract summary: We present our approach for automated LLM customization based on semantic scopes in the code. Our mechanism for ingesting the repository's data and formulating the training data pairs with semantic scopes helps models to learn the underlying patterns specific to the repository. The code completions of moderately sized customized models can be significantly better than those of uncustomized models of much larger capacity.
- Score: 3.2942861117920916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code completion (CC) is a task frequently performed by developers working in collaboration with LLM-based programming assistants. Despite the increased performance of LLMs on public benchmarks, out-of-the-box LLMs still struggle to generate code that aligns with a private code repository that was not part of the model's training data. Customizing code LLMs to a private repository provides a way to improve model performance. In this paper we present our approach for automated LLM customization based on semantic scopes in the code. We evaluate LLMs on real industry cases with two private enterprise code repositories and two customization strategies: Retrieval-Augmented Generation (RAG) and supervised fine-tuning (FT). Our mechanism for ingesting the repository's data and formulating training data pairs with semantic scopes helps models learn the underlying patterns specific to the repository, providing more precise code to developers and helping to boost their productivity. The code completions of moderately sized customized models can be significantly better than those of uncustomized models of much larger capacity. We also include an analysis of customization on two public benchmarks and present opportunities for future work.
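As a rough illustration of the fine-tuning side of this approach, the sketch below derives prompt/completion pairs from function-level scopes of a Python repository. The scope definition (functions extracted via the `ast` module), the split heuristic, and the pair format are illustrative assumptions, not the authors' actual ingestion pipeline.

```python
# Hypothetical sketch: build supervised fine-tuning pairs from "semantic scopes",
# here approximated as function definitions in a Python repository. The paper's
# actual scope extraction and pair formulation may differ.
import ast
from pathlib import Path

def extract_scopes(source: str) -> list[str]:
    """Return the source text of each function-level scope in a Python file."""
    tree = ast.parse(source)
    scopes = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            segment = ast.get_source_segment(source, node)
            if segment:
                scopes.append(segment)
    return scopes

def make_pair(scope: str, cut_fraction: float = 0.5) -> dict:
    """Split one scope into a prompt prefix and a target completion."""
    lines = scope.splitlines(keepends=True)
    cut = max(1, int(len(lines) * cut_fraction))
    return {"prompt": "".join(lines[:cut]), "completion": "".join(lines[cut:])}

def build_dataset(repo_root: str) -> list[dict]:
    """Walk a repository and produce training pairs from every parsable scope."""
    pairs = []
    for path in Path(repo_root).rglob("*.py"):
        try:
            source = path.read_text(encoding="utf-8")
            scopes = extract_scopes(source)
        except (OSError, UnicodeDecodeError, SyntaxError):
            continue  # skip unreadable or unparsable files
        pairs.extend(p for p in map(make_pair, scopes) if p["completion"])
    return pairs
```

The resulting prompt/completion pairs could feed a standard supervised fine-tuning loop; the same scope boundaries could also delimit the chunks indexed for the RAG strategy.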
Related papers
- Evaluating and Achieving Controllable Code Completion in Code LLM [89.64782747840225]
We present the first instruction-guided code completion benchmark, Controllable Code Completion Benchmark (C3-Bench). We reveal substantial gaps in instruction-following capabilities between open-source and advanced proprietary models during code completion tasks. The resulting model, Qwen2.5-Coder-C3, achieves state-of-the-art performance on C3-Bench.
arXiv Detail & Related papers (2026-01-22T11:40:04Z)
- Evaluating Large Language Models for Functional and Maintainable Code in Industrial Settings: A Case Study at ASML [3.5515013986822073]
We present a case study conducted in collaboration with the leveling department at ASML. We investigate the performance of LLMs in generating functional, maintainable code within a closed, highly specialized software environment. The findings reveal that prompting techniques and model size have a significant impact on output quality.
arXiv Detail & Related papers (2025-09-15T19:39:26Z)
- On LLM-Assisted Generation of Smart Contracts from Business Processes [0.08192907805418582]
Large language models (LLMs) have changed the reality of how software is produced. We present an exploratory study to investigate the use of LLMs for generating smart contract code from business process descriptions. Our results show that LLM performance falls short of the perfect reliability required for smart contract development.
arXiv Detail & Related papers (2025-07-30T20:39:45Z)
- Seed-Coder: Let the Code Model Curate Data for Itself [41.69830870792126]
We introduce Seed-Coder, a series of open-source models of 8B size. Our code pretraining data is produced by a model-centric data pipeline. Seed-Coder achieves state-of-the-art results among open-source models of similar size.
arXiv Detail & Related papers (2025-06-04T03:17:19Z)
- Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks [42.79558714652442]
Large Language Models (LLMs) have shown promise in function-level code generation, yet repository-level software engineering tasks remain challenging. This paper investigates whether open-source LLMs can effectively address repository-level tasks without requiring agent-based approaches. We introduce Code Graph Models (CGMs), which integrate repository code graph structures into the LLM's attention mechanism.
arXiv Detail & Related papers (2025-05-22T17:00:55Z)
- FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation [26.14778133391999]
FEA-Bench is a benchmark designed to assess the ability of large language models to perform incremental development within code repositories. We collect pull requests from 83 GitHub repositories and use rule-based and intent-based filtering to construct task instances focused on new feature development.
arXiv Detail & Related papers (2025-03-09T16:11:57Z)
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [76.59316249991657]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited. We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- Retrieval-augmented code completion for local projects using large language models [0.0]
We train two open transformer-based models, the generative GPT-2 and the retrieval-adapted RETRO, on open-source Python files. We improve our models' performance with in-context retrieval-augmented generation (RAG), which retrieves code snippets using the Jaccard similarity of tokens. Experimental results indicate that in-context RAG improves the code completion baseline by over 26%, while RETRO improves over the similarly sized GPT-2 baseline by 12%.
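The retrieval step named in this summary is concrete enough to sketch: rank project snippets by Jaccard similarity of token sets against the local completion context and prepend the best matches to the prompt. The tokenizer and prompt layout below are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of Jaccard-similarity retrieval for in-context RAG code completion.
# Tokenization and prompt formatting are illustrative choices, not the paper's.
import re

def lexical_tokens(code: str) -> set[str]:
    """Crude tokenization into identifiers, numbers, and punctuation."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def retrieve(context: str, snippets: list[str], k: int = 3) -> list[str]:
    """Return the k project snippets most similar to the completion context."""
    ctx = lexical_tokens(context)
    return sorted(snippets, key=lambda s: jaccard(ctx, lexical_tokens(s)), reverse=True)[:k]

def build_prompt(context: str, snippets: list[str], k: int = 3) -> str:
    """Prepend retrieved snippets to the local context before querying the model."""
    parts = ["# Retrieved from the local project:"] + retrieve(context, snippets, k) + [context]
    return "\n\n".join(parts)
```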
arXiv Detail & Related papers (2024-08-09T12:26:57Z)
- AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data [64.69872638349922]
We present AlchemistCoder, a series of Code LLMs with enhanced code generation and generalization capabilities fine-tuned on multi-source data.
We propose incorporating the data construction process into the fine-tuning data as code comprehension tasks, including instruction evolution, data filtering, and code review.
arXiv Detail & Related papers (2024-05-29T16:57:33Z)
- Exploring Data-Efficient Adaptation of Large Language Models for Code Generation [64.5583894165813]
We propose a novel adaptation approach named DEED, which stands for Data-Efficient adaptation with Error-Driven learning for code generation. Experimental results show that, compared to other mainstream fine-tuning approaches, DEED achieves superior performance with little training data.
arXiv Detail & Related papers (2024-02-29T16:09:02Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)