Related papers: Migrating Code At Scale With LLMs At Google

URL: http://arxiv.org/abs/2504.09691v1
Date: Sun, 13 Apr 2025 18:52:44 GMT
Title: Migrating Code At Scale With LLMs At Google
Authors: Celal Ziftci, Stoyan Nikolov, Anna Sjövall, Bo Kim, Daniele Codecasa, Max Kim,
Abstract summary: We discuss a large-scale, costly and traditionally manual migration project at Google. We propose a novel automated algorithm that uses change location discovery and a Large Language Model (LLM) to help developers conduct the migration. Our results suggest that our automated, LLM-assisted workflow can serve as a model for similar initiatives.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Developers often evolve an existing software system by making internal changes, called migrations. Moving to a new framework, changing an implementation to improve efficiency, and upgrading a dependency to its latest version are examples of migrations. Migration is a common and typically continuous maintenance task undertaken either manually or through tooling. Certain migrations are labor intensive and costly, developers do not find the required work rewarding, and the migrations may take years to complete. Hence, automation is preferred for such migrations. In this paper, we discuss a large-scale, costly and traditionally manual migration project at Google, propose a novel automated algorithm that uses change location discovery and a Large Language Model (LLM) to help developers conduct the migration, report the results of a large case study, and discuss lessons learned. Our case study on 39 distinct migrations undertaken by three developers over twelve months shows that a total of 595 code changes with 93,574 edits have been submitted, where 74.45% of the code changes and 69.46% of the edits were generated by the LLM. The developers reported high satisfaction with the automated tooling, and estimated a 50% reduction in the total time spent on the migration compared to earlier manual migrations. Our results suggest that our automated, LLM-assisted workflow can serve as a model for similar initiatives.
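
The percentages in the abstract can be turned into approximate absolute counts. The short Python sketch below does that arithmetic. The inputs are the figures quoted in the abstract; the derived counts and the edits-per-change average are estimates computed here, not numbers reported by the authors, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
total_changes = 595        # code changes submitted
total_edits = 93_574       # individual edits across those changes
llm_change_share = 0.7445  # share of code changes generated by the LLM
llm_edit_share = 0.6946    # share of edits generated by the LLM

# Derived estimates (assumption: computed from the rounded percentages,
# so the true counts may differ by one or two).
llm_changes = round(total_changes * llm_change_share)
llm_edits = round(total_edits * llm_edit_share)
edits_per_change = total_edits / total_changes

print(f"LLM-generated code changes: about {llm_changes}")
print(f"LLM-generated edits: about {llm_edits}")
print(f"Average edits per code change: about {edits_per_change:.1f}")
```

Under these assumptions, roughly 443 of the 595 code changes and about 65,000 of the 93,574 edits came from the LLM, with each code change averaging somewhat more than 150 edits.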

Related papers

Using LLMs for Library Migration [1.9247157750972368]
Large Language Models (LLMs) are good at generating and transforming code and finding similar code. We evaluate three LLMs, Llama 3.1, GPT-4o mini, and GPT-4o, on PyMigBench, where we perform 321 real-world library migrations. Llama 3.1, GPT-4o mini, and GPT-4o correctly migrate 89%, 89%, and 94% of the migration-related code changes, respectively.
arXiv Detail & Related papers (2025-04-17T18:32:48Z)
How is Google using AI for internal code migrations? [5.277315246731]
This article is an experience report on using LLMs for code migrations at Google. Rather, we share our experiences in applying LLM-based code migration in an enterprise context. We see evidence that the use of LLMs can reduce the time needed for migrations significantly.
arXiv Detail & Related papers (2025-01-12T23:06:25Z)
Example-Based Automatic Migration of Continuous Integration Systems [2.2836654317217326]
Continuous Integration (CI) is a widely adopted practice for faster code change integration and testing. Developers often migrate between CI systems in pursuit of features like matrix building or better logging. This migration is effort intensive and error-prone owing to limited knowledge of the new CI system and its syntax. We propose a novel approach for CI system's automatic migration: CIMig.
arXiv Detail & Related papers (2024-07-02T20:19:21Z)
DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning [56.887047551101574]
We present DS-Agent, a novel framework that harnesses large language model (LLM) agents and case-based reasoning (CBR). In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on the expert knowledge from Kaggle. In the deployment stage, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm, significantly reducing the demand on foundational capabilities of LLMs.
arXiv Detail & Related papers (2024-02-27T12:26:07Z)
A Generative AI Assistant to Accelerate Cloud Migration [2.9248916859490173]
The Cloud Migration LLM accepts input from the user specifying the parameters of their migration, and outputs a migration strategy with an architecture diagram. A user study suggests that the migration LLM can assist inexperienced users in finding the right cloud migration profile, while avoiding complexities of a manual approach.
arXiv Detail & Related papers (2024-01-03T14:13:24Z)
Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development. We introduce Experiential Co-Learning, a novel LLM-agent learning framework. Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z)
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks. To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
FireAct: Toward Language Agent Fine-tuning [63.06306936820456]
We argue for the overlooked direction of fine-tuning LMs to obtain language agents. Fine-tuning Llama2-7B with 500 agent trajectories generated by GPT-4 leads to a 77% HotpotQA performance increase. We propose FireAct, a novel approach to fine-tuning LMs with trajectories from multiple tasks and prompting methods.
arXiv Detail & Related papers (2023-10-09T17:58:38Z)
Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains. In this paper, we introduce how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages [86.90220551111096]
Training datasets for large language models (LLMs) are often not fully disclosed. We present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages.
arXiv Detail & Related papers (2023-09-17T23:49:10Z)
Interactive, Iterative, Tooled, Rule-Based Migration of Microsoft Access to Web Technologies [0.11650821883155184]
We are working on migrating Microsoft Access monolithic applications to the web front-end and producing back-end. To enable the developers to drive the migration to the target systems, we propose an Interactive, Iterative, Tooled, Rule-Based Migration approach.
arXiv Detail & Related papers (2023-09-07T06:46:28Z)
MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation. Within each loop, the MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results. For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data. For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
Characterizing Python Library Migrations [2.2557806157585834]
We label 3,096 migration-related code changes in 335 Python library migrations. We find that 40% of library pairs have API mappings that involve non-function program elements. On average, a developer needs to learn about 4 APIs and 2 API mappings to perform a migration.
arXiv Detail & Related papers (2022-07-03T21:00:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.