Scalable, Validated Code Translation of Entire Projects using Large Language Models
- URL: http://arxiv.org/abs/2412.08035v1
- Date: Wed, 11 Dec 2024 02:31:46 GMT
- Title: Scalable, Validated Code Translation of Entire Projects using Large Language Models
- Authors: Hanliang Zhang, Cristina David, Meng Wang, Brandon Paulsen, Daniel Kroening,
- Abstract summary: Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code.
Existing works have shown a drop in translation success rates for code exceeding around 100 lines.
We develop a modular approach to translation, where we partition the code into small code fragments which can be independently translated.
We show that we can consistently generate reliable Rust for projects up to 6,600 lines of code and 369 functions, with an average of 73% of functions successfully validated for I/O equivalence.
- Score: 13.059046327936393
- License:
- Abstract: Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code. However, a significant limitation when using LLMs for code translation is scalability: existing works have shown a drop in translation success rates for code exceeding around 100 lines. We overcome this limitation by developing a modular approach to translation, where we partition the code into small code fragments which can be translated independently and semantically validated (that is, checking I/O equivalence). When this approach is applied naively, we discover that LLMs are unreliable when translating features of the source language that do not have a direct mapping to the target language, and that the LLM often gets stuck in repair loops when attempting to fix errors. To address these issues, we introduce two key concepts: (1) feature mapping, which integrates predefined translation rules with LLM-based translation to guide the LLM in navigating subtle language differences and producing semantically accurate code; and (2) type-compatibility, which facilitates localized checks at the function signature level to detect errors early, thereby narrowing the scope of potential repairs. We apply our approach to translating real-world Go codebases to Rust, demonstrating that we can consistently generate reliable Rust translations for projects up to 6,600 lines of code and 369 functions, with an average of 73% of functions successfully validated for I/O equivalence, considerably higher than any existing work.
Related papers
- Specification-Driven Code Translation Powered by Large Language Models: How Far Are We? [8.534857249221844]
We investigate using NL-specification as an intermediate representation for code translation.
Our results show that using NL-specification alone does not lead to performance improvements.
Besides analyzing the performance of code translation, we also investigate the quality of the translated code.
arXiv Detail & Related papers (2024-12-05T20:10:21Z) - TRANSAGENT: An LLM-Based Multi-Agent System for Code Translation [16.46292795782835]
Code translation is crucial for software migration, system ablation, and cross-platform development.
Traditional rule-based methods rely on manually-written rules, which can be time-consuming and often result in less readable code.
More recently, the advance of Large Language Models (LLMs) further boosts learning-based code translation.
We propose a novel multi-agent system TRANSAGENT, which enhances LLM-based code translation by fixing the syntax errors and semantic errors.
arXiv Detail & Related papers (2024-09-30T02:53:03Z) - Rectifier: Code Translation with Corrector via LLMs [11.38401806203093]
We propose a general corrector, namely Rectifier, which is a micro and universal model for repairing translation errors.
The experimental results on translation tasks between C++, Java, and Python show that our model has effective repair ability.
arXiv Detail & Related papers (2024-07-10T08:58:41Z) - Towards Translating Real-World Code with LLMs: A Study of Translating to Rust [13.743967357458287]
Large language models (LLMs) show promise in code translation due to their ability to write code in most programming languages.
We conduct our study on code extracted from real-world open source projects.
FLOURINE is an end-to-end code translation tool that uses differential fuzzing to check if a Rust translation is I/O equivalent to the original source program.
arXiv Detail & Related papers (2024-05-19T10:54:03Z) - Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z) - IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators [49.903001442804594]
This work investigates the prospect of leveraging compiler intermediate representations (IR) to improve the multilingual capabilities of Code-LMs.
We first compile SLTrans, a parallel dataset consisting of nearly 4M self-contained source code files.
Next, we carry out continued causal language modelling training on SLTrans, forcing the Code-LMs to learn the IR language.
Our resulting models, dubbed IRCoder, display sizeable and consistent gains across a wide variety of code generation tasks and metrics.
arXiv Detail & Related papers (2024-03-06T17:52:08Z) - If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code)
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z) - ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z) - Lost in Translation: A Study of Bugs Introduced by Large Language Models
while Translating Code [5.915447908295047]
We present a large-scale empirical study to investigate the ability of general LLMs and code LLMs for code translation.
Our study involves the translation of 1,700 code samples from three benchmarks and two real-world projects.
We find that correct translations range from 2.1% to 47.3% for the studied LLMs.
arXiv Detail & Related papers (2023-08-06T13:33:13Z) - Chain-of-Dictionary Prompting Elicits Translation in Large Language Models [100.47154959254937]
Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT)
We present a novel method, CoD, which augments LLMs with prior knowledge with the chains of multilingual dictionaries for a subset of input words to elicit translation abilities.
arXiv Detail & Related papers (2023-05-11T05:19:47Z) - Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT)
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.