Towards Translating Real-World Code with LLMs: A Study of Translating to Rust
- URL: http://arxiv.org/abs/2405.11514v2
- Date: Tue, 21 May 2024 13:25:45 GMT
- Title: Towards Translating Real-World Code with LLMs: A Study of Translating to Rust
- Authors: Hasan Ferit Eniser, Hanliang Zhang, Cristina David, Meng Wang, Maria Christakis, Brandon Paulsen, Joey Dodds, Daniel Kroening,
- Abstract summary: Large language models (LLMs) show promise in code translation due to their ability to write code in most programming languages.
We conduct our study on code extracted from real-world open source projects.
FLOURINE is an end-to-end code translation tool that uses differential fuzzing to check if a Rust translation is I/O equivalent to the original source program.
- Score: 13.743967357458287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) show promise in code translation - the task of translating code written in one programming language to another language - due to their ability to write code in most programming languages. However, LLM's effectiveness on translating real-world code remains largely unstudied. In this work, we perform the first substantial study on LLM-based translation to Rust by assessing the ability of five state-of-the-art LLMs, GPT4, Claude 3, Claude 2.1, Gemini Pro, and Mixtral. We conduct our study on code extracted from real-world open source projects. To enable our study, we develop FLOURINE, an end-to-end code translation tool that uses differential fuzzing to check if a Rust translation is I/O equivalent to the original source program, eliminating the need for pre-existing test cases. As part of our investigation, we assess both the LLM's ability to produce an initially successful translation, as well as their capacity to fix a previously generated buggy one. If the original and the translated programs are not I/O equivalent, we apply a set of automated feedback strategies, including feedback to the LLM with counterexamples. Our results show that the most successful LLM can translate 47% of our benchmarks, and also provides insights into next steps for improvements.
Related papers
- Language Models and Cycle Consistency for Self-Reflective Machine Translation [1.79487674052027]
We generate multiple translation candidates from a source language A to a target language B, and subsequently translate these candidates back to the original language A.
By evaluating the cycle consistency between the original and back-translated sentences using metrics such as token-level precision and accuracy, we implicitly estimate the translation quality in language B.
For each source sentence, we identify the translation candidate with optimal cycle consistency with the original sentence as the final answer.
arXiv Detail & Related papers (2024-11-05T04:01:41Z) - Unraveling the Potential of Large Language Models in Code Translation: How Far Are We? [4.616570111453259]
Large language models (LLMs) exhibit state-of-the-art performance in various tasks, but struggle for code translation.
We conduct a large-scale empirical study to exploit the capabilities and incapabilities of LLMs in code translation tasks.
We propose two methods: (1) intermediary translation which selects an intermediary language between the source and target ones; and (2) self-training which fine-tunes LLMs on self-generated parallel data.
arXiv Detail & Related papers (2024-10-13T12:20:12Z) - Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping [60.458273797431836]
Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models.
We find that this approach does not work well on non-English tasks.
Inspired by previous interpretability work on language transition during the model's forward pass, we propose an improved contrastive decoding algorithm.
arXiv Detail & Related papers (2024-07-15T15:14:01Z) - TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z) - Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z) - LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLM) are leveraging human feedback to improve their generation quality.
We propose LLMRefine, an inference time optimization method to refine LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z) - Lost in Translation: A Study of Bugs Introduced by Large Language Models
while Translating Code [5.915447908295047]
We present a large-scale empirical study to investigate the ability of general LLMs and code LLMs for code translation.
Our study involves the translation of 1,700 code samples from three benchmarks and two real-world projects.
We find that correct translations range from 2.1% to 47.3% for the studied LLMs.
arXiv Detail & Related papers (2023-08-06T13:33:13Z) - Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT)
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z) - LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs(4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.