Guided Debugging of Auto-Translated Code Using Differential Testing
- URL: http://arxiv.org/abs/2501.09475v1
- Date: Thu, 16 Jan 2025 11:27:25 GMT
- Title: Guided Debugging of Auto-Translated Code Using Differential Testing
- Authors: Shengnan Wu, Xinyu Sun, Xin Wang, Yangfan Zhou
- Abstract summary: tHinter is a tool to locate translation errors in auto-translated code.
It employs fuzzing to generate diverse test cases that thoroughly explore the translated code.
It then relies on a heuristic algorithm to pinpoint translation errors from coverage information and differential-testing results.
- Score: 9.897793495754225
- Abstract: Large Language Models (LLMs) hold great promise for the task of code translation. However, their lack of explainability complicates the identification of the inevitable translation errors. In this paper, we propose tHinter, a debugging tool to locate translation errors in auto-translated code. The core idea of tHinter is that, if correctly translated, the source and translated code should exhibit the same functionality, giving the same output for the same input. Hence, lines in the translated code responsible for output differences are likely translation errors. First, tHinter employs fuzzing to generate diverse test cases that thoroughly explore the translated code. Then, tHinter relies on a heuristic algorithm to pinpoint translation errors from the coverage information and differential-testing results of those test cases. This heuristic algorithm is designed to leverage both statistics and developer expertise. Comprehensive experiments with real code show its effectiveness: it reduces the lines developers need to review during debugging by 71% and increases the likelihood of the LLM fixing translation errors in a single query by 59%. Developers generally consider it satisfactory and helpful.
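To make the pipeline concrete, a minimal Python sketch of the differential-testing core follows: random inputs stand in for the fuzzer, an output mismatch between source and translated code marks a failing run, and a standard Ochiai spectrum-based score ranks lines by suspiciousness. The callables run_source, run_translated, and coverage_of are hypothetical stand-ins rather than tHinter's actual API, and the paper's real heuristic also folds in developer expertise.

```python
import random
from collections import Counter

def fuzz_inputs(n_cases, lo=-1000, hi=1000):
    """Random integer inputs; a stand-in for a real coverage-guided fuzzer."""
    return [random.randint(lo, hi) for _ in range(n_cases)]

def localize(run_source, run_translated, coverage_of, n_cases=500):
    """Rank lines of the translated code by suspiciousness."""
    fail_cov, pass_cov = Counter(), Counter()
    total_fail = 0
    for x in fuzz_inputs(n_cases):
        lines = coverage_of(x)                     # translated-code lines hit by x
        if run_source(x) != run_translated(x):     # differential test
            total_fail += 1
            fail_cov.update(lines)
        else:
            pass_cov.update(lines)

    def ochiai(line):
        f, p = fail_cov[line], pass_cov[line]
        denom = (total_fail * (f + p)) ** 0.5
        return f / denom if denom else 0.0

    # Lines covered mostly by failing runs rank first for developer review.
    return sorted(set(fail_cov) | set(pass_cov), key=ochiai, reverse=True)
```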
Related papers
- Scalable, Validated Code Translation of Entire Projects using Large Language Models [13.059046327936393]
Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code.
Existing works have shown a drop in translation success rates for code exceeding around 100 lines.
We develop a modular approach to translation, where we partition the code into small code fragments which can be independently translated.
We show that we can consistently generate reliable Rust for projects up to 6,600 lines of code and 369 functions, with an average of 73% of functions successfully validated for I/O equivalence.
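A minimal sketch of this partition-translate-validate loop, assuming hypothetical helpers translate_fragment (an LLM call returning candidate Rust) and io_equivalent (an I/O-equivalence checker); the paper's actual pipeline is more elaborate:

```python
def translate_project(fragments, translate_fragment, io_equivalent, max_retries=3):
    """Translate small fragments independently, keeping only validated ones.

    fragments: dict mapping fragment name -> source code, each small enough
    (well under ~100 lines) to translate reliably on its own.
    """
    validated, unvalidated = {}, {}
    for name, src in fragments.items():
        for _ in range(max_retries):
            candidate = translate_fragment(src)    # hypothetical LLM call
            if io_equivalent(src, candidate):      # validate I/O equivalence
                validated[name] = candidate
                break
        else:
            unvalidated[name] = src                # flag for manual review
    return validated, unvalidated
```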
arXiv Detail & Related papers (2024-12-11T02:31:46Z) - Repository-Level Compositional Code Translation and Validation [5.269923665485903]
We propose AlphaTrans, a neuro-symbolic approach to automate repository-level code translation.
We leveraged AlphaTrans to translate ten real-world open-source projects consisting of 836 classes, 8575 methods, and 2719 tests.
99.1% of the translated code fragments are syntactically correct, and AlphaTrans validates the translations' runtime behavior and functional correctness for 25.8% of them.
arXiv Detail & Related papers (2024-10-31T16:46:52Z) - Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping [60.458273797431836]
Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models.
We find that this approach does not work well on non-English tasks.
Inspired by previous interpretability work on language transition during the model's forward pass, we propose an improved contrastive decoding algorithm.
arXiv Detail & Related papers (2024-07-15T15:14:01Z) - Rectifier: Code Translation with Corrector via LLMs [11.38401806203093]
We propose a general corrector, namely Rectifier, a small, universal model for repairing translation errors.
The experimental results on translation tasks between C++, Java, and Python show that our model has effective repair ability.
arXiv Detail & Related papers (2024-07-10T08:58:41Z) - Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z) - Mitigating Hallucinations and Off-target Machine Translation with
Source-Contrastive and Language-Contrastive Decoding [53.84948040596055]
We introduce two related methods to mitigate failure cases with a modified decoding objective.
Experiments on the massively multilingual models M2M-100 (418M) and SMaLL-100 show that these methods suppress hallucinations and off-target translations.
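As a rough illustration of a contrastive decoding objective of this kind (not the paper's exact formulation): score each hypothesis token by its log-probability given the true source minus a weighted log-probability given a distractor input, so continuations that ignore the source lose score.

```python
import torch

def source_contrastive_scores(logp_src: torch.Tensor,
                              logp_distractor: torch.Tensor,
                              lam: float = 0.3) -> torch.Tensor:
    """Per-token scores for a hypothesis: likely given the true source,
    unlikely given a distractor (e.g., a random source sentence or the
    wrong language tag). lam is a hypothetical weighting parameter."""
    return logp_src - lam * logp_distractor
```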
arXiv Detail & Related papers (2023-09-13T17:15:27Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
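A minimal sketch of such a self-debugging loop, with hypothetical llm and run_tests callables; in the paper the prompt also includes few-shot debugging demonstrations:

```python
def self_debug(llm, task, run_tests, max_rounds=3):
    """Let the model iteratively repair its own program.

    llm: callable prompt -> text; run_tests: code -> (ok, feedback).
    """
    code = llm(f"Write a program for: {task}")
    for _ in range(max_rounds):
        ok, feedback = run_tests(code)
        if ok:
            break
        code = llm(
            f"Task: {task}\nProgram:\n{code}\n"
            f"Execution feedback:\n{feedback}\n"
            "Explain the bug, then output a corrected program."
        )
    return code
```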
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - ParroT: Translating during Chat using Large Language Models tuned with
Human Translation and Feedback [90.20262941911027]
ParroT is a framework to enhance and regulate the translation abilities during chat.
Specifically, ParroT reformulates translation data into the instruction-following style.
We propose three instruction types for finetuning ParroT models, including translation instruction, contrastive instruction, and error-guided instruction.
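A sketch of what this data reformulation might look like; field names and templates are illustrative, not the paper's exact prompts:

```python
def to_instructions(src, ref, worse=None, error_note=None):
    """Turn one translation pair into ParroT-style instruction samples."""
    samples = [{  # translation instruction
        "instruction": "Translate the following sentence into English.",
        "input": src, "output": ref,
    }]
    if worse is not None:  # contrastive instruction: prefer the better output
        samples.append({
            "instruction": "Which translation is better?",
            "input": f"{src}\nA: {ref}\nB: {worse}", "output": "A",
        })
    if error_note is not None:  # error-guided instruction: hint at the flaw
        samples.append({
            "instruction": f"Translate; avoid the noted error: {error_note}.",
            "input": src, "output": ref,
        })
    return samples
```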
arXiv Detail & Related papers (2023-04-05T13:12:00Z) - Code Translation with Compiler Representations [21.702473137941006]
Traditional transpilers rely on syntactic information and handcrafted rules, which limits their applicability and produces unnatural-looking code.
Applying neural machine translation (NMT) approaches to code has successfully broadened the set of programs on which one can get a natural-looking translation.
Here we propose to augment code translation with IRs, specifically LLVM IR, with results on the C++, Java, Rust, and Go languages.
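One way to obtain the IR side of such source/IR pairs, sketched here with clang (how the model then consumes the IR is the paper's contribution and is not reproduced):

```python
import pathlib
import subprocess
import tempfile

def cpp_to_llvm_ir(cpp_source: str) -> str:
    """Lower a C++ snippet to textual LLVM IR via clang++ -S -emit-llvm."""
    with tempfile.TemporaryDirectory() as d:
        src = pathlib.Path(d) / "fragment.cpp"
        src.write_text(cpp_source)
        out = subprocess.run(
            ["clang++", "-S", "-emit-llvm", "-O0", "-o", "-", str(src)],
            capture_output=True, text=True, check=True,
        )
        return out.stdout  # IR text to pair with the source for training
```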
arXiv Detail & Related papers (2022-06-30T14:21:57Z) - Leveraging Automated Unit Tests for Unsupervised Code Translation [34.84910520660154]
We propose to leverage an automated unit-testing system to filter out invalid translations.
We find that fine-tuning an unsupervised model with this filtered data set significantly reduces the noise in the generated translations.
In particular, for Java $\to$ Python and Python $\to$ C++ we outperform the best previous methods by more than 16% and 24%, respectively.
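A minimal sketch of this filtering step, with hypothetical make_tests (e.g., an EvoSuite-style generator run on the source function) and run_tests helpers:

```python
def filter_by_unit_tests(pairs, make_tests, run_tests):
    """Keep only (source, translation) pairs whose translation passes the
    unit tests automatically generated from the source function."""
    kept = []
    for src, hyp in pairs:
        tests = make_tests(src)        # hypothetical automated test generator
        if run_tests(hyp, tests):      # translation must preserve behavior
            kept.append((src, hyp))
    return kept                        # less noisy data for fine-tuning
```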
arXiv Detail & Related papers (2021-10-13T15:08:43Z) - How to Probe Sentence Embeddings in Low-Resource Languages: On
Structural Design Choices for Probing Task Evaluation [82.96358326053115]
We investigate sensitivity of probing task results to structural design choices.
We probe embeddings in a multilingual setup with design choices that lie in a 'stable region', as we identify for English.
We find that results on English do not transfer to other languages.
arXiv Detail & Related papers (2020-06-16T12:37:50Z)