Guided Debugging of Auto-Translated Code Using Differential Testing
- URL: http://arxiv.org/abs/2501.09475v1
- Date: Thu, 16 Jan 2025 11:27:25 GMT
- Title: Guided Debugging of Auto-Translated Code Using Differential Testing
- Authors: Shengnan Wu, Xinyu Sun, Xin Wang, Yangfan Zhou
- Abstract summary: tHinter is a tool to locate translation errors in auto-translated code.
It employs fuzzing to generate diverse test cases that thoroughly explore the translated code.
It then relies on a heuristic algorithm to pinpoint translation errors from coverage information and differential-testing results.
- Score: 9.897793495754225
- Abstract: Large Language Models (LLMs) hold great promise for the task of code translation. However, their lack of explainability complicates the identification of the inevitable translation errors. In this paper, we propose tHinter, a debugging tool to locate translation errors in auto-translated code. The core idea of tHinter is that, if correctly translated, the source and translated code should exhibit the same functionality, giving the same output for the same input. Hence, lines in the translated code responsible for output differences are likely translation errors. First, tHinter employs fuzzing to generate diverse test cases that thoroughly explore the translated code. Then, tHinter relies on a heuristic algorithm to pinpoint translation errors from the coverage information and differential-testing results of those test cases. This heuristic algorithm is designed to leverage both statistics and developer expertise. Comprehensive experiments with real code show its effectiveness: it reduces the lines developers need to review during debugging by 71% and increases the likelihood of the LLM fixing translation errors in a single query by 59%. Developers generally consider it satisfactory and helpful.
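To make the pipeline concrete, a minimal Python sketch of the differential-testing core follows: random inputs stand in for the fuzzer, an output mismatch between source and translated code marks a failing run, and a standard Ochiai spectrum-based score ranks lines by suspiciousness. The callables run_source, run_translated, and coverage_of are hypothetical stand-ins rather than tHinter's actual API, and the paper's real heuristic also folds in developer expertise.

```python
import random
from collections import Counter

def fuzz_inputs(n_cases, lo=-1000, hi=1000):
    """Random integer inputs; a stand-in for a real coverage-guided fuzzer."""
    return [random.randint(lo, hi) for _ in range(n_cases)]

def localize(run_source, run_translated, coverage_of, n_cases=500):
    """Rank lines of the translated code by suspiciousness."""
    fail_cov, pass_cov = Counter(), Counter()
    total_fail = 0
    for x in fuzz_inputs(n_cases):
        lines = coverage_of(x)                     # translated-code lines hit by x
        if run_source(x) != run_translated(x):     # differential test
            total_fail += 1
            fail_cov.update(lines)
        else:
            pass_cov.update(lines)

    def ochiai(line):
        f, p = fail_cov[line], pass_cov[line]
        denom = (total_fail * (f + p)) ** 0.5
        return f / denom if denom else 0.0

    # Lines covered mostly by failing runs rank first for developer review.
    return sorted(set(fail_cov) | set(pass_cov), key=ochiai, reverse=True)
```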
Related papers
- Scalable, Validated Code Translation of Entire Projects using Large Language Models [13.059046327936393]
Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code.
Existing works have shown a drop in translation success rates for code exceeding around 100 lines.
We develop a modular approach to translation, where we partition the code into small code fragments which can be independently translated.
We show that we can consistently generate reliable Rust for projects up to 6,600 lines of code and 369 functions, with an average of 73% of functions successfully validated for I/O equivalence.
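A minimal sketch of this partition-translate-validate loop, assuming hypothetical helpers translate_fragment (an LLM call returning candidate Rust) and io_equivalent (an I/O-equivalence checker); the paper's actual pipeline is more elaborate:

```python
def translate_project(fragments, translate_fragment, io_equivalent, max_retries=3):
    """Translate small fragments independently, keeping only validated ones.

    fragments: dict mapping fragment name -> source code, each small enough
    (well under ~100 lines) to translate reliably on its own.
    """
    validated, unvalidated = {}, {}
    for name, src in fragments.items():
        for _ in range(max_retries):
            candidate = translate_fragment(src)    # hypothetical LLM call
            if io_equivalent(src, candidate):      # validate I/O equivalence
                validated[name] = candidate
                break
        else:
            unvalidated[name] = src                # flag for manual review
    return validated, unvalidated
```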
arXiv Detail & Related papers (2024-12-11T02:31:46Z) - Repository-Level Compositional Code Translation and Validation [5.269923665485903]
We propose AlphaTrans, a neuro-symbolic approach to automate repository-level code translation.
We leveraged AlphaTrans to translate ten real-world open-source projects consisting of 836 classes, 8575 methods, and 2719 tests.
99.1% of the translated code fragments are syntactically correct, and AlphaTrans validates the translations' runtime behavior and functional correctness for 25.8% of them.
arXiv Detail & Related papers (2024-10-31T16:46:52Z) - Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping [60.458273797431836]
Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models.
We find that this approach does not work well on non-English tasks.
Inspired by previous interpretability work on language transition during the model's forward pass, we propose an improved contrastive decoding algorithm.
arXiv Detail & Related papers (2024-07-15T15:14:01Z) - Rectifier: Code Translation with Corrector via LLMs [11.38401806203093]
We propose a general corrector, namely Rectifier, a small, universal model for repairing translation errors.
The experimental results on translation tasks between C++, Java, and Python show that our model has effective repair ability.
arXiv Detail & Related papers (2024-07-10T08:58:41Z) - Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z) - Mitigating Hallucinations and Off-target Machine Translation with
Source-Contrastive and Language-Contrastive Decoding [53.84948040596055]
We introduce two related methods to mitigate failure cases with a modified decoding objective.
Experiments on the massively multilingual models M2M-100 (418M) and SMaLL-100 show that these methods suppress hallucinations and off-target translations.
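As a rough illustration of a contrastive decoding objective of this kind (not the paper's exact formulation): score each hypothesis token by its log-probability given the true source minus a weighted log-probability given a distractor input, so continuations that ignore the source lose score.

```python
import torch

def source_contrastive_scores(logp_src: torch.Tensor,
                              logp_distractor: torch.Tensor,
                              lam: float = 0.3) -> torch.Tensor:
    """Per-token scores for a hypothesis: likely given the true source,
    unlikely given a distractor (e.g., a random source sentence or the
    wrong language tag). lam is a hypothetical weighting parameter."""
    return logp_src - lam * logp_distractor
```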
arXiv Detail & Related papers (2023-09-13T17:15:27Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
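A minimal sketch of such a self-debugging loop, with hypothetical llm and run_tests callables; in the paper the prompt also includes few-shot debugging demonstrations:

```python
def self_debug(llm, task, run_tests, max_rounds=3):
    """Let the model iteratively repair its own program.

    llm: callable prompt -> text; run_tests: code -> (ok, feedback).
    """
    code = llm(f"Write a program for: {task}")
    for _ in range(max_rounds):
        ok, feedback = run_tests(code)
        if ok:
            break
        code = llm(
            f"Task: {task}\nProgram:\n{code}\n"
            f"Execution feedback:\n{feedback}\n"
            "Explain the bug, then output a corrected program."
        )
    return code
```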
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - ParroT: Translating during Chat using Large Language Models tuned with
Human Translation and Feedback [90.20262941911027]
ParroT is a framework to enhance and regulate the translation abilities during chat.
Specifically, ParroT reformulates translation data into the instruction-following style.
We propose three instruction types for finetuning ParroT models, including translation instruction, contrastive instruction, and error-guided instruction.
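A sketch of what this data reformulation might look like; field names and templates are illustrative, not the paper's exact prompts:

```python
def to_instructions(src, ref, worse=None, error_note=None):
    """Turn one translation pair into ParroT-style instruction samples."""
    samples = [{  # translation instruction
        "instruction": "Translate the following sentence into English.",
        "input": src, "output": ref,
    }]
    if worse is not None:  # contrastive instruction: prefer the better output
        samples.append({
            "instruction": "Which translation is better?",
            "input": f"{src}\nA: {ref}\nB: {worse}", "output": "A",
        })
    if error_note is not None:  # error-guided instruction: hint at the flaw
        samples.append({
            "instruction": f"Translate; avoid the noted error: {error_note}.",
            "input": src, "output": ref,
        })
    return samples
```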
arXiv Detail & Related papers (2023-04-05T13:12:00Z) - Code Translation with Compiler Representations [21.702473137941006]
Traditional transpilers rely on syntactic information and handcrafted rules, which limits their applicability and produces unnatural-looking code.
Applying neural machine translation (NMT) approaches to code has successfully broadened the set of programs on which one can get a natural-looking translation.
Here we propose to augment code translation with IRs, specifically LLVM IR, with results on the C++, Java, Rust, and Go languages.
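One way to obtain the IR side of such source/IR pairs, sketched here with clang (how the model then consumes the IR is the paper's contribution and is not reproduced):

```python
import pathlib
import subprocess
import tempfile

def cpp_to_llvm_ir(cpp_source: str) -> str:
    """Lower a C++ snippet to textual LLVM IR via clang++ -S -emit-llvm."""
    with tempfile.TemporaryDirectory() as d:
        src = pathlib.Path(d) / "fragment.cpp"
        src.write_text(cpp_source)
        out = subprocess.run(
            ["clang++", "-S", "-emit-llvm", "-O0", "-o", "-", str(src)],
            capture_output=True, text=True, check=True,
        )
        return out.stdout  # IR text to pair with the source for training
```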
arXiv Detail & Related papers (2022-06-30T14:21:57Z) - Leveraging Automated Unit Tests for Unsupervised Code Translation [34.84910520660154]
We propose to leverage an automated unit-testing system to filter out invalid translations.
We find that fine-tuning an unsupervised model with this filtered data set significantly reduces the noise in the generated translations.
In particular, for Java $\to$ Python and Python $\to$ C++ we outperform the best previous methods by more than 16% and 24%, respectively.
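A minimal sketch of this filtering step, with hypothetical make_tests (e.g., an EvoSuite-style generator run on the source function) and run_tests helpers:

```python
def filter_by_unit_tests(pairs, make_tests, run_tests):
    """Keep only (source, translation) pairs whose translation passes the
    unit tests automatically generated from the source function."""
    kept = []
    for src, hyp in pairs:
        tests = make_tests(src)        # hypothetical automated test generator
        if run_tests(hyp, tests):      # translation must preserve behavior
            kept.append((src, hyp))
    return kept                        # less noisy data for fine-tuning
```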
arXiv Detail & Related papers (2021-10-13T15:08:43Z) - How to Probe Sentence Embeddings in Low-Resource Languages: On
Structural Design Choices for Probing Task Evaluation [82.96358326053115]
We investigate sensitivity of probing task results to structural design choices.
We probe embeddings in a multilingual setup with design choices that lie in a 'stable region', as we identify for English.
We find that results on English do not transfer to other languages.
arXiv Detail & Related papers (2020-06-16T12:37:50Z)