Algorithm-Based Pipeline for Reliable and Intent-Preserving Code Translation with LLMs
- URL: http://arxiv.org/abs/2602.16106v1
- Date: Wed, 18 Feb 2026 00:34:29 GMT
- Title: Algorithm-Based Pipeline for Reliable and Intent-Preserving Code Translation with LLMs
- Authors: Shahriar Rumi Dipto, Saikat Mondal, Chanchal K. Roy
- Abstract summary: Direct one-shot translation often fails to preserve program intent, leading to errors in control flow, type handling, and I/O behavior. We propose an algorithm-based pipeline that introduces a language-neutral intermediate specification to capture these details before code generation.
- Score: 3.4257278503723576
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Code translation, the automatic conversion of programs between languages, is a growing use case for Large Language Models (LLMs). However, direct one-shot translation often fails to preserve program intent, leading to errors in control flow, type handling, and I/O behavior. We propose an algorithm-based pipeline that introduces a language-neutral intermediate specification to capture these details before code generation. This study empirically evaluates the extent to which structured planning can improve translation accuracy and reliability relative to direct translation. We conduct an automated paired experiment, direct versus algorithm-based, translating between Python and Java using five widely used LLMs on the Avatar and CodeNet datasets. For each combination (model, dataset, approach, and direction), we compile and execute the translated program and run the tests provided. We record compilation results, runtime behavior, timeouts (e.g., infinite loops), and test outcomes. We compute accuracy from these tests, counting a translation as correct only if it compiles, runs without exceptions or timeouts, and passes all tests. We then map every failed compile-time and runtime case to a unified, language-aware taxonomy and compare subtype frequencies between the direct and algorithm-based approaches. Overall, the algorithm-based approach increases micro-average accuracy from 67.7% to 78.5% (a 10.8 percentage-point increase). It eliminates lexical and token errors entirely, reduces incomplete constructs by 72.7% and structural and declaration issues by 61.1%, and substantially lowers runtime dependency and entry-point failures by 78.4%. These results demonstrate that algorithm-based pipelines enable more reliable, intent-preserving code translation, providing a foundation for robust multilingual programming assistants.
Related papers
- Anka: A Domain-Specific Language for Reliable LLM Code Generation [0.0]
Large Language Models (LLMs) exhibit systematic errors on complex, multi-step programming tasks.
We introduce Anka, a domain-specific language for data transformation pipelines designed with explicit, constrained syntax.
Anka achieves 99.9% parse success and 95.8% overall task accuracy across 100 benchmark problems.
arXiv Detail & Related papers (2025-12-29T05:28:17Z) - Enhancing LLMs in Long Code Translation through Instrumentation and Program State Alignment [0.0]
Code translation aims to transform code between programming languages while preserving functionality.
Recent advances in Large Language Models (LLMs) have improved code translation, but challenges remain.
arXiv Detail & Related papers (2025-04-02T13:55:29Z) - EquiBench: Benchmarking Large Language Models' Reasoning about Program Semantics via Equivalence Checking [58.15568681219339]
We introduce EquiBench, a new benchmark for evaluating large language models (LLMs) via equivalence checking.
This task directly tests a model's ability to reason about program semantics.
We evaluate 19 state-of-the-art LLMs and find that in the most challenging categories, the best accuracies are 63.8% and 76.2%, only modestly above the 50% random baseline.
arXiv Detail & Related papers (2025-02-18T02:54:25Z) - Guided Debugging of Auto-Translated Code Using Differential Testing [9.897793495754225]
tHinter is a tool to locate translation errors in auto-translated code.
It employs fuzzing to generate diverse test cases that thoroughly explore the translated code.
It then relies on an algorithm to pinpoint translation errors from coverage information and differential testing execution results.
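The differential-testing idea behind this line of work can be sketched in a few lines: fuzz inputs, run both the original and the translated program on each, and flag inputs where they disagree. The fuzzer and the toy programs below are illustrative stand-ins, not tHinter's implementation.

```python
import random

def fuzz_inputs(n=100, max_len=8, seed=0):
    """Generate random whitespace-separated integer inputs (toy fuzzer)."""
    rng = random.Random(seed)
    return [" ".join(str(rng.randint(-100, 100))
                     for _ in range(rng.randint(1, max_len)))
            for _ in range(n)]

def differential_test(original, translated, inputs):
    """Return the inputs on which the two implementations disagree."""
    return [x for x in inputs if original(x) != translated(x)]

# Toy example: a "translation" that mishandles negative numbers.
src = lambda s: sum(int(t) for t in s.split())
bad = lambda s: sum(abs(int(t)) for t in s.split())
diverging = differential_test(src, bad, fuzz_inputs())
```

Every diverging input here contains a negative number, which is exactly the kind of behavioral clue a fault-localization step can then correlate with coverage.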
arXiv Detail & Related papers (2025-01-16T11:27:25Z) - Enhancing Cross-Language Code Translation via Task-Specific Embedding Alignment in Retrieval-Augmented Generation [1.64043572114825]
We introduce a novel method to enhance cross-language code translation from Fortran to C++ by integrating task-specific embedding alignment.
Our strategy aligns the retrieval model directly with the objective of maximizing translation quality, as quantified by the CodeBLEU metric.
By integrating these CodeBLEU-optimized embeddings into the RAG framework, our approach significantly enhances both retrieval accuracy and code generation quality.
arXiv Detail & Related papers (2024-12-06T16:22:32Z) - Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping [60.458273797431836]
Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models.
We find that this approach does not work well on non-English tasks.
Inspired by previous interpretability work on language transition during the model's forward pass, we propose an improved contrastive decoding algorithm.
arXiv Detail & Related papers (2024-07-15T15:14:01Z) - The Consensus Game: Language Model Generation via Equilibrium Search [73.51411916625032]
We introduce EQUILIBRIUM-RANKING, a new training-free, game-theoretic procedure for language model decoding.
Our approach casts language model decoding as a regularized imperfect-information sequential signaling game.
Applying EQUILIBRIUM-RANKING to LLaMA-7B outperforms the much larger LLaMA-65B and PaLM-540B models.
arXiv Detail & Related papers (2023-10-13T14:27:21Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
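The self-debugging loop can be sketched as below, with `generate` standing in for the few-shot LLM call and `run_tests` for the execution feedback; both interfaces are hypothetical, not the paper's API.

```python
def self_debug(generate, run_tests, max_rounds=3):
    """Iteratively ask a model to repair its program using test feedback.

    generate(feedback) stands in for an LLM call: with feedback=None it
    produces an initial program, otherwise a revision conditioned on the
    error message. run_tests(program) returns (passed, error_message).
    """
    feedback = None
    program = generate(feedback)
    for _ in range(max_rounds):
        passed, error = run_tests(program)
        if passed:
            return program
        feedback = error          # a few-shot prompt would embed this
        program = generate(feedback)
    return program                # best effort after max_rounds
```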
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
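A simplified sketch of verifier-based reranking in the LEVER style: score each sampled program by its generation probability times the verifier's estimate of correctness, then pick the maximizer. This omits LEVER's aggregation over programs sharing an execution result; the `verifier` callable is a stand-in for the learned model.

```python
import math

def rerank_by_verifier(candidates, verifier):
    """Pick the program maximizing p(program) * p(correct | program).

    candidates: list of (program, logprob) pairs sampled from a code LLM.
    verifier:   callable returning an estimated probability of correctness,
                standing in for a model that also sees execution results.
    """
    return max(candidates,
               key=lambda c: math.exp(c[1]) * verifier(c[0]))[0]
```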
arXiv Detail & Related papers (2023-02-16T18:23:22Z) - Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z) - Natural Language to Code Translation with Execution [82.52142893010563]
We propose execution-result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
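The execution-based MBR idea can be sketched as follows: group sampled programs by their outputs on a set of inputs and return a program from the largest group, i.e. the sample whose behavior agrees with the most other samples. The `execute` callable and functions-as-programs are illustrative simplifications.

```python
from collections import Counter

def mbr_exec_select(programs, execute, test_inputs):
    """Select the sampled program whose execution results agree with the
    most other samples (execution-result-based MBR, simplified).

    execute(program, x) runs one program on one input; here programs are
    plain Python callables standing in for generated code.
    """
    signatures = [tuple(execute(p, x) for x in test_inputs)
                  for p in programs]
    best_sig, _ = Counter(signatures).most_common(1)[0]
    return programs[signatures.index(best_sig)]
```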
arXiv Detail & Related papers (2022-04-25T06:06:08Z) - Leveraging Automated Unit Tests for Unsupervised Code Translation [34.84910520660154]
We propose to leverage an automated unit-testing system to filter out invalid translations.
We find that fine-tuning an unsupervised model with this filtered data set significantly reduces the noise in the generated translations.
In particular, for Java $\to$ Python and Python $\to$ C++ we outperform the best previous methods by more than 16% and 24% respectively.
arXiv Detail & Related papers (2021-10-13T15:08:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.