Related papers: Fortran2CPP: Automating Fortran-to-C++ Translation using LLMs via Multi-Turn Dialogue and Dual-Agent Integration

Fortran2CPP: Automating Fortran-to-C++ Translation using LLMs via Multi-Turn Dialogue and Dual-Agent Integration

URL: http://arxiv.org/abs/2412.19770v2
Date: Fri, 31 Jan 2025 20:36:06 GMT
Title: Fortran2CPP: Automating Fortran-to-C++ Translation using LLMs via Multi-Turn Dialogue and Dual-Agent Integration
Authors: Le Chen, Bin Lei, Dunzhi Zhou, Pei-Hung Lin, Chunhua Liao, Caiwen Ding, Ali Jannesari,
Abstract summary: Our dataset comprises 11.7k dialogues capturing feedback-decision including code translation, compilation, execution, unit testing, and error-fixing.<n>Using this dataset, we achieve up to a 3.31x improvement in CodeBLEU scores and a 92% increase in compilation success rate.
Score: 10.985254527043429
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Translating legacy Fortran code into C++ is a crucial step in modernizing high-performance computing (HPC) applications. However, the scarcity of high-quality, parallel Fortran-to-C++ datasets and the limited domain-specific expertise in large language models (LLMs) present significant challenges for automated translation. In this paper, we introduce Fortran2CPP, a multi-turn dialogue dataset generated by a novel LLM agent-based approach that integrates a dual-LLM Questioner-Solver module to enhance translation accuracy. Our dataset comprises 11.7k dialogues capturing iterative feedback-decision workflows including code translation, compilation, execution, unit testing, and error-fixing. Using this dataset, we fine-tune several open-weight LLMs and achieve up to a 3.31x improvement in CodeBLEU scores and a 92\% increase in compilation success rate, demonstrating enhanced syntactic accuracy and functional reliability. Our findings highlight the value of dialogue-based LLM training for complex code translation tasks. The dataset and model have been open-sourced and are available on our public GitHub repository\footnote{\url{https://github.com/HPC-Fortran2CPP/Fortran2Cpp}}.

Related papers

LLM-Assisted Translation of Legacy FORTRAN Codes to C++: A Cross-Platform Study [38.73914653312889]
Large Language Models (LLMs) are increasingly being leveraged for generating and translating scientific computer codes. Here, we studied the applicability of LLM-based translation of Fortran to C++ as a step towards building an agentic-workflow. We statistically quantified the compilation accuracy of the translated C++ codes, measured the similarity of the LLM translated code to the human translated C++ code, and statistically quantified the output similarity of the Fortran to C++ translation.
arXiv Detail & Related papers (2025-04-21T20:34:37Z)
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow [58.56731133392544]
We introduce LLM-AutoDiff: a novel framework for Automatic Prompt Engineering (APE) LLMs-AutoDiff treats each textual input as a trainable parameter and uses a frozen backward engine to generate feedback-akin to textual gradients. It consistently outperforms existing textual gradient baselines in both accuracy and training cost.
arXiv Detail & Related papers (2025-01-28T03:18:48Z)
Enhancing Cross-Language Code Translation via Task-Specific Embedding Alignment in Retrieval-Augmented Generation [1.64043572114825]
We introduce a novel method to enhance cross-language code translation from Fortran to C++ by integrating task-specific embedding alignment.<n>Our strategy aligns the retrieval model directly with the objective of maximizing translation quality, as quantified by the CodeBLEU metric.<n>By integrating these CodeBLEU-optimized embeddings into the RAG framework, our approach significantly enhances both retrieval accuracy and code generation quality.
arXiv Detail & Related papers (2024-12-06T16:22:32Z)
Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing [0.9668407688201359]
generative artificial intelligence (GenAI) is poised to transform productivity in scientific computing. We developed a tool, CodeScribe, which combines prompt engineering with user supervision to establish an efficient process for code conversion. We also address the challenges of AI-driven code translation and highlight its benefits for enhancing productivity in scientific computing.
arXiv Detail & Related papers (2024-10-31T16:48:41Z)
CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming [15.391781573025787]
We introduce CodeRosetta, an encoder-decoder model designed specifically for translating between programming languages and their HPC extensions. CodeRosetta is evaluated on C++ to parallel C++ translation tasks. Our results show that CodeRosetta outperforms state-of-the-art baselines in C++ to translation.
arXiv Detail & Related papers (2024-10-27T17:34:07Z)
CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs. CodeGRAG builds the graphical view of code blocks based on the control flow and data flow of them to fill the gap between programming languages and natural language. Various experiments and ablations are done on four datasets including both the C++ and python languages to validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators [49.903001442804594]
This work investigates the prospect of leveraging compiler intermediate representations (IR) to improve the multilingual capabilities of Code-LMs. We first compile SLTrans, a parallel dataset consisting of nearly 4M self-contained source code files. Next, we carry out continued causal language modelling training on SLTrans, forcing the Code-LMs to learn the IR language. Our resulting models, dubbed IRCoder, display sizeable and consistent gains across a wide variety of code generation tasks and metrics.
arXiv Detail & Related papers (2024-03-06T17:52:08Z)
PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion [96.47420221442397]
We construct adversarial user instructions by attacking user instructions at sentence, semantic, and multi-language levels. We test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates robustness settings. We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark.
arXiv Detail & Related papers (2024-03-06T15:33:32Z)
Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation [30.053409671898933]
Kun is a novel approach for creating high-quality instruction-tuning datasets for large language models (LLMs) without relying on manual annotations. We leverage unlabelled data from diverse sources such as Wudao, Wanjuan, and SkyPile to generate a substantial dataset of over a million Chinese instructional data points.
arXiv Detail & Related papers (2024-01-12T09:56:57Z)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code) Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system. We build a novel data-cleaning pipeline that uses these principles to transform existing programs. We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks. To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
L2MAC: Large Language Model Automatic Computer for Extensive Code Generation [52.81694565226513]
Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture. This paper presents L2MAC, the first practical LLM-based general-purpose stored-program automatic computer (von Neumann architecture) framework, for long and consistent output generation.
arXiv Detail & Related papers (2023-10-02T16:55:19Z)
Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++ [7.872005563259838]
The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods. Models without prior coding knowledge experienced a boost of $mathbftimes5.1$ in CodeBLEU scores. Models with some coding familiarity saw an impressive $mathbftimes9.9$-fold increase.
arXiv Detail & Related papers (2023-07-15T02:35:51Z)
Exploring Continual Learning for Code Generation Models [80.78036093054855]
Continual Learning (CL) is an important aspect that remains underexplored in the code domain. We introduce a benchmark called CodeTask-CL that covers a wide range of tasks, including code generation, translation, summarization, and refinement. We find that effective methods like Prompt Pooling (PP) suffer from catastrophic forgetting due to the unstable training of the prompt selection mechanism.
arXiv Detail & Related papers (2023-07-05T16:58:39Z)
Highly Parallel Autoregressive Entity Linking with Discriminative Correction [51.947280241185]
We propose a very efficient approach that parallelizes autoregressive linking across all potential mentions. Our model is >70 times faster and more accurate than the previous generative method.
arXiv Detail & Related papers (2021-09-08T17:28:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.