Fortran2CPP: Automating Fortran-to-C++ Translation using LLMs via Multi-Turn Dialogue and Dual-Agent Integration
- URL: http://arxiv.org/abs/2412.19770v2
- Date: Fri, 31 Jan 2025 20:36:06 GMT
- Title: Fortran2CPP: Automating Fortran-to-C++ Translation using LLMs via Multi-Turn Dialogue and Dual-Agent Integration
- Authors: Le Chen, Bin Lei, Dunzhi Zhou, Pei-Hung Lin, Chunhua Liao, Caiwen Ding, Ali Jannesari,
- Abstract summary: Our dataset comprises 11.7k dialogues capturing feedback-decision including code translation, compilation, execution, unit testing, and error-fixing.
Using this dataset, we achieve up to a 3.31x improvement in CodeBLEU scores and a 92% increase in compilation success rate.
- Score: 10.985254527043429
- License:
- Abstract: Translating legacy Fortran code into C++ is a crucial step in modernizing high-performance computing (HPC) applications. However, the scarcity of high-quality, parallel Fortran-to-C++ datasets and the limited domain-specific expertise in large language models (LLMs) present significant challenges for automated translation. In this paper, we introduce Fortran2CPP, a multi-turn dialogue dataset generated by a novel LLM agent-based approach that integrates a dual-LLM Questioner-Solver module to enhance translation accuracy. Our dataset comprises 11.7k dialogues capturing iterative feedback-decision workflows including code translation, compilation, execution, unit testing, and error-fixing. Using this dataset, we fine-tune several open-weight LLMs and achieve up to a 3.31x improvement in CodeBLEU scores and a 92\% increase in compilation success rate, demonstrating enhanced syntactic accuracy and functional reliability. Our findings highlight the value of dialogue-based LLM training for complex code translation tasks. The dataset and model have been open-sourced and are available on our public GitHub repository\footnote{\url{https://github.com/HPC-Fortran2CPP/Fortran2Cpp}}.
Related papers
- LLM-AutoDiff: Auto-Differentiate Any LLM Workflow [58.56731133392544]
We introduce LLM-AutoDiff: a novel framework for Automatic Prompt Engineering (APE)
LLMs-AutoDiff treats each textual input as a trainable parameter and uses a frozen backward engine to generate feedback-akin to textual gradients.
It consistently outperforms existing textual gradient baselines in both accuracy and training cost.
arXiv Detail & Related papers (2025-01-28T03:18:48Z) - Enhancing Cross-Language Code Translation via Task-Specific Embedding Alignment in Retrieval-Augmented Generation [1.64043572114825]
We introduce a novel method to enhance cross-language code translation from Fortran to C++ by integrating task-specific embedding alignment.
Our strategy aligns the retrieval model directly with the objective of maximizing translation quality, as quantified by the CodeBLEU metric.
By integrating these CodeBLEU-optimized embeddings into the RAG framework, our approach significantly enhances both retrieval accuracy and code generation quality.
arXiv Detail & Related papers (2024-12-06T16:22:32Z) - Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing [0.9668407688201359]
generative artificial intelligence (GenAI) is poised to transform productivity in scientific computing.
We developed a tool, CodeScribe, which combines prompt engineering with user supervision to establish an efficient process for code conversion.
We also address the challenges of AI-driven code translation and highlight its benefits for enhancing productivity in scientific computing.
arXiv Detail & Related papers (2024-10-31T16:48:41Z) - CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming [15.391781573025787]
We introduce CodeRosetta, an encoder-decoder model designed specifically for translating between programming languages and their HPC extensions.
CodeRosetta is evaluated on C++ to parallel C++ translation tasks.
Our results show that CodeRosetta outperforms state-of-the-art baselines in C++ to translation.
arXiv Detail & Related papers (2024-10-27T17:34:07Z) - IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators [49.903001442804594]
This work investigates the prospect of leveraging compiler intermediate representations (IR) to improve the multilingual capabilities of Code-LMs.
We first compile SLTrans, a parallel dataset consisting of nearly 4M self-contained source code files.
Next, we carry out continued causal language modelling training on SLTrans, forcing the Code-LMs to learn the IR language.
Our resulting models, dubbed IRCoder, display sizeable and consistent gains across a wide variety of code generation tasks and metrics.
arXiv Detail & Related papers (2024-03-06T17:52:08Z) - PPTC-R benchmark: Towards Evaluating the Robustness of Large Language
Models for PowerPoint Task Completion [96.47420221442397]
We construct adversarial user instructions by attacking user instructions at sentence, semantic, and multi-language levels.
We test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates robustness settings.
We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark.
arXiv Detail & Related papers (2024-03-06T15:33:32Z) - If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code)
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z) - ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z) - L2MAC: Large Language Model Automatic Computer for Extensive Code Generation [52.81694565226513]
Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture.
This paper presents L2MAC, the first practical LLM-based general-purpose stored-program automatic computer (von Neumann architecture) framework, for long and consistent output generation.
arXiv Detail & Related papers (2023-10-02T16:55:19Z) - Creating a Dataset for High-Performance Computing Code Translation using
LLMs: A Bridge Between OpenMP Fortran and C++ [7.872005563259838]
The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods.
Models without prior coding knowledge experienced a boost of $mathbftimes5.1$ in CodeBLEU scores.
Models with some coding familiarity saw an impressive $mathbftimes9.9$-fold increase.
arXiv Detail & Related papers (2023-07-15T02:35:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.