SafeTrans: LLM-assisted Transpilation from C to Rust
- URL: http://arxiv.org/abs/2505.10708v1
- Date: Thu, 15 May 2025 21:05:33 GMT
- Title: SafeTrans: LLM-assisted Transpilation from C to Rust
- Authors: Muhammad Farrukh, Smeet Shah, Baris Coskun, Michalis Polychronakis,
- Abstract summary: Rust is a strong contender for a memory-safe alternative to C as a "systems" programming language.<n>In this paper, we evaluate the potential of large language models (LLMs) to automate the transpilation of C code to Rust.<n>We present the design and implementation of SafeTrans, a framework that uses LLMs to i) transpile C code into Rust and ii) iteratively fix any compilation and runtime errors.
- Score: 5.6274106543826585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rust is a strong contender for a memory-safe alternative to C as a "systems" programming language, but porting the vast amount of existing C code to Rust is a daunting task. In this paper, we evaluate the potential of large language models (LLMs) to automate the transpilation of C code to idiomatic Rust, while ensuring that the generated code mitigates any memory-related vulnerabilities present in the original code. To that end, we present the design and implementation of SafeTrans, a framework that uses LLMs to i) transpile C code into Rust and ii) iteratively fix any compilation and runtime errors in the resulting code. A key novelty of our approach is the introduction of a few-shot guided repair technique for translation errors, which provides contextual information and example code snippets for specific error types, guiding the LLM toward the correct solution. Another novel aspect of our work is the evaluation of the security implications of the transpilation process, i.e., whether potential vulnerabilities in the original C code have been properly addressed in the translated Rust code. We experimentally evaluated SafeTrans with six leading LLMs and a set of 2,653 C programs accompanied by comprehensive unit tests, which were used for validating the correctness of the translated code. Our results show that our iterative repair strategy improves the rate of successful translations from 54% to 80% for the best-performing LLM (GPT-4o), and that all types of identified vulnerabilities in the original C code are effectively mitigated in the translated Rust code.
Related papers
- Context-Guided Decompilation: A Step Towards Re-executability [50.71992919223209]
Binary decompilation plays an important role in software security analysis, reverse engineering and malware understanding.<n>Recent advances in large language models (LLMs) have enabled neural decompilation, but the generated code is typically only semantically plausible.<n>We propose ICL4Decomp, a hybrid decompilation framework that leverages in-context learning (ICL) to guide LLMs toward generating re-executable source code.
arXiv Detail & Related papers (2025-11-03T17:21:39Z) - Project-Level C-to-Rust Translation via Synergistic Integration of Knowledge Graphs and Large Language Models [19.90921023222177]
Translating C code into safe Rust is an effective way to ensure its memory safety.<n>New C-Rust Pointer Knowledge Graph provides pointer semantics from a global perspective.<n>Our experiments show that ourtool reduces unsafe usages in translated Rust by 99.9%.
arXiv Detail & Related papers (2025-10-13T03:09:35Z) - RustAssure: Differential Symbolic Testing for LLM-Transpiled C-to-Rust Code [2.1413732583497826]
Rust is a memory-safe programming language that significantly improves software security.<n>Existings written in unsafe memory languages, such as C, must first be transpiled to Rust to take advantage of Rust's improved safety guarantees.<n>RustAssure presents a system that uses Large Language Models (LLMs) to automatically transpile existing Cs to Rust.
arXiv Detail & Related papers (2025-10-08T22:52:34Z) - Adversarial Agent Collaboration for C to Rust Translation [13.65848262530917]
ACToR translates all of the 63 real-world command line utilities considered in our benchmarks.<n>It achieves over 90% test pass rate with zero human intervention.<n>It is the first such system that reliably translates C programs of this scale.
arXiv Detail & Related papers (2025-10-04T17:08:36Z) - IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs.<n>The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z) - D-LiFT: Improving LLM-based Decompiler Backend via Code Quality-driven Fine-tuning [49.16469288280772]
Decompilers reconstruct human-readable source code from binaries.<n>Despite recent advances, their outputs often suffer from syntactic and semantic errors and remain difficult to read.<n>With the advent of large language models (LLMs), researchers began to explore the potential of LLMs to refine decompiler output.<n>We present D-LIFT, an enhanced decompiler-LLM pipeline with fine-tuned reinforcement learning.
arXiv Detail & Related papers (2025-06-11T19:09:08Z) - CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation [63.23120252801889]
CRUST-Bench is a dataset of 100 C repositories, each paired with manually-written interfaces in safe Rust as well as test cases.<n>We evaluate state-of-the-art large language models (LLMs) on this task and find that safe and idiomatic Rust generation is still a challenging problem.<n>The best performing model, OpenAI o1, is able to solve only 15 tasks in a single-shot setting.
arXiv Detail & Related papers (2025-04-21T17:33:33Z) - RustMap: Towards Project-Scale C-to-Rust Migration via Program Analysis and LLM [13.584956125542396]
Rust offers superior memory safety while maintaining C's high performance.<n>Existing automated translation tools, such as C2Rust, may rely too much on syntactic, template-based translation.<n>This paper introduces a novel dependency-guided and large language model (LLM)-based C-to-Rust translation approach, RustMap.
arXiv Detail & Related papers (2025-03-22T11:57:45Z) - ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages.<n>This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z) - ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation [57.604506522287814]
Existing large language models (LLMs) only learn the contextual semantics of code during pre-training.<n>We propose ExeCoder to utilize executability representations such as functional semantics, syntax structures, and variable dependencies.<n>We show that ExeCoder achieves state-of-the-art performance in code translation, surpassing existing open-source code LLMs by over 10.88% to 38.78% and over 27.44% to 42.97% on two metrics.
arXiv Detail & Related papers (2025-01-30T16:18:52Z) - Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis [8.361424157571468]
Syzygy is an automated approach to translate C to safe Rust.<n>This is the largest automated and test-validated C to safe Rust code translation achieved so far.
arXiv Detail & Related papers (2024-12-18T18:55:46Z) - Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models [1.8416014644193066]
Large language models (LLMs) show promise for automating this translation by generating more natural and safer code than rule-based methods.
We propose an LLM-based translation scheme that improves the success rate of translating large-scale C code into compilable Rust code.
In experiments with 20 benchmark C programs, including those exceeding 4 kilo lines of code, we successfully translated all programs into compilable Rust code.
arXiv Detail & Related papers (2024-09-16T17:52:36Z) - HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that many LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure codes.
arXiv Detail & Related papers (2024-09-10T12:01:43Z) - VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners [6.824327908701066]
Rust is a programming language that combines memory safety and low-level control, providing C-like performance.
Existing work falls into two categories: rule-based and large language model (LLM)-based.
We present VERT, a tool that can produce readable Rust transpilations with formal guarantees of correctness.
arXiv Detail & Related papers (2024-04-29T16:45:03Z) - FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [51.898805184427545]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.<n>We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language.<n>We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z) - CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.