Related papers: Forcrat: Automatic I/O API Translation from C to Rust via Origin and Capability Analysis

Forcrat: Automatic I/O API Translation from C to Rust via Origin and Capability Analysis

URL: http://arxiv.org/abs/2506.01427v1
Date: Mon, 02 Jun 2025 08:34:06 GMT
Title: Forcrat: Automatic I/O API Translation from C to Rust via Origin and Capability Analysis
Authors: Jaemin Hong, Sukyoung Ryu,
Abstract summary: We focus on replacing the I/O API, an important subset of library functions.<n>We propose two static analysis techniques, origin and capability analysis and error source analysis, and use their results to replace the I/O API.<n>Our evaluation shows that the proposed approach is (1) correct, with all 32 programs that have test suites passing the tests after transformation, (2) efficient, analyzing and transforming 422k LOC in 14 seconds, and (3) widely applicable, replacing 82% of I/O API calls.
Score: 10.16257074782054
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Translating C to Rust is a promising way to enhance the reliability of legacy system programs. Although the industry has developed an automatic C-to-Rust translator, C2Rust, its translation remains unsatisfactory. One major reason is that C2Rust retains C standard library (libc) function calls instead of replacing them with functions from the Rust standard library (Rust std). However, little work has been done on replacing library functions in C2Rust-generated code. In this work, we focus on replacing the I/O API, an important subset of library functions. This poses challenges due to the semantically different designs of I/O APIs in libc and Rust std. First, the two APIs offer different sets of types that represent the origins (e.g., standard input, files) and capabilities (e.g., read, write) of streams used for I/O. Second, they use different error-checking mechanisms: libc uses internal indicators, while Rust std uses return values. To address these challenges, we propose two static analysis techniques, origin and capability analysis and error source analysis, and use their results to replace the I/O API. Our evaluation shows that the proposed approach is (1) correct, with all 32 programs that have test suites passing the tests after transformation, (2) efficient, analyzing and transforming 422k LOC in 14 seconds, and (3) widely applicable, replacing 82% of I/O API calls.

Related papers

EVOC2RUST: A Skeleton-guided Framework for Project-Level C-to-Rust Translation [16.12483934561206]
EvoC2Rust is an automated framework for converting entire C projects to equivalent Rust ones.<n>Our evaluation on open-source benchmarks and six industrial projects demonstrates EvoC2Rust's superior performance in project-level C-to-Rust translation.
arXiv Detail & Related papers (2025-08-06T10:31:23Z)
PR2: Peephole Raw Pointer Rewriting with LLMs for Translating C to Safer Rust [9.867844389029509]
We propose a peephole raw pointer rewriting technique that lifts raw pointers in individual functions to appropriate Rust data structures.<n>PR2 successfully eliminates 13.22% of local raw pointers across 28 real-world C projects.<n>On average, PR2 completes the transformation of a project in 5.44 hours, at an average cost of $1.46.
arXiv Detail & Related papers (2025-05-07T23:30:27Z)
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation [63.23120252801889]
CRUST-Bench is a dataset of 100 C repositories, each paired with manually-written interfaces in safe Rust as well as test cases.<n>We evaluate state-of-the-art large language models (LLMs) on this task and find that safe and idiomatic Rust generation is still a challenging problem.<n>The best performing model, OpenAI o1, is able to solve only 15 tasks in a single-shot setting.
arXiv Detail & Related papers (2025-04-21T17:33:33Z)
RustMap: Towards Project-Scale C-to-Rust Migration via Program Analysis and LLM [13.584956125542396]
Rust offers superior memory safety while maintaining C's high performance.<n>Existing automated translation tools, such as C2Rust, may rely too much on syntactic, template-based translation.<n>This paper introduces a novel dependency-guided and large language model (LLM)-based C-to-Rust translation approach, RustMap.
arXiv Detail & Related papers (2025-03-22T11:57:45Z)
LLM-Driven Multi-step Translation from C to Rust using Static Analysis [27.122409727034192]
Translating software written in legacy languages to modern languages, such as C to Rust, has significant benefits in improving memory safety.<n>We propose SACTOR, an LLM-driven C-to-Rust zero-shot translation tool using a two-step translation methodology.<n>SACTOR produces more natural and Rust-compliant translations compared to existing methods.
arXiv Detail & Related papers (2025-03-16T14:05:26Z)
ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages.<n>This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z)
C2SaferRust: Transforming C Projects into Safer Rust with NeuroSymbolic Techniques [16.111888466557144]
We present C2SaferRust, a novel approach to translate C to Rust.<n>We first use C2Rust to convert C code to non-idiomatic, unsafe Rust.<n>We decompose the unsafe Rust code into slices that can be individually translated to safer Rust by an LLM.<n>After processing each slice, we run end-to-end test cases to verify that the code still functions as expected.
arXiv Detail & Related papers (2025-01-24T05:53:07Z)
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition [77.28814034644287]
We propose SVTRv2, a CTC model that beats leading EDTRs in both accuracy and inference speed. SVTRv2 introduces novel upgrades to handle text irregularity and utilize linguistic context. We evaluate SVTRv2 in both standard and recent challenging benchmarks.
arXiv Detail & Related papers (2024-11-24T14:21:35Z)
FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [51.898805184427545]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.<n>We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language.<n>We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z)
Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts. Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness. Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents. Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages. We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.