His2Trans: A Skeleton First Framework for Self Evolving C to Rust Translation with Historical Retrieval
- URL: http://arxiv.org/abs/2603.02617v1
- Date: Tue, 03 Mar 2026 05:42:08 GMT
- Title: His2Trans: A Skeleton First Framework for Self Evolving C to Rust Translation with Historical Retrieval
- Authors: Shengbo Wang, Mingwei Liu, Guangsheng Ou, Yuwen Chen, Zike Li, Yanlin Wang, Zibin Zheng
- Abstract summary: His2Trans is a framework that combines a deterministic, build-aware skeleton with self-evolving knowledge extraction to support stable, incremental migration. Experiments on industrial OpenHarmony modules show that His2Trans reaches a 99.75% incremental compilation pass rate.
- Score: 45.246293154277886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated C-to-Rust migration encounters systemic obstacles when scaling from code snippets to industrial projects, mainly because build context is often unavailable ("dependency hell") and domain-specific evolutionary knowledge is missing. As a result, current LLM-based methods frequently cannot reconstruct precise type definitions under complex build systems or infer idiomatic API correspondences, which in turn leads to hallucinated dependencies and unproductive repair loops. To tackle these issues, we introduce His2Trans, a framework that combines a deterministic, build-aware skeleton with self-evolving knowledge extraction to support stable, incremental migration. On the structural side, His2Trans performs build tracing to create a compilable Project-Level Skeleton Graph, providing a strictly typed environment that separates global verification from local logic generation. On the cognitive side, it derives fine-grained API and code-fragment rules from historical migration traces and uses a Retrieval-Augmented Generation (RAG) system to steer the LLM toward idiomatic interface reuse. Experiments on industrial OpenHarmony modules show that His2Trans reaches a 99.75% incremental compilation pass rate, effectively fixing build failures where baselines struggle. On general-purpose benchmarks, it lowers the unsafe code ratio by 23.6 percentage points compared to C2Rust while producing the fewest warnings. Finally, knowledge accumulation studies demonstrate the framework's evolutionary behavior: by continuously integrating verified patterns, His2Trans cuts repair overhead on unseen tasks by about 60%.
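The retrieval step described in the abstract (looking up rules mined from earlier migrations and handing them to the LLM) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how rules are stored or ranked, so the `Rule` type, the three example rules, the token-overlap (Jaccard) ranking, and the prompt layout are hypothetical stand-ins rather than the authors' implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical rule mined from an earlier, verified translation."""
    c_pattern: str      # C-side fragment the rule was derived from
    rust_pattern: str   # Rust-side counterpart


# Hypothetical rule store; a real system would build this from migration history.
STORE = [
    Rule("p = malloc(n); memset(p, 0, n);", "let mut p = vec![0u8; n];"),
    Rule("memcpy(dst, src, n);", "dst[..n].copy_from_slice(&src[..n]);"),
    Rule("strlen(s)", "s.len()"),
]


def tokens(code: str) -> set[str]:
    """Reduce a code fragment to its set of identifier tokens."""
    return set(re.findall(r"[A-Za-z_]\w*", code))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap in [0, 1]; stands in for a real similarity model."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query_c: str, store: list[Rule], k: int = 2) -> list[Rule]:
    """Return the k rules whose C side overlaps most with the query."""
    q = tokens(query_c)
    ranked = sorted(store, key=lambda r: jaccard(q, tokens(r.c_pattern)), reverse=True)
    return ranked[:k]


def build_prompt(query_c: str, rules: list[Rule]) -> str:
    """Assemble a prompt that shows retrieved rules before the code to translate."""
    lines = ["Known C-to-Rust correspondences:"]
    for r in rules:
        lines.append(f"  C: {r.c_pattern}  =>  Rust: {r.rust_pattern}")
    lines.append("Translate this C fragment to Rust:")
    lines.append(f"  {query_c}")
    return "\n".join(lines)


if __name__ == "__main__":
    query = "buf = malloc(len); memset(buf, 0, len);"
    print(build_prompt(query, retrieve(query, STORE, k=1)))
```

In this toy store, the `malloc`/`memset` query shares identifiers only with the first rule, so that rule is ranked first and placed in the prompt ahead of the fragment to translate.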
Related papers
- Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs [50.075587392477935]
We conduct the first large-scale empirical study of 705 real-world failures from the open-source DeepSeek, Llama, and Qwen ecosystems. Our analysis reveals a paradigm shift: white-box orchestration relocates the reliability bottleneck from model algorithmic defects to the systemic fragility of the deployment stack.
arXiv Detail & Related papers (2026-01-20T06:42:56Z)
- LibContinual: A Comprehensive Library towards Realistic Continual Learning [62.34449396069085]
A fundamental challenge in Continual Learning (CL) is catastrophic forgetting, where adapting to new tasks degrades the performance on previous ones. We propose LibContinual, a comprehensive and reproducible library designed to serve as a foundational platform for realistic CL.
arXiv Detail & Related papers (2025-12-26T13:59:13Z)
- Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding [37.78627994991325]
CoCo is a novel framework that enables code Completion by Comprehension of multi-granularity context from large-scale code repositories. Experiments on CrossCodeEval and RepoEval benchmarks demonstrate that CoCo consistently surpasses state-of-the-art baselines.
arXiv Detail & Related papers (2025-12-04T07:37:59Z)
- Context-Guided Decompilation: A Step Towards Re-executability [50.71992919223209]
Binary decompilation plays an important role in software security analysis, reverse engineering and malware understanding. Recent advances in large language models (LLMs) have enabled neural decompilation, but the generated code is typically only semantically plausible. We propose ICL4Decomp, a hybrid decompilation framework that leverages in-context learning (ICL) to guide LLMs toward generating re-executable source code.
arXiv Detail & Related papers (2025-11-03T17:21:39Z)
- SK2Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin [17.843213826367343]
We introduce SK2Decompile, a novel two-phase approach to decompile from the skeleton (semantic structure) to the skin (identifier) of programs. Specifically, we first apply a Structure Recovery model to translate a program's binary code to an Intermediate Representation (IR), deriving the program's "skeleton". We apply reinforcement learning to reward the model for producing program structures that adhere to the syntactic and semantic rules expected by compilers.
arXiv Detail & Related papers (2025-09-26T09:35:46Z)
- Integrating Rules and Semantics for LLM-Based C-to-Rust Translation [34.61632926526051]
We propose IRENE, an LLM-based framework that integrates RulEs aNd sEmantics to enhance translation. IRENE consists of three modules: 1) a rule-augmented retrieval module that selects relevant translation examples based on rules generated from a static analyzer developed by us, thereby improving the handling of Rust rules; 2) a structured summarization module that produces a structured summary for guiding LLMs to enhance the semantic understanding of C code; 3) an error-driven translation module that leverages compiler diagnostics to iteratively refine translations.
arXiv Detail & Related papers (2025-08-09T10:41:03Z)
- EvoC2Rust: A Skeleton-guided Framework for Project-Level C-to-Rust Translation [17.560908544319094]
EvoC2Rust is an automated framework for converting complete C projects to equivalent Rust ones. It employs a skeleton-guided translation strategy for project-level translation.
arXiv Detail & Related papers (2025-08-06T10:31:23Z)
- Large Language Model-Powered Agent for C to Rust Code Translation [2.182572303351317]
Rust, a modern systems programming language, has emerged as a memory-safe alternative to C. Applying agentic capabilities to C-to-Rust translation introduces distinct challenges. Unlike math or commonsense QA, the intermediate steps required for C-to-Rust are not well-defined. We propose a novel intermediate step, the Virtual Fuzzing-based equivalence Test (VFT), and an agentic planning framework, the LLM-powered Agent for C-to-Rust code translation (LAC2R).
arXiv Detail & Related papers (2025-05-21T01:26:23Z)
- ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages. This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z)
- MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation [60.04380907045708]
Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. We propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. MemoRAG achieves superior performances across a variety of long-context evaluation tasks.
arXiv Detail & Related papers (2024-09-09T13:20:31Z)