Related papers: Towards a Transpiler for C/C++ to Safer Rust

Towards a Transpiler for C/C++ to Safer Rust

URL: http://arxiv.org/abs/2401.08264v1
Date: Tue, 16 Jan 2024 10:35:59 GMT
Title: Towards a Transpiler for C/C++ to Safer Rust
Authors: Dhiren Tripuramallu, Swapnil Singh, Shrirang Deshmukh, Srinivas Pinisetty, Shinde Arjun Shivaji, Raja Balusamy, Ajaganna Bandeppa
Abstract summary: Rust is a programming language developed by Mozilla that focuses on performance and safety. How to convert an existing C++ code base to Rust is also gaining greater attention.
Score: 0.10993800728351737
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Rust is a multi-paradigm programming language developed by Mozilla that focuses on performance and safety. Rust code is arguably known best for its speed and memory safety, a property essential while developing embedded systems. Thus, it becomes one of the alternatives when developing operating systems for embedded devices. How to convert an existing C++ code base to Rust is also gaining greater attention. In this work, we focus on the process of transpiling C++ code to a Rust codebase in a robust and safe manner. The manual transpilation process is carried out to understand the different constructs of the Rust language and how they correspond to C++ constructs. Based on the learning from the manual transpilation, a transpilation table is created to aid in future transpilation efforts and to develop an automated transpiler. We also studied the existing automated transpilers and identified the problems and inefficiencies they involved. The results of the transpilation process were closely monitored and evaluated, showing improved memory safety without compromising performance and reliability of the resulting codebase. The study concludes with a comprehensive analysis of the findings, an evaluation of the implications for future research, and recommendations for the same in this area.

Related papers

CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation [63.23120252801889]
CRUST-Bench is a dataset of 100 C repositories, each paired with manually-written interfaces in safe Rust as well as test cases. We evaluate state-of-the-art large language models (LLMs) on this task and find that safe and idiomatic Rust generation is still a challenging problem. The best performing model, OpenAI o1, is able to solve only 15 tasks in a single-shot setting.
arXiv Detail & Related papers (2025-04-21T17:33:33Z)
Beyond the Edge of Function: Unraveling the Patterns of Type Recovery in Binary Code [55.493408628371235]
We propose ByteTR, a framework for recovering variable types in binary code. In light of the ubiquity of variable propagation across functions, ByteTR conducts inter-procedural analysis to trace variable propagation and employs a gated graph neural network to capture long-range data flow dependencies for variable type recovery.
arXiv Detail & Related papers (2025-03-10T12:27:05Z)
ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages. This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z)
Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis [8.361424157571468]
Syzygy is an automated approach to translate C to safe Rust. This is the largest automated and test-validated C to safe Rust code translation achieved so far.
arXiv Detail & Related papers (2024-12-18T18:55:46Z)
Translating C To Rust: Lessons from a User Study [12.49361031696427]
Rust aims to offer full memory safety for programs, a guarantee that untamed C programs do not enjoy. We report on a user study asking humans to translate real-world C programs to Rust. Our participants are able to produce safe Rust translations, whereas state-of-the-art automatic tools are not able to do so.
arXiv Detail & Related papers (2024-11-21T14:37:05Z)
Bringing Rust to Safety-Critical Systems in Space [1.0742675209112622]
Rust aims to drastically reduce the chance of introducing bugs and produces overall more secure and safer code. This work provides a set of recommendations for the development of safety-critical space systems in Rust.
arXiv Detail & Related papers (2024-05-28T12:48:47Z)
VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners [6.824327908701066]
Rust is a programming language that combines memory safety and low-level control, providing C-like performance. Existing work falls into two categories: rule-based and large language model (LLM)-based. We present VERT, a tool that can produce readable Rust transpilations with formal guarantees of correctness.
arXiv Detail & Related papers (2024-04-29T16:45:03Z)
FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries. We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language. We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z)
Developing a High-Performance Process Mining Library with Java and Python Bindings in Rust [0.19036571490366497]
Rust emerged as a highly performant, compiled programming language with inherent memory safety. By facilitating interoperability, our methodology enables researchers or industry to develop novel algorithms in Rust once and make them accessible to the entire community.
arXiv Detail & Related papers (2024-01-25T12:59:13Z)
LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated from Stitch. We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z)
CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code [79.87518649544405]
We present a control flow graph and pseudo code guided binary code summarization framework called CP-BCS. CP-BCS utilizes a bidirectional instruction-level control flow graph and pseudo code that incorporates expert knowledge to learn the comprehensive binary function execution behavior and logic semantics.
arXiv Detail & Related papers (2023-10-24T14:20:39Z)
Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts. Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness. Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
Fixing Rust Compilation Errors using LLMs [2.1781086368581932]
The Rust programming language has established itself as a viable choice for low-level systems programming language over the traditional, unsafe alternatives like C/C++. This paper presents a tool called RustAssistant that leverages the emergent capabilities of Large Language Models (LLMs) to automatically suggest fixes for Rust compilation errors. RustAssistant is able to achieve an impressive peak accuracy of roughly 74% on real-world compilation errors in popular open-source Rust repositories.
arXiv Detail & Related papers (2023-08-09T18:30:27Z)
CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks. We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval. We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.