Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust
- URL: http://arxiv.org/abs/2310.10298v3
- Date: Sun, 26 May 2024 11:15:28 GMT
- Title: Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust
- Authors: Jie Zhou, Mingshen Sun, John Criswell
- Abstract summary: Rust is one of the most promising systems programming languages to solve the memory safety issues that have plagued low-level software for over forty years.
Unsafe Rust code and directly-linked unsafe foreign libraries may not only introduce memory safety violations themselves but also compromise the entire program, since they run in the same monolithic address space as the safe Rust code.
We have prototyped a whole-program analysis for identifying both unsafe heap allocations and memory accesses to those unsafe heap objects.
- Score: 23.0568924498396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rust is one of the most promising systems programming languages for fundamentally solving the memory safety issues that have plagued low-level software for over forty years. However, to accommodate scenarios where Rust's type rules are too restrictive for certain systems programming tasks, or where programmers opt for performance over security checks, Rust provides escape hatches that allow writing unsafe source code or calling unsafe libraries. Consequently, unsafe Rust code and directly-linked unsafe foreign libraries may not only introduce memory safety violations themselves but also compromise the entire program, since they run in the same monolithic address space as the safe Rust code. This problem can be mitigated by isolating unsafe memory objects (those accessed by unsafe code) and sandboxing memory accesses to that unsafe memory. One category of prior work uses existing program analysis frameworks on LLVM IR to identify unsafe memory objects and accesses, but it suffers from prolonged analysis time and low precision. In this paper, we tackle these two challenges with a summary-based whole-program analysis on Rust's MIR. The summary-based analysis computes information on demand to save analysis time, and performing the analysis on Rust's MIR exploits the rich high-level type information inherent to Rust, which is unavailable in LLVM IR. This manuscript is a preliminary study of ongoing research. We have prototyped a whole-program analysis that identifies both unsafe heap allocations and memory accesses to those unsafe heap objects, and we report its overhead and efficacy.
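To make the target of the analysis concrete, here is a minimal sketch (our illustration, not code from the paper) of the pattern it looks for: a heap object reached through a raw pointer inside an unsafe block, which makes both the allocation and the access candidates for isolation and sandboxing.

```rust
fn main() {
    // A safe heap allocation via Box.
    let mut data: Box<[u8; 16]> = Box::new([0u8; 16]);

    // Obtaining a raw pointer is safe; dereferencing it is not.
    let p: *mut u8 = data.as_mut_ptr();

    unsafe {
        // Unsafe memory access: no bounds or aliasing checks apply here.
        // An analysis like the one described above would mark `data` as an
        // unsafe heap object and this write as an unsafe access to it.
        *p.add(3) = 42;
    }

    println!("{}", data[3]); // prints 42
}
```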
Related papers
- KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z) - Characterizing Unsafe Code Encapsulation In Real-world Rust Systems [2.285834282327349]
Interior unsafe is an essential design paradigm advocated by the Rust community in system software development.
The Rust compiler is incapable of verifying the soundness of a safe function containing unsafe code.
We propose a novel unsafety isolation graph to model the essential usage and encapsulation of unsafe code.
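For context, a minimal sketch of the "interior unsafe" pattern (our illustration, not code from the paper): a function with a safe signature whose body contains unsafe code that the compiler cannot verify, so the author must encapsulate it soundly.

```rust
// Sketch of "interior unsafe": a safe signature hiding an unsafe block.
// The compiler cannot verify that the encapsulation is sound; the author
// must guarantee it, here by bounds-checking the index first.
fn get_byte(buf: &[u8], i: usize) -> Option<u8> {
    if i < buf.len() {
        // SAFETY: `i` is within bounds, checked above.
        Some(unsafe { *buf.get_unchecked(i) })
    } else {
        None
    }
}

fn main() {
    let v = vec![10u8, 20, 30];
    assert_eq!(get_byte(&v, 1), Some(20));
    assert_eq!(get_byte(&v, 9), None);
}
```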
arXiv Detail & Related papers (2024-06-12T06:59:51Z) - Bringing Rust to Safety-Critical Systems in Space [1.0742675209112622]
Rust aims to drastically reduce the chance of introducing bugs and to produce more secure and safer code overall.
This work provides a set of recommendations for the development of safety-critical space systems in Rust.
arXiv Detail & Related papers (2024-05-28T12:48:47Z) - FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.
FoC-BinLLM outperforms ChatGPT by 14.61% on the ROUGE-L score.
FoC-Sim outperforms the previous best methods with a 52% higher Recall@1.
arXiv Detail & Related papers (2024-03-27T09:45:33Z) - Analysis of the Memorization and Generalization Capabilities of AI
Agents: Are Continual Learners Robust? [91.682459306359]
In continual learning (CL), an AI agent learns from non-stationary data streams under dynamic environments.
In this paper, a novel CL framework is proposed to achieve robust generalization to dynamic environments while retaining past knowledge.
The generalization and memorization performance of the proposed framework are theoretically analyzed.
arXiv Detail & Related papers (2023-09-18T21:00:01Z) - rCanary: Detecting Memory Leaks Across Semi-automated Memory Management Boundary in Rust [4.616001680122352]
Rust is a system programming language that guarantees memory safety via compile-time verifications.
We present rCanary, a static, non-intrusive, and fully automated model checker to detect leaks across the semi-automated boundary.
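As a hedged illustration of such a leak (our example, not taken from the rCanary paper): once ownership is handed from Rust's automatic (RAII) management to manual management via a raw pointer, forgetting to reclaim it leaks memory without any compiler diagnostic.

```rust
// Illustrative leak across the semi-automated memory-management boundary.
// `Box::into_raw` moves ownership from the automatic (RAII) side to the
// manual side; if the raw pointer is never turned back into a `Box`, the
// allocation leaks and the compiler reports nothing.
fn main() {
    let boxed: Box<Vec<u32>> = Box::new(vec![1, 2, 3]);

    // Ownership leaves automatic management here.
    let raw: *mut Vec<u32> = Box::into_raw(boxed);

    // The pointer is discarded without being reclaimed, so the Vec and its
    // heap buffer are leaked. Reclaiming it would require:
    //     unsafe { drop(Box::from_raw(raw)); }
    let _ = raw;
}
```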
arXiv Detail & Related papers (2023-08-09T08:26:04Z) - Is unsafe an Achilles' Heel? A Comprehensive Study of Safety
Requirements in Unsafe Rust Programming [4.981203415693332]
Rust is an emerging, strongly-typed programming language focusing on efficiency and memory safety.
The unsafe API documentation currently in the standard library exhibits variations, including inconsistency and insufficiency.
To enhance Rust security, we suggest that unsafe API documentation list systematic descriptions of safety requirements for users to follow.
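For illustration, a hypothetical unsafe API (`read_unchecked` is our invention, not from the paper) documented with an explicit `# Safety` section listing the caller's obligations, in the systematic style the study recommends:

```rust
/// Reads the byte at `index` without a bounds check.
///
/// # Safety
///
/// Callers must uphold the following (a hypothetical safety list written in
/// the systematic style the study recommends; not taken from the paper):
/// - `index` must be strictly less than `slice.len()`.
unsafe fn read_unchecked(slice: &[u8], index: usize) -> u8 {
    // SAFETY: the caller guarantees `index < slice.len()`.
    unsafe { *slice.get_unchecked(index) }
}

fn main() {
    let data = [7u8, 8, 9];
    // SAFETY: 2 < data.len().
    let value = unsafe { read_unchecked(&data, 2) };
    assert_eq!(value, 9);
}
```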
arXiv Detail & Related papers (2023-08-09T08:16:10Z) - Safe Deep Reinforcement Learning by Verifying Task-Level Properties [84.64203221849648]
Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL).
The cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space.
In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric.
arXiv Detail & Related papers (2023-02-20T15:24:06Z) - Unsafe's Betrayal: Abusing Unsafe Rust in Binary Reverse Engineering
toward Finding Memory-safety Bugs via Machine Learning [20.68333298047064]
Rust provides memory-safe mechanisms to avoid memory-safety bugs in programming.
Unsafe code that enhances the usability of Rust provides clear spots for finding memory-safety bugs.
We claim that these unsafe spots can still be identifiable in Rust binary code via machine learning.
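A sketch of the underlying intuition (ours, not code from the paper): unsafe accesses typically compile without the bounds checks and panic paths that safe accesses carry, and such differences in the emitted code are the kind of signal a learned model could use to locate unsafe regions in stripped binaries.

```rust
// Illustrative contrast: the safe access keeps a bounds check and a panic
// path in the generated code, while the unsafe access compiles to a bare
// load. Differences like this are the kind of pattern a classifier could
// learn in order to flag unsafe regions in a stripped binary.
pub fn safe_read(v: &[u64], i: usize) -> u64 {
    v[i] // bounds check + possible panic
}

/// # Safety
/// The caller must ensure `i < v.len()`.
pub unsafe fn unchecked_read(v: &[u64], i: usize) -> u64 {
    unsafe { *v.get_unchecked(i) } // unchecked load
}

fn main() {
    let v = vec![1u64, 2, 3];
    assert_eq!(safe_read(&v, 2), 3);
    // SAFETY: 1 < v.len().
    assert_eq!(unsafe { unchecked_read(&v, 1) }, 2);
}
```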
arXiv Detail & Related papers (2022-10-31T19:32:18Z) - A first-order logic characterization of safety and co-safety languages [63.29821624186913]
Safety and co-safety languages, where a finite prefix suffices to establish whether a word does not belong or belongs to a language, play a crucial role in lowering the complexity of problems like model checking and reactive synthesis.
This paper introduces a fragment of FO-TLO, called SafetyFO, and of its dual coSafety, which are expressively complete with respect to the safety and co-safety languages.
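For reference, the standard definitions behind these two classes can be stated as follows (our paraphrase for context, not notation taken from the paper):

```latex
% Standard definitions of safety and co-safety omega-languages over an
% alphabet \Sigma (our paraphrase; u \sqsubseteq w means u is a finite
% prefix of w).
\begin{align*}
  L \text{ is a safety language}
    &\iff \forall w \in \Sigma^\omega \setminus L \;\exists u \sqsubseteq w
          \;\forall v \in \Sigma^\omega :\; uv \notin L \\
  L \text{ is a co-safety language}
    &\iff \forall w \in L \;\exists u \sqsubseteq w
          \;\forall v \in \Sigma^\omega :\; uv \in L
\end{align*}
```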
arXiv Detail & Related papers (2022-09-06T09:00:38Z) - Safe Reinforcement Learning with Linear Function Approximation [48.75026009895308]
We introduce safety as an unknown linear cost function of states and actions, which must always fall below a certain threshold.
We then present algorithms, termed SLUCB-QVI and RSLUCB-QVI, for episodic Markov decision processes (MDPs) with linear function approximation.
We show that SLUCB-QVI and RSLUCB-QVI, while with no safety violation, achieve a $\tilde{\mathcal{O}}\left(\kappa\sqrt{d^3H^3T}\right)$ regret, nearly matching
arXiv Detail & Related papers (2021-06-11T08:46:57Z)