Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust
- URL: http://arxiv.org/abs/2310.10298v3
- Date: Sun, 26 May 2024 11:15:28 GMT
- Title: Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust
- Authors: Jie Zhou, Mingshen Sun, John Criswell
- Abstract summary: Rust is one of the most promising systems programming languages to solve the memory safety issues that have plagued low-level software for over forty years.
Unsafe Rust code and directly-linked unsafe foreign libraries may not only introduce memory safety violations themselves but also compromise the entire program, since they run in the same monolithic address space as the safe Rust code.
We have prototyped a whole-program analysis for identifying both unsafe heap allocations and memory accesses to those unsafe heap objects.
- Score: 23.0568924498396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rust is one of the most promising systems programming languages for fundamentally solving the memory safety issues that have plagued low-level software for over forty years. However, to accommodate scenarios where Rust's type rules are too restrictive for certain systems programming tasks, or where programmers opt for performance over security checks, Rust provides escape hatches that allow writing unsafe source code or calling unsafe libraries. Consequently, unsafe Rust code and directly-linked unsafe foreign libraries may not only introduce memory safety violations themselves but also compromise the entire program, since they run in the same monolithic address space as the safe Rust code. This problem can be mitigated by isolating unsafe memory objects (those accessed by unsafe code) and sandboxing memory accesses to that unsafe memory. One category of prior work uses existing program analysis frameworks on LLVM IR to identify unsafe memory objects and accesses, but it suffers from prolonged analysis time and low precision. In this paper, we tackle these two challenges with a summary-based whole-program analysis on Rust's MIR. The summary-based analysis computes information on demand to save analysis time, and performing the analysis on Rust's MIR exploits the rich high-level type information inherent to Rust, which is unavailable in LLVM IR. This manuscript is a preliminary study of ongoing research. We have prototyped a whole-program analysis that identifies both unsafe heap allocations and memory accesses to those unsafe heap objects, and we report its overhead and efficacy.
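To make the target of the analysis concrete, here is a minimal sketch (our illustration, not code from the paper) of the pattern it looks for: a heap object reached through a raw pointer inside an unsafe block, which makes both the allocation and the access candidates for isolation and sandboxing.

```rust
fn main() {
    // A safe heap allocation via Box.
    let mut data: Box<[u8; 16]> = Box::new([0u8; 16]);

    // Obtaining a raw pointer is safe; dereferencing it is not.
    let p: *mut u8 = data.as_mut_ptr();

    unsafe {
        // Unsafe memory access: no bounds or aliasing checks apply here.
        // An analysis like the one described above would mark `data` as an
        // unsafe heap object and this write as an unsafe access to it.
        *p.add(3) = 42;
    }

    println!("{}", data[3]); // prints 42
}
```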
Related papers
- KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z) - Characterizing Unsafe Code Encapsulation In Real-world Rust Systems [2.285834282327349]
Interior unsafe is an essential design paradigm advocated by the Rust community in system software development.
The Rust compiler is incapable of verifying the soundness of a safe function containing unsafe code.
We propose a novel unsafety isolation graph to model the essential usage and encapsulation of unsafe code.
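For context, a minimal sketch of the "interior unsafe" pattern (our illustration, not code from the paper): a function with a safe signature whose body contains unsafe code that the compiler cannot verify, so the author must encapsulate it soundly.

```rust
// Sketch of "interior unsafe": a safe signature hiding an unsafe block.
// The compiler cannot verify that the encapsulation is sound; the author
// must guarantee it, here by bounds-checking the index first.
fn get_byte(buf: &[u8], i: usize) -> Option<u8> {
    if i < buf.len() {
        // SAFETY: `i` is within bounds, checked above.
        Some(unsafe { *buf.get_unchecked(i) })
    } else {
        None
    }
}

fn main() {
    let v = vec![10u8, 20, 30];
    assert_eq!(get_byte(&v, 1), Some(20));
    assert_eq!(get_byte(&v, 9), None);
}
```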
arXiv Detail & Related papers (2024-06-12T06:59:51Z) - Bringing Rust to Safety-Critical Systems in Space [1.0742675209112622]
Rust aims to drastically reduce the chance of introducing bugs and to produce more secure and safer code overall.
This work provides a set of recommendations for the development of safety-critical space systems in Rust.
arXiv Detail & Related papers (2024-05-28T12:48:47Z) - FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.
FoC-BinLLM outperforms ChatGPT by 14.61% on the ROUGE-L score.
FoC-Sim outperforms the previous best methods with a 52% higher Recall@1.
arXiv Detail & Related papers (2024-03-27T09:45:33Z) - Analysis of the Memorization and Generalization Capabilities of AI
Agents: Are Continual Learners Robust? [91.682459306359]
In continual learning (CL), an AI agent learns from non-stationary data streams under dynamic environments.
In this paper, a novel CL framework is proposed to achieve robust generalization to dynamic environments while retaining past knowledge.
The generalization and memorization performance of the proposed framework are theoretically analyzed.
arXiv Detail & Related papers (2023-09-18T21:00:01Z) - rCanary: Detecting Memory Leaks Across Semi-automated Memory Management Boundary in Rust [4.616001680122352]
Rust is a system programming language that guarantees memory safety via compile-time verifications.
We present rCanary, a static, non-intrusive, and fully automated model checker to detect leaks across the semi-automated boundary.
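As a hedged illustration of such a leak (our example, not taken from the rCanary paper): once ownership is handed from Rust's automatic (RAII) management to manual management via a raw pointer, forgetting to reclaim it leaks memory without any compiler diagnostic.

```rust
// Illustrative leak across the semi-automated memory-management boundary.
// `Box::into_raw` moves ownership from the automatic (RAII) side to the
// manual side; if the raw pointer is never turned back into a `Box`, the
// allocation leaks and the compiler reports nothing.
fn main() {
    let boxed: Box<Vec<u32>> = Box::new(vec![1, 2, 3]);

    // Ownership leaves automatic management here.
    let raw: *mut Vec<u32> = Box::into_raw(boxed);

    // The pointer is discarded without being reclaimed, so the Vec and its
    // heap buffer are leaked. Reclaiming it would require:
    //     unsafe { drop(Box::from_raw(raw)); }
    let _ = raw;
}
```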
arXiv Detail & Related papers (2023-08-09T08:26:04Z) - Is unsafe an Achilles' Heel? A Comprehensive Study of Safety
Requirements in Unsafe Rust Programming [4.981203415693332]
Rust is an emerging, strongly-typed programming language focusing on efficiency and memory safety.
The unsafe API documentation currently in the standard library exhibits variations, including inconsistency and insufficiency.
To enhance Rust security, we suggest that unsafe API documentation list systematic descriptions of safety requirements for users to follow.
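For illustration, a hypothetical unsafe API (`read_unchecked` is our invention, not from the paper) documented with an explicit `# Safety` section listing the caller's obligations, in the systematic style the study recommends:

```rust
/// Reads the byte at `index` without a bounds check.
///
/// # Safety
///
/// Callers must uphold the following (a hypothetical safety list written in
/// the systematic style the study recommends; not taken from the paper):
/// - `index` must be strictly less than `slice.len()`.
unsafe fn read_unchecked(slice: &[u8], index: usize) -> u8 {
    // SAFETY: the caller guarantees `index < slice.len()`.
    unsafe { *slice.get_unchecked(index) }
}

fn main() {
    let data = [7u8, 8, 9];
    // SAFETY: 2 < data.len().
    let value = unsafe { read_unchecked(&data, 2) };
    assert_eq!(value, 9);
}
```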
arXiv Detail & Related papers (2023-08-09T08:16:10Z) - Safe Deep Reinforcement Learning by Verifying Task-Level Properties [84.64203221849648]
Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL).
The cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space.
In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric.
arXiv Detail & Related papers (2023-02-20T15:24:06Z) - Unsafe's Betrayal: Abusing Unsafe Rust in Binary Reverse Engineering
toward Finding Memory-safety Bugs via Machine Learning [20.68333298047064]
Rust provides memory-safe mechanisms to avoid memory-safety bugs in programming.
Unsafe code that enhances the usability of Rust provides clear spots for finding memory-safety bugs.
We claim that these unsafe spots can still be identifiable in Rust binary code via machine learning.
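A sketch of the underlying intuition (ours, not code from the paper): unsafe accesses typically compile without the bounds checks and panic paths that safe accesses carry, and such differences in the emitted code are the kind of signal a learned model could use to locate unsafe regions in stripped binaries.

```rust
// Illustrative contrast: the safe access keeps a bounds check and a panic
// path in the generated code, while the unsafe access compiles to a bare
// load. Differences like this are the kind of pattern a classifier could
// learn in order to flag unsafe regions in a stripped binary.
pub fn safe_read(v: &[u64], i: usize) -> u64 {
    v[i] // bounds check + possible panic
}

/// # Safety
/// The caller must ensure `i < v.len()`.
pub unsafe fn unchecked_read(v: &[u64], i: usize) -> u64 {
    unsafe { *v.get_unchecked(i) } // unchecked load
}

fn main() {
    let v = vec![1u64, 2, 3];
    assert_eq!(safe_read(&v, 2), 3);
    // SAFETY: 1 < v.len().
    assert_eq!(unsafe { unchecked_read(&v, 1) }, 2);
}
```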
arXiv Detail & Related papers (2022-10-31T19:32:18Z) - A first-order logic characterization of safety and co-safety languages [63.29821624186913]
Safety and co-safety languages, where a finite prefix suffices to establish whether a word does not belong or belongs to a language, play a crucial role in lowering the complexity of problems like model checking and reactive synthesis.
This paper introduces a fragment of FO-TLO, called SafetyFO, and of its dual coSafety, which are expressively complete with respect to the safety and co-safety languages.
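For reference, the standard definitions behind these two classes can be stated as follows (our paraphrase for context, not notation taken from the paper):

```latex
% Standard definitions of safety and co-safety omega-languages over an
% alphabet \Sigma (our paraphrase; u \sqsubseteq w means u is a finite
% prefix of w).
\begin{align*}
  L \text{ is a safety language}
    &\iff \forall w \in \Sigma^\omega \setminus L \;\exists u \sqsubseteq w
          \;\forall v \in \Sigma^\omega :\; uv \notin L \\
  L \text{ is a co-safety language}
    &\iff \forall w \in L \;\exists u \sqsubseteq w
          \;\forall v \in \Sigma^\omega :\; uv \in L
\end{align*}
```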
arXiv Detail & Related papers (2022-09-06T09:00:38Z) - Safe Reinforcement Learning with Linear Function Approximation [48.75026009895308]
We introduce safety as an unknown linear cost function of states and actions, which must always fall below a certain threshold.
We then present algorithms, termed SLUCB-QVI and RSLUCB-QVI, for episodic Markov decision processes (MDPs) with linear function approximation.
We show that SLUCB-QVI and RSLUCB-QVI, while with no safety violation, achieve a $\tilde{\mathcal{O}}\left(\kappa\sqrt{d^3H^3T}\right)$ regret, nearly matching
arXiv Detail & Related papers (2021-06-11T08:46:57Z)