Unsafe's Betrayal: Abusing Unsafe Rust in Binary Reverse Engineering toward Finding Memory-safety Bugs via Machine Learning
- URL: http://arxiv.org/abs/2211.00111v1
- Date: Mon, 31 Oct 2022 19:32:18 GMT
- Title: Unsafe's Betrayal: Abusing Unsafe Rust in Binary Reverse Engineering toward Finding Memory-safety Bugs via Machine Learning
- Authors: Sangdon Park and Xiang Cheng and Taesoo Kim
- Abstract summary: Rust provides memory-safe mechanisms to avoid memory-safety bugs in programming.
Unsafe code that enhances the usability of Rust provides clear spots for finding memory-safety bugs.
We claim that these unsafe spots can still be identifiable in Rust binary code via machine learning.
- Score: 20.68333298047064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Memory-safety bugs introduce critical software-security issues. Rust provides
memory-safe mechanisms to avoid memory-safety bugs in programming, while still
allowing unsafe escape hatches via unsafe code. However, the unsafe code that
enhances the usability of Rust provides clear spots for finding memory-safety
bugs in Rust source code. In this paper, we claim that these unsafe spots can
still be identifiable in Rust binary code via machine learning and be leveraged
for finding memory-safety bugs. To support our claim, we propose the tool
rustspot, which enables a reverse engineer to learn an unsafe classifier
that proposes a list of functions in Rust binaries for downstream analysis. We
empirically show that the function proposals by rustspot can recall
92.92% of memory-safety bugs while covering only 16.79% of the entire
binary code. As an application, we demonstrate that the function proposals can be
used for targeted fuzzing of Rust packages, which reduces fuzzing time compared
to non-targeted fuzzing.
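To make the claim concrete, here is a minimal, hypothetical sketch (not taken from the paper or its artifacts) of the kind of "unsafe spot" rustspot aims to surface: a Rust function whose signature looks safe but whose body uses unsafe code and carries a latent out-of-bounds write.

```rust
// Hypothetical example of an unsafe spot: the public signature is safe,
// but the unsafe block skips bounds checking, so any input longer than
// 16 bytes overflows the stack buffer (undefined behavior).
pub fn copy_header(src: &[u8]) -> [u8; 16] {
    let mut buf = [0u8; 16];
    unsafe {
        // BUG: the copy length comes from the caller and is never clamped to 16.
        std::ptr::copy_nonoverlapping(src.as_ptr(), buf.as_mut_ptr(), src.len());
    }
    buf
}
```

A function proposal for `copy_header` could then drive targeted fuzzing. The sketch below uses a standard cargo-fuzz/libfuzzer-sys harness; the crate name `my_crate` and the choice of harness are assumptions, not details from the paper.

```rust
// fuzz/fuzz_targets/copy_header.rs (hypothetical harness)
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // Exercise only the proposed function, narrowing the search space
    // relative to fuzzing the whole crate surface.
    let _ = my_crate::copy_header(data);
});
```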
Related papers
- BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models [57.5404308854535]
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions.
We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relatively uniform drifts in the model's embedding space.
Our bi-level optimization method identifies universal embedding perturbations that elicit unwanted behaviors and adjusts the model parameters to reinforce safe behaviors against these perturbations.
arXiv Detail & Related papers (2024-06-24T19:29:47Z)
- Characterizing Unsafe Code Encapsulation In Real-world Rust Systems [2.285834282327349]
Interior unsafe is an essential design paradigm advocated by the Rust community in system software development.
The Rust compiler is incapable of verifying the soundness of a safe function containing unsafe code.
We propose a novel unsafety isolation graph to model the essential usage and encapsulation of unsafe code; a minimal sketch of this interior-unsafe pattern appears after this list.
arXiv Detail & Related papers (2024-06-12T06:59:51Z)
- VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners [6.824327908701066]
Rust is a programming language that combines memory safety and low-level control, providing C-like performance.
Existing work falls into two categories: rule-based and large language model (LLM)-based.
We present VERT, a tool that can produce readable Rust transpilations with formal guarantees of correctness.
arXiv Detail & Related papers (2024-04-29T16:45:03Z)
- FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.
FoC-BinLLM outperforms ChatGPT by 14.61% on the ROUGE-L score.
FoC-Sim outperforms the previous best methods with a 52% higher Recall@1.
arXiv Detail & Related papers (2024-03-27T09:45:33Z)
- Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust [23.0568924498396]
Rust is one of the most promising systems programming languages to solve the memory safety issues that have plagued low-level software for over forty years.
unsafe Rust code and directly-linked unsafe foreign libraries may not only introduce memory safety violations themselves but also compromise the entire program as they run in the same monolithic address space as the safe Rust.
We have prototyped a whole-program analysis for identifying both unsafe heap allocations and memory accesses to those unsafe heap objects.
arXiv Detail & Related papers (2023-10-16T11:34:21Z)
- rCanary: Detecting Memory Leaks Across Semi-automated Memory Management Boundary in Rust [4.616001680122352]
Rust is a system programming language that guarantees memory safety via compile-time verifications.
We present rCanary, a static, non-intrusive, and fully automated model checker to detect leaks across the semi-automated memory management boundary.
arXiv Detail & Related papers (2023-08-09T08:26:04Z)
- Is unsafe an Achilles' Heel? A Comprehensive Study of Safety Requirements in Unsafe Rust Programming [4.981203415693332]
Rust is an emerging, strongly-typed programming language focusing on efficiency and memory safety.
Current unsafe API documents in the standard library exhibited variations, including inconsistency and insufficiency.
To enhance Rust security, we suggest that unsafe API documents list systematic descriptions of safety requirements for users to follow.
arXiv Detail & Related papers (2023-08-09T08:16:10Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Safe Deep Reinforcement Learning by Verifying Task-Level Properties [84.64203221849648]
Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL).
The cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space.
In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric.
arXiv Detail & Related papers (2023-02-20T15:24:06Z)
- Provable Safe Reinforcement Learning with Binary Feedback [62.257383728544006]
We consider the problem of provably safe RL when given access to an offline oracle providing binary feedback on the safety of state-action pairs.
We provide a novel meta algorithm, SABRE, which can be applied to any MDP setting given access to a blackbox PAC RL algorithm for that setting.
arXiv Detail & Related papers (2022-10-26T05:37:51Z)
- Sample-Efficient Safety Assurances using Conformal Prediction [57.92013073974406]
Early warning systems can provide alerts when an unsafe situation is imminent.
To reliably improve safety, these warning systems should have a provable false negative rate.
We present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics.
arXiv Detail & Related papers (2021-09-28T23:00:30Z)
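As referenced in the "Characterizing Unsafe Code Encapsulation" entry above, here is a minimal sketch of the interior-unsafe pattern whose soundness the Rust compiler cannot verify. This is an assumed illustration, not code from that paper.

```rust
// Interior unsafe: the method's signature is safe, but its soundness
// depends on an invariant (ptr is valid for len bytes) that the
// compiler cannot check; the encapsulating type must uphold it.
pub struct RawBuf {
    ptr: *mut u8,
    len: usize,
}

impl RawBuf {
    /// Safe-looking accessor; sound only if the invariant above holds.
    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}
```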