Mon CHÉRI: Mitigating Uninitialized Memory Access with Conditional Capabilities
- URL: http://arxiv.org/abs/2407.08663v2
- Date: Fri, 18 Apr 2025 18:07:51 GMT
- Title: Mon CHÉRI: Mitigating Uninitialized Memory Access with Conditional Capabilities
- Authors: Merve Gülmez, Håkan Englund, Jan Tobias Mühlberg, Thomas Nyman,
- Abstract summary: Up to 10% of memory-safety vulnerabilities in languages like C and C++ stem from und variables.<n>This work addresses the prevalence and lack of adequate software mitigations for und memory issues.<n>We extend the CHERI capability model to include "conditional capabilities", enabling memory-access policies based on prior operations.
- Score: 1.9042151977387252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Up to 10% of memory-safety vulnerabilities in languages like C and C++ stem from uninitialized variables. This work addresses the prevalence and lack of adequate software mitigations for uninitialized memory issues, proposing architectural protections in hardware. Capability-based addressing, such as the University of Cambridge's CHERI, mitigates many memory defects, including spatial and temporal safety violations at an architectural level. CHERI, however, does not handle undefined behavior from uninitialized variables. We extend the CHERI capability model to include "conditional capabilities", enabling memory-access policies based on prior operations. This allows enforcement of policies that satisfy memory-safety objectives such as "no reads to memory without at least one prior write" (Write-before-Read). We present our architecture extension, compiler support, and detailed evaluation of our approach on the QEMU full-system simulator and a modified FPGA-based CHERI-RISCV softcore. Our evaluation shows conditional capabilities are practical, with high detection accuracy while adding a small (~3.5%) overhead which is comparable to the cost of baseline CHERI capabilities.
Related papers
- Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions [55.19217798774033]
Memory is a fundamental component of AI systems, underpinning large language models (LLMs) based agents.
We introduce six fundamental memory operations: Consolidation, Updating, Indexing, Forgetting, Retrieval, and Compression.
This survey provides a structured and dynamic perspective on research, benchmark datasets, and tools related to memory in AI.
arXiv Detail & Related papers (2025-05-01T17:31:33Z) - CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation [63.23120252801889]
CRUST-Bench is a dataset of 100 C repositories, each paired with manually-written interfaces in safe Rust as well as test cases.
We evaluate state-of-the-art large language models (LLMs) on this task and find that safe and idiomatic Rust generation is still a challenging problem.
The best performing model, OpenAI o1, is able to solve only 15 tasks in a single-shot setting.
arXiv Detail & Related papers (2025-04-21T17:33:33Z) - BLACKOUT: Data-Oblivious Computation with Blinded Capabilities [10.020700343839248]
We address memory-safety and side-channel resistance by augmenting memory-safe hardware with the ability for data-oblivious programming.
We present BLACKOUT, our realization of blinded capabilities on a FPGA softcore based on the speculative out-of-order CHERI-Toooba processor.
arXiv Detail & Related papers (2025-04-20T15:25:59Z) - ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages.
This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z) - Cost-Efficient Continual Learning with Sufficient Exemplar Memory [55.77835198580209]
Continual learning (CL) research typically assumes highly constrained exemplar memory resources.
In this work, we investigate CL in a novel setting where exemplar memory is ample.
Our method achieves state-of-the-art performance while reducing the computational cost to a quarter or third of existing methods.
arXiv Detail & Related papers (2025-02-11T05:40:52Z) - Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy [53.54777131440989]
Large Language Models (LLMs) are susceptible to security and safety threats.
One major cause of these vulnerabilities is the lack of an instruction hierarchy.
We introduce the instructional segment Embedding (ISE) technique, inspired by BERT, to modern large language models.
arXiv Detail & Related papers (2024-10-09T12:52:41Z) - B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within an composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z) - Secure Rewind and Discard on ARM Morello [0.0]
Memory-unsafe programming languages such as C and C++ are the preferred languages for systems programming, embedded systems, and performance-critical applications.
An earlier approach proposes the Secure Domain Rewind and Discard (SDRaD) of isolated domains as a method to enhance the resilience of software targeted by runtime attacks on x86 architecture.
SDRaD has been adapted to work with the Capability Hardware Enhanced RISC Instructions (CHERI) architecture to be more lightweight and performant.
arXiv Detail & Related papers (2024-07-05T13:41:59Z) - Bringing Rust to Safety-Critical Systems in Space [1.0742675209112622]
Rust aims to drastically reduce the chance of introducing bugs and produces overall more secure and safer code.
This work provides a set of recommendations for the development of safety-critical space systems in Rust.
arXiv Detail & Related papers (2024-05-28T12:48:47Z) - MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances performance and interpretability, in language modeling general and in particular.
We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z) - Enabling Memory Safety of C Programs using LLMs [5.297072277460838]
Memory safety violations in low-level code, written in languages like C, continue to remain one of the major sources of software vulnerabilities.
One method of removing such violations by construction is to port C code to a safe C dialect.
Such dialects rely on programmer-supplied annotations to guarantee safety with minimal runtime overhead.
This porting is a manual process that imposes significant burden on the programmer and hence, there has been limited adoption of this technique.
arXiv Detail & Related papers (2024-04-01T13:05:54Z) - SeMalloc: Semantics-Informed Memory Allocator [18.04397502953383]
Use-after-free (UAF) is a critical and prevalent problem in memory unsafe languages.
We show one way to balance the trinity by passing more semantics about the heap object to the allocator.
In SeMalloc, only heap objects allocated from the same call site and via the same function call stack can possibly share a virtual memory address.
arXiv Detail & Related papers (2024-02-02T21:02:15Z) - Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust [23.0568924498396]
Rust is one of the most promising systems programming languages to solve the memory safety issues that have plagued low-level software for over forty years.
unsafe Rust code and directly-linked unsafe foreign libraries may not only introduce memory safety violations themselves but also compromise the entire program as they run in the same monolithic address space as the safe Rust.
We have prototyped a whole-program analysis for identifying both unsafe heap allocations and memory accesses to those unsafe heap objects.
arXiv Detail & Related papers (2023-10-16T11:34:21Z) - Analysis of the Memorization and Generalization Capabilities of AI
Agents: Are Continual Learners Robust? [91.682459306359]
In continual learning (CL), an AI agent learns from non-stationary data streams under dynamic environments.
In this paper, a novel CL framework is proposed to achieve robust generalization to dynamic environments while retaining past knowledge.
The generalization and memorization performance of the proposed framework are theoretically analyzed.
arXiv Detail & Related papers (2023-09-18T21:00:01Z) - Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z) - CryptSan: Leveraging ARM Pointer Authentication for Memory Safety in
C/C++ [0.9208007322096532]
CryptSan is a memory safety approach based on ARM Pointer Authentication.
We present a full LLVM-based prototype implementation, running on an M1 MacBook Pro.
This, together with its interoperability with uninstrumented libraries and cryptographic protection against attacks on metadata, makes CryptSan a viable solution for retrofitting memory safety to C/C++ programs.
arXiv Detail & Related papers (2022-02-17T14:04:01Z) - GradInit: Learning to Initialize Neural Networks for Stable and
Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture method for initializing neural networks.
It is based on a simple agnostic; the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.