TME-Box: Scalable In-Process Isolation through Intel TME-MK Memory Encryption
- URL: http://arxiv.org/abs/2407.10740v2
- Date: Mon, 4 Nov 2024 12:14:07 GMT
- Title: TME-Box: Scalable In-Process Isolation through Intel TME-MK Memory Encryption
- Authors: Martin Unterguggenberger, Lukas Lamster, David Schrammel, Martin Schwarzl, Stefan Mangard,
- Abstract summary: Cloud computing relies on in-process isolation to optimize performance by running workloads within a single process.
Existing in-process isolation mechanisms are not suitable for modern cloud requirements.
This paper presents TME-Box, a novel isolation technique that enables fine-grained and scalable sandboxing on commodity x86 machines.
- Score: 11.543384661361232
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient cloud computing relies on in-process isolation to optimize performance by running workloads within a single process. Without heavy-weight process isolation, memory safety errors pose a significant security threat by allowing an adversary to extract or corrupt the private data of other co-located tenants. Existing in-process isolation mechanisms are not suitable for modern cloud requirements, e.g., MPK's 16 protection domains are insufficient to isolate thousands of cloud workers per process. Consequently, cloud service providers have a strong need for lightweight in-process isolation on commodity x86 machines. This paper presents TME-Box, a novel isolation technique that enables fine-grained and scalable sandboxing on commodity x86 CPUs. By repurposing Intel TME-MK, which is intended for the encryption of virtual machines, TME-Box offers lightweight and efficient in-process isolation. TME-Box enforces that sandboxes use their designated encryption keys for memory interactions through compiler instrumentation. This cryptographic isolation enables fine-grained access control, from single cache lines to full pages, and supports flexible data relocation. In addition, the design of TME-Box allows the efficient isolation of up to 32K concurrent sandboxes. We present a performance-optimized TME-Box prototype, utilizing x86 segment-based addressing, that showcases geomean performance overheads of 5.2 % for data isolation and 9.7 % for code and data isolation, evaluated with the SPEC CPU2017 benchmark suite.
Related papers
- Rethinking the confidential cloud through a unified low-level abstraction for composable isolation [1.1595071545168434]
We introduce a unified isolation model that delegates enforceable, composable, and attestable isolation to a single trusted security monitor: Tyche.<n>Tyche provides an API for partitioning, sharing, attesting, and reclaiming resources through its core abstraction, trust domains (TDs)<n>We provide an SDK to run and compose unmodified workloads as sandboxes, enclaves, and CVMs with minimal overhead compared to native Linux execution.
arXiv Detail & Related papers (2025-07-16T16:08:24Z) - NanoZone: Scalable, Efficient, and Secure Memory Protection for Arm CCA [4.597444093276292]
Arm Confidential Computing Architecture (CCA) currently isolates at the granularity of an entire Confidential Virtual Machine (CVM)<n>We extend CCA with a three-tier zone model that spawns an unlimited number of lightweight isolation domains inside a single process.<n>To block domain-switch abuse, we also add a fast user-level Code-Pointer Integrity (CPI) mechanism.
arXiv Detail & Related papers (2025-06-08T07:55:48Z) - PC-MoE: Memory-Efficient and Privacy-Preserving Collaborative Training for Mixture-of-Experts LLMs [56.04036826558497]
We introduce Privacy-preserving Collaborative Mixture-of-Experts (PC-MoE)<n>By design, PC-MoE synergistically combines the strengths of distributed computation with strong confidentiality assurances.<n>It almost matches (and sometimes exceeds) the performance and convergence rate of a fully centralized model, enjoys near 70% peak GPU RAM reduction, while being fully robust against reconstruction attacks.
arXiv Detail & Related papers (2025-06-03T15:00:18Z) - Efficient Token Compression for Vision Transformer with Spatial Information Preserved [59.79302182800274]
Token compression is essential for reducing the computational and memory requirements of transformer models.
We propose an efficient and hardware-compatible token compression method called Prune and Merge.
arXiv Detail & Related papers (2025-03-30T14:23:18Z) - Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs [74.74225314708225]
Multi-head Latent Attention (MLA) is an innovative architecture designed to ensure efficient and economical inference.
This paper proposes the first data-efficient fine-tuning method for transitioning from Multi-Head Attention to MLA.
arXiv Detail & Related papers (2025-02-20T18:50:42Z) - TEEMATE: Fast and Efficient Confidential Container using Shared Enclave [17.032423912089854]
We introduce TeeMate, a new approach to utilize the enclaves on the host system.
We show that TeeMate achieves at least 4.5 times lower latency and 2.8 times lower memory usage compared to the applications built on the conventional confidential containers.
arXiv Detail & Related papers (2024-11-18T09:50:20Z) - LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy [59.1298692559785]
Key-Value ( KV) cache is crucial component in serving transformer-based autoregressive large language models (LLMs)
Existing approaches to mitigate this issue include: (1) efficient attention variants integrated in upcycling stages; (2) KV cache compression at test time; and (3) KV cache compression at test time.
We propose a low-rank approximation of KV weight matrices, allowing plug-in integration with existing transformer-based LLMs without model retraining.
Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages.
arXiv Detail & Related papers (2024-10-04T03:10:53Z) - BULKHEAD: Secure, Scalable, and Efficient Kernel Compartmentalization with PKS [16.239598954752594]
Kernel compartmentalization is a promising approach that follows the least-privilege principle.
We present BULKHEAD, a secure, scalable, and efficient kernel compartmentalization technique.
We implement a prototype system on Linux v6.1 to compartmentalize loadable kernel modules.
arXiv Detail & Related papers (2024-09-15T04:11:26Z) - KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization [67.74400574357472]
LLMs are seeing growing use for applications which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference.
Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurately in sub-4-bit precision.
Our work, KVQuant, facilitates low precision KV cache quantization by incorporating several novel methods.
arXiv Detail & Related papers (2024-01-31T18:58:14Z) - HasTEE+ : Confidential Cloud Computing and Analytics with Haskell [50.994023665559496]
Confidential computing enables the protection of confidential code and data in a co-tenanted cloud deployment using specialized hardware isolation units called Trusted Execution Environments (TEEs)
TEEs offer low-level C/C++-based toolchains that are susceptible to inherent memory safety vulnerabilities and lack language constructs to monitor explicit and implicit information-flow leaks.
We address the above with HasTEE+, a domain-specific language (cla) embedded in Haskell that enables programming TEEs in a high-level language with strong type-safety.
arXiv Detail & Related papers (2024-01-17T00:56:23Z) - EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM [71.868623296582]
EdgeSAM is an accelerated variant of the Segment Anything Model (SAM)
Our approach involves distilling the original ViT-based SAM image encoder into a purely CNN-based architecture.
It is the first SAM variant that can run at over 30 FPS on an iPhone 14.
arXiv Detail & Related papers (2023-12-11T18:59:52Z) - Managing Large Enclaves in a Data Center [2.708829957859632]
We propose a new technique, OptMig, to implement secure enclave migration with a near-zero downtime.
Our optimizations reduce the total downtime by 77-96% for a suite of Intel SGX applications that have multi-GB memory footprints.
arXiv Detail & Related papers (2023-11-13T00:08:37Z) - Putting a Padlock on Lambda -- Integrating vTPMs into AWS Firecracker [49.1574468325115]
Software services place implicit trust in the cloud provider, without an explicit trust relationship.
There is currently no cloud provider that exposes Trusted Platform Module capabilities.
We improve trust by integrating a virtual TPM device into the Firecracker, originally developed by Amazon Web Services.
arXiv Detail & Related papers (2023-10-05T13:13:55Z) - Capacity: Cryptographically-Enforced In-Process Capabilities for Modern ARM Architectures (Extended Version) [1.2687030176231846]
Capacity is a novel hardware-assisted intra-process access control design that embraces capability-based security principles.
With intra-process domains authenticated with unique PA keys, Capacity transforms file descriptors and memory pointers into cryptographically-authenticated references.
We evaluate our Capacity-enabled NGINX web server prototype and other common applications in which sensitive resources are isolated into different domains.
arXiv Detail & Related papers (2023-09-20T08:57:02Z) - Citadel: Real-World Hardware-Software Contracts for Secure Enclaves Through Microarchitectural Isolation and Controlled Speculation [8.414722884952525]
Hardware isolation primitives such as secure enclaves aim to protect programs, but remain vulnerable to transient execution attacks.
This paper advocates for processors to incorporate microarchitectural isolation primitives and mechanisms for controlled speculation.
We introduce two mechanisms to securely share memory between an enclave and an untrusted OS in an out-of-order processor.
arXiv Detail & Related papers (2023-06-26T17:51:23Z) - Leveraging Automated Mixed-Low-Precision Quantization for tiny edge
microcontrollers [76.30674794049293]
This paper presents an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices.
Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, 8 bits, of individual weight and activation tensors.
Given an MCU-class memory bound to 2MB for weight-only quantization, the compressed models produced by the mixed-precision engine result as accurate as the state-of-the-art solutions.
arXiv Detail & Related papers (2020-08-12T06:09:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.