Managing Large Enclaves in a Data Center
- URL: http://arxiv.org/abs/2311.06991v1
- Date: Mon, 13 Nov 2023 00:08:37 GMT
- Title: Managing Large Enclaves in a Data Center
- Authors: Sandeep Kumar, Smruti R. Sarangi
- Abstract summary: We present OptMig, an end-to-end solution for live migrating large memory footprints in TEE-enabled applications.
Our approach does not require a developer to modify the application; however, we need a short, separate compilation pass and specialized software library support.
- Score: 3.174768030369157
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Live migration of an application or VM is a well-known technique for load balancing, performance optimization, and resource management. To minimize the total downtime during migration, two popular methods -- pre-copy and post-copy -- are used in practice. These methods scale to large VMs and applications since the downtime is independent of the memory footprint of an application. However, in a secure, trusted execution environment (TEE) like Intel's Scalable SGX, the state of the art still uses the decade-old stop-and-copy method, where the total downtime is proportional to the application's memory footprint. This is primarily because TEEs like Intel SGX do not expose memory and page table accesses to the OS, quite unlike insecure applications. However, with modern TEE solutions that efficiently support large applications, such as Intel's Scalable SGX and AMD's EPYC, it is high time that TEE migration methods also evolve to enable live migration of large TEE applications with minimal downtime (stop-and-copy cannot be used anymore). We present OptMig, an end-to-end solution for live migrating large memory footprints in TEE-enabled applications. Our approach does not require a developer to modify the application; however, we need a short, separate compilation pass and specialized software library support. Our optimizations reduce the total downtime by 98% for a representative microbenchmark that uses 20 GB of secure memory and by 90 -- 96% for a suite of Intel SGX applications that have multi-GB memory footprints.
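The gap the abstract highlights -- stop-and-copy downtime growing with the memory footprint versus a near-constant pre-copy pause -- can be made concrete with a small back-of-the-envelope simulation. The sketch below is purely illustrative and is not OptMig: the page count, per-page copy cost, and dirty-page fraction are assumed values chosen only to show the scaling difference.

```python
# Illustrative comparison of migration downtime (all constants are assumptions).
PAGES = 5_000_000          # ~20 GB footprint at 4 KB pages
COPY_COST_PER_PAGE = 1e-6  # assumed seconds to transfer one page
DIRTY_FRACTION = 0.02      # assumed fraction of pages re-dirtied per copy round

def stop_and_copy(pages):
    # Application stays paused for the whole transfer, so downtime ~ footprint.
    return pages * COPY_COST_PER_PAGE

def pre_copy(pages, max_rounds=10):
    # Application keeps running; each round re-copies only the pages dirtied
    # during the previous round, and only the small residual set needs a pause.
    to_copy = pages
    for _ in range(max_rounds):
        if to_copy < 1_000:                      # residual set small enough: stop and finish
            break
        to_copy = int(to_copy * DIRTY_FRACTION)  # pages dirtied while this round copied
    return to_copy * COPY_COST_PER_PAGE          # only the final pause is downtime

print(f"stop-and-copy downtime: {stop_and_copy(PAGES):.2f} s")  # scales with footprint
print(f"pre-copy downtime:      {pre_copy(PAGES):.6f} s")       # nearly footprint-independent
```

Pre-copy and post-copy rely on the OS observing page accesses to track the dirty or missing pages, which is exactly what SGX-style TEEs hide; per the abstract, OptMig recovers live-migration-style low downtime without that visibility.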
Related papers
- vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z)
- TME-Box: Scalable In-Process Isolation through Intel TME-MK Memory Encryption [11.543384661361232]
Cloud computing relies on in-process isolation to optimize performance by running workloads within a single process.
Existing in-process isolation mechanisms are not suitable for modern cloud requirements.
This paper presents TME-Box, a novel isolation technique that enables fine-grained and scalable sandboxing on commodity x86 machines.
arXiv Detail & Related papers (2024-07-15T14:09:00Z)
- Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices [9.928745904761358]
Edge intelligence enables resource-demanding Deep Neural Network (DNN) inference without transferring original data.
For privacy-sensitive applications, deploying models in hardware-isolated trusted execution environments (TEEs) becomes essential.
We present a novel approach for advanced model deployment in TrustZone that ensures comprehensive privacy preservation during model inference.
arXiv Detail & Related papers (2024-03-19T09:22:50Z)
- Online Adaptation of Language Models with a Memory of Amortized Contexts [86.91360597169563]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We propose an amortized feature extraction and memory-augmentation approach to compress and extract information from new documents.
Our experiments demonstrate the superiority of MAC in multiple aspects, including online adaptation performance, time, and memory efficiency.
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- A Comprehensive Trusted Runtime for WebAssembly with Intel SGX [2.6732136954707792]
We present Twine, a trusted runtime for running WebAssembly-compiled applications within TEEs.
It extends the standard WebAssembly system interface (WASI), providing controlled OS services, focusing on I/O.
We evaluate its performance using general-purpose benchmarks and real-world applications, showing that it performs on par with state-of-the-art solutions.
arXiv Detail & Related papers (2023-12-14T16:19:00Z)
- MemGPT: Towards LLMs as Operating Systems [50.02623936965231]
Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows.
We propose virtual context management, a technique drawing inspiration from hierarchical memory systems in traditional operating systems.
We release MemGPT code and data for our experiments at https://memgpt.ai.
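As a toy illustration of the operating-system analogy (paging between a small, fast memory and a large backing store), the sketch below keeps a bounded "main context" and spills older items to an external store, with an explicit page-in call. This is only one reading of the analogy, not MemGPT's actual design; the class, names, and eviction policy are assumptions.

```python
from collections import OrderedDict

class VirtualContext:
    """Toy 'virtual context': a bounded window backed by an unbounded external store."""
    def __init__(self, window_items=4):
        self.window_items = window_items   # how many items fit in the model's window
        self.main = OrderedDict()          # what would actually be sent to the LLM
        self.external = {}                 # spillover storage (a DB or files in practice)

    def add(self, key, text):
        self.main[key] = text
        while len(self.main) > self.window_items:          # evict oldest items out of the window
            old_key, old_text = self.main.popitem(last=False)
            self.external[old_key] = old_text

    def page_in(self, key):
        # Bring an evicted item back into the visible window (e.g., after a retrieval call).
        if key in self.external:
            self.add(key, self.external.pop(key))

    def prompt(self):
        return "\n".join(self.main.values())                # the visible context

ctx = VirtualContext(window_items=2)
for i in range(5):
    ctx.add(f"msg{i}", f"message {i}")
ctx.page_in("msg0")   # msg0 returns to the visible window, evicting the oldest item
print(ctx.prompt())
```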
arXiv Detail & Related papers (2023-10-12T17:51:32Z)
- FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system that unlocks the potential of vast, untapped consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, and the heterogeneity of peers and devices.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
- Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference [23.207326766883405]
Mixture-of-Experts (MoE) is able to scale its model size without proportionally scaling up its computational requirements.
Pre-gated MoE employs our novel pre-gating function which alleviates the dynamic nature of sparse expert activation.
We demonstrate that Pre-gated MoE improves performance and reduces GPU memory consumption while maintaining the same level of model quality.
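A rough way to picture pre-gating: in a conventional MoE block the router runs inside the block, so offloaded expert weights can only be fetched after routing, whereas a pre-gate evaluated in block i that predicts block i+1's expert choices lets those weights be prefetched while block i is still computing. The numpy sketch below is a guess at that structure (top-1 routing, a pre-gate that reads the block's input, toy dimensions), not the paper's architecture or code.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, N_BLOCKS, TOKENS = 64, 8, 4, 16

# Per-block expert weights; block 0 routes itself, later blocks are pre-gated.
experts = [[rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
           for _ in range(N_BLOCKS)]
router0 = rng.standard_normal((D, N_EXPERTS))
pre_gates = [rng.standard_normal((D, N_EXPERTS)) for _ in range(N_BLOCKS - 1)]

def top1(scores):
    return scores.argmax(axis=-1)          # one expert id per token

x = rng.standard_normal((TOKENS, D))
ids = top1(x @ router0)                    # block 0's expert choice (no prefetch benefit)
for i in range(N_BLOCKS):
    next_ids = None
    if i + 1 < N_BLOCKS:
        # Pre-gate: decide block i+1's experts from block i's input; a real system
        # would start copying those expert weights host->GPU right here.
        next_ids = top1(x @ pre_gates[i])
    out = np.zeros_like(x)
    for e in np.unique(ids):               # run only the experts selected for block i
        mask = ids == e
        out[mask] = x[mask] @ experts[i][e]
    x = x + out                            # residual connection
    if next_ids is not None:
        ids = next_ids                     # block i+1 uses the pre-gated choice
```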
arXiv Detail & Related papers (2023-08-23T11:25:37Z)
- Full Parameter Fine-tuning for Large Language Models with Limited Resources [55.794732214059806]
Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training.
We propose a new optimizer, LOw-Memory Optimization (LOMO), which fuses the gradient computation and the parameter update in one step to reduce memory usage.
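The memory saving comes from consuming each gradient as soon as it is produced instead of materializing gradients for the whole model and then running an optimizer step. The sketch below shows that fusion pattern with plain SGD using PyTorch's post-accumulate-grad hooks (PyTorch 2.1+); it illustrates the idea rather than LOMO's implementation, and the learning rate and model are placeholders.

```python
import torch
import torch.nn as nn

LR = 0.01  # placeholder learning rate

def fused_sgd_step(param):
    # Runs right after this parameter's gradient has been accumulated during
    # backward(): apply the update immediately, then drop the gradient so the
    # full set of gradients never has to be resident at the same time.
    with torch.no_grad():
        param.add_(param.grad, alpha=-LR)
    param.grad = None

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
for p in model.parameters():
    p.register_post_accumulate_grad_hook(fused_sgd_step)

x, y = torch.randn(8, 512), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()   # parameters are already updated when backward() returns
```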
arXiv Detail & Related papers (2023-06-16T11:37:15Z)
- MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware.
Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters.
We also demonstrate that unlike MAPLE, which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
arXiv Detail & Related papers (2022-04-27T14:00:48Z)
- Towards Faster Reasoners By Using Transparent Huge Pages [0.491574468325115]
In this work, we present an approach that reduces the runtime of automated reasoning (AR) tools by 10% on average and by up to 20% for long-running tasks.
Our improvement addresses the high memory usage of the data structures used in AR tools, which are based on conflict-driven no-good learning.
arXiv Detail & Related papers (2020-04-29T17:57:19Z)
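The Transparent Huge Pages entry above relies on a standard Linux mechanism: backing a large, long-lived allocation with 2 MiB pages so each TLB entry covers more memory. A minimal sketch of that mechanism in Python (3.8+, Linux only) follows; the buffer size and its use as a solver allocation arena are illustrative assumptions, not the paper's tooling.

```python
import mmap

SIZE = 1 << 30                    # 1 GiB anonymous mapping (illustrative size)

buf = mmap.mmap(-1, SIZE)         # anonymous, private mapping
buf.madvise(mmap.MADV_HUGEPAGE)   # ask the kernel to back this range with huge pages

# A reasoner would carve its clause / no-good storage out of `buf` here;
# pages fault in lazily as the region is touched.
buf[:8] = b"\x00" * 8
buf.close()
```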
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.