Plinius: Secure and Persistent Machine Learning Model Training
- URL: http://arxiv.org/abs/2104.02987v2
- Date: Thu, 8 Apr 2021 06:03:57 GMT
- Title: Plinius: Secure and Persistent Machine Learning Model Training
- Authors: Peterson Yuhala, Pascal Felber, Valerio Schiavoni, Alain Tchana
- Abstract summary: Persistent memory (PM) is resilient to power loss (unlike DRAM)
We present PLINIUS, a framework using Intel SGX enclaves for secure training of ML models and PM for fault tolerance guarantees.
- Score: 2.1375296464337086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the increasing popularity of cloud based machine learning (ML)
techniques there comes a need for privacy and integrity guarantees for ML data.
In addition, the significant scalability challenges faced by DRAM coupled with
the high access times of secondary storage represent a huge performance
bottleneck for ML systems. While solutions exist to tackle the security aspect,
performance remains an issue. Persistent memory (PM) is resilient to power loss
(unlike DRAM), provides fast and fine-granular access to memory (unlike disk
storage) and has latency and bandwidth close to DRAM (in the order of ns and
GB/s, respectively). We present PLINIUS, an ML framework using Intel SGX
enclaves for secure training of ML models and PM for fault tolerance
guarantees. PLINIUS uses a novel mirroring mechanism to create and maintain
(i) encrypted mirror copies of ML models on PM, and (ii) encrypted training
data in byte-addressable PM, for near-instantaneous data recovery after a
system failure. Compared to disk-based checkpointing systems, PLINIUS is 3.2x
and 3.7x faster respectively for saving and restoring models on real PM
hardware, achieving robust and secure ML model training in SGX enclaves.
Related papers
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR demonstrates significant reductions in the computational cost of the LLM by 5.2-6.5x and in its GPU memory usage by 2-6x without compromising performance.
arXiv Detail & Related papers (2024-11-04T18:26:08Z) - Mixture of Attentions For Speculative Decoding [17.344416130742232]
Speculative decoding (SD) leverages smaller models to efficiently propose future tokens, which are then verified by the large language model in parallel.
We identify several limitations of SD models including the lack of on-policyness during training and partial observability.
We propose a more grounded architecture for small models by introducing a Mixture of Attentions for SD.
arXiv Detail & Related papers (2024-10-04T10:25:52Z) - MiniCPM-V: A GPT-4V Level MLLM on Your Phone [83.10007643273521]
MiniCPM-V is a series of efficient MLLMs deployable on end-side devices.
By integrating the latest MLLM techniques in architecture, pretraining and alignment, MiniCPM-V 2.5 has several notable features.
MiniCPM-V can be viewed as a representative example of a promising trend.
arXiv Detail & Related papers (2024-08-03T15:02:21Z) - SLIP: Securing LLMs IP Using Weights Decomposition [0.0]
Large language models (LLMs) have recently seen widespread adoption, in both academia and industry.
As these models grow, they become valuable intellectual property (IP), reflecting enormous investments by their owners.
Current methods to protect models' IP on the edge have limitations in terms of practicality, loss in accuracy, or suitability to requirements.
We introduce a novel hybrid inference algorithm, named SLIP, designed to protect edge-deployed models from theft.
arXiv Detail & Related papers (2024-07-15T16:37:55Z) - PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN [19.014325509263536]
ChatGPT marks the arrival of the large language model (LLM) era.
PermLLM achieves two-party private inference of the ChatGLM-6B model at the speed of around 3s/token.
arXiv Detail & Related papers (2024-05-29T04:06:50Z) - MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances performance and interpretability, in language modeling in general and in knowledge-intensive tasks in particular.
We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z) - AI and Memory Wall [81.06494558184049]
We show how memory bandwidth can become the dominant bottleneck for decoder models.
We argue for a redesign in model architecture, training, and deployment strategies to overcome this memory limitation.
arXiv Detail & Related papers (2024-03-21T04:31:59Z) - Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with and improve the performance of popular alternatives such as retrieval-augmented generation.
arXiv Detail & Related papers (2024-03-07T08:34:57Z) - LLM in a flash: Efficient Large Language Model Inference with Limited Memory [19.668719251238176]
Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks.
This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity.
Our method involves constructing an inference cost model that takes into account the characteristics of flash memory.
arXiv Detail & Related papers (2023-12-12T18:57:08Z) - FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the vast untapped potential of consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, and the variability and heterogeneity of peers and devices.
arXiv Detail & Related papers (2023-09-03T13:27:56Z) - S3ML: A Secure Serving System for Machine Learning Inference [15.994551402176189]
We present S3ML, a secure serving system for machine learning inference.
S3ML runs machine learning models in Intel SGX enclaves to protect users' privacy.
arXiv Detail & Related papers (2020-10-13T07:41:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.