Plinius: Secure and Persistent Machine Learning Model Training
- URL: http://arxiv.org/abs/2104.02987v2
- Date: Thu, 8 Apr 2021 06:03:57 GMT
- Title: Plinius: Secure and Persistent Machine Learning Model Training
- Authors: Peterson Yuhala, Pascal Felber, Valerio Schiavoni, Alain Tchana
- Abstract summary: Persistent memory (PM) is resilient to power loss, unlike DRAM. We present PLINIUS, a framework that uses Intel SGX enclaves for secure training of ML models and PM for fault-tolerance guarantees.
- Score: 2.1375296464337086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the increasing popularity of cloud-based machine learning (ML)
techniques comes a need for privacy and integrity guarantees for ML data.
In addition, the significant scalability challenges faced by DRAM, coupled with
the high access times of secondary storage, represent a huge performance
bottleneck for ML systems. While solutions exist to tackle the security aspect,
performance remains an issue. Persistent memory (PM) is resilient to power loss
(unlike DRAM), provides fast and fine-granular access to memory (unlike disk
storage), and has latency and bandwidth close to DRAM (on the order of ns and
GB/s, respectively). We present PLINIUS, an ML framework that uses Intel SGX
enclaves for secure training of ML models and PM for fault-tolerance
guarantees. PLINIUS uses a novel mirroring mechanism to create and maintain
(i) encrypted mirror copies of ML models on PM, and (ii) encrypted training
data in byte-addressable PM, for near-instantaneous data recovery after a
system failure. Compared to disk-based checkpointing systems, PLINIUS is 3.2x
faster at saving and 3.7x faster at restoring models on real PM
hardware, achieving robust and secure ML model training in SGX enclaves.
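The abstract's mirroring mechanism can be illustrated with a toy sketch: keep two copies of the serialized model state so that a crash in the middle of a save never corrupts both, and verify integrity on restore. This is only an illustration under assumptions, not the paper's implementation: the `MirrorCheckpoint` class is hypothetical, a SHA-256 tag stands in for real enclave-side authenticated encryption, and ordinary files stand in for byte-addressable PM.

```python
import hashlib
import os
import tempfile

class MirrorCheckpoint:
    """Toy sketch of Plinius-style mirroring: alternate writes between
    two copies so the previous checkpoint survives a crash mid-save."""

    def __init__(self, path_a: str, path_b: str):
        self.paths = [path_a, path_b]
        self.active = 0  # index of the copy written most recently

    @staticmethod
    def _pack(data: bytes) -> bytes:
        # Integrity tag; a real system would encrypt-and-authenticate
        # inside the enclave (e.g. AES-GCM) rather than hash in the clear.
        return hashlib.sha256(data).digest() + data

    @staticmethod
    def _unpack(blob: bytes):
        tag, data = blob[:32], blob[32:]
        if hashlib.sha256(data).digest() != tag:
            return None  # torn or corrupted write detected
        return data

    def save(self, data: bytes) -> None:
        # Always overwrite the *inactive* mirror, then flip the pointer,
        # so the last good checkpoint is never touched during a save.
        target = 1 - self.active
        with open(self.paths[target], "wb") as f:
            f.write(self._pack(data))
            f.flush()
            os.fsync(f.fileno())
        self.active = target

    def restore(self) -> bytes:
        # Prefer the most recent copy; fall back to its mirror.
        for idx in (self.active, 1 - self.active):
            try:
                with open(self.paths[idx], "rb") as f:
                    data = self._unpack(f.read())
                if data is not None:
                    return data
            except FileNotFoundError:
                pass
        raise RuntimeError("no intact checkpoint found")
```

The design choice mirrored here is the key to "near-instantaneous" recovery: restore only has to validate and read the surviving copy, with no replay of a write-ahead log.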
Related papers
- SLIP: Securing LLMs IP Using Weights Decomposition [0.0]
Large language models (LLMs) have recently seen widespread adoption, in both academia and industry.
As these models grow, they become valuable intellectual property (IP), reflecting enormous investments by their owners.
Current methods to protect models' IP on the edge have limitations in terms of practicality, loss in accuracy, or suitability to requirements.
We introduce a novel hybrid inference algorithm, named SLIP, designed to protect edge-deployed models from theft.
arXiv Detail & Related papers (2024-07-15T16:37:55Z)
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management [0.5899781520375794]
Transformer-based large language models (LLMs) demonstrate impressive performance across various natural language processing tasks.
However, serving inference for long-content generation poses a challenge due to the enormous memory footprint of the transient state.
InfiniGen is a novel KV cache management framework tailored for long-text generation.
arXiv Detail & Related papers (2024-06-28T07:41:26Z) - PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN [19.014325509263536]
ChatGPT marks the arrival of the large language model (LLM) era.
PermLLM achieves two-party private inference of the ChatGLM-6B model at the speed of around 3s/token.
arXiv Detail & Related papers (2024-05-29T04:06:50Z)
- MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing knowledge capabilities by integrating a structured and explicit read-and-write memory module.
Our experiments indicate that MemLLM enhances performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular.
We see MemLLM as an important step towards making LLMs more grounded and factual through memory augmentation.
arXiv Detail & Related papers (2024-04-17T18:13:16Z)
- AI and Memory Wall [81.06494558184049]
We show how memory bandwidth can become the dominant bottleneck for decoder models.
We argue for a redesign in model architecture, training, and deployment strategies to overcome this memory limitation.
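The bandwidth bottleneck described above follows from simple arithmetic: in batch-1 autoregressive decoding, every generated token must stream the model's weights from memory at least once, so memory bandwidth caps the token rate. The sketch below illustrates this with hypothetical numbers (7B parameters, fp16, 900 GB/s) chosen for illustration, not taken from the paper.

```python
def decode_tokens_per_sec(n_params: float, bytes_per_param: float,
                          mem_bw_gbs: float) -> float:
    """Upper bound on batch-1 decode speed when each generated token
    requires streaming all weights from memory once."""
    bytes_per_token = n_params * bytes_per_param
    return mem_bw_gbs * 1e9 / bytes_per_token

# Illustrative numbers: a 7B-parameter model in fp16 (2 bytes/param)
# on a device with 900 GB/s of memory bandwidth.
rate = decode_tokens_per_sec(7e9, 2, 900)  # ~64 tokens/s at best
```

Note that compute rarely appears in this bound; for decoder models the arithmetic intensity per token is low enough that moving bytes, not FLOPs, sets the ceiling.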
arXiv Detail & Related papers (2024-03-21T04:31:59Z)
- Online Adaptation of Language Models with a Memory of Amortized Contexts [86.91360597169563]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We propose an amortized feature extraction and memory-augmentation approach to compress and extract information from new documents.
Our experiments demonstrate the superiority of MAC in multiple aspects, including online adaptation performance, time, and memory efficiency.
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory [20.515855044180295]
Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks.
This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity.
Our method involves constructing an inference cost model that takes into account the characteristics of flash memory.
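A cost model of this kind can be sketched as below. This is a hypothetical simplification, not the paper's model: the function name and default bandwidth/overhead figures are assumptions chosen for illustration, capturing only the qualitative point that flash reads carry a fixed per-read overhead that bulk transfers amortize.

```python
def load_latency_s(flash_bytes: float, dram_bytes: float,
                   flash_bw_gbs: float = 1.0, dram_bw_gbs: float = 50.0,
                   flash_read_overhead_s: float = 0.1e-3,
                   n_flash_reads: int = 1) -> float:
    """Toy I/O cost model: time to assemble weights when part resides in
    DRAM and the rest must be fetched from flash. Two sequential-bandwidth
    terms plus a fixed per-read overhead for each flash access."""
    return (dram_bytes / (dram_bw_gbs * 1e9)
            + flash_bytes / (flash_bw_gbs * 1e9)
            + n_flash_reads * flash_read_overhead_s)

# Fetching one large chunk amortizes the per-read overhead that many
# small scattered reads would pay repeatedly (1 GB from flash in both cases).
bulk = load_latency_s(1e9, 0, n_flash_reads=1)
scattered = load_latency_s(1e9, 0, n_flash_reads=1000)
```

Under such a model, an inference scheduler would prefer to keep frequently reused weights in DRAM and batch the remaining flash reads into large contiguous chunks.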
arXiv Detail & Related papers (2023-12-12T18:57:08Z)
- FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the vast untapped potential of consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, and peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays [66.62377866022221]
Latent Replay-based Continual Learning (CL) techniques enable online, serverless adaptation in principle.
We introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power processor.
Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory.
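The memory saving behind a quantized latent replay buffer can be sketched with uniform quantization: stored latent activations are compressed from 32-bit floats to a few bits each, at the cost of bounded reconstruction error. The function names and the per-tensor 8-bit scheme below are illustrative assumptions, not the platform's actual quantizer.

```python
def quantize_latents(latents, n_bits=8):
    """Uniform per-tensor quantization of latent activations: maps each
    float onto one of 2**n_bits levels between the tensor's min and max."""
    lo, hi = min(latents), max(latents)
    levels = (1 << n_bits) - 1
    scale = (hi - lo) / levels or 1.0  # avoid zero scale for constant input
    codes = [round((x - lo) / scale) for x in latents]
    return codes, scale, lo

def dequantize_latents(codes, scale, lo):
    """Reconstruct approximate latents; error is at most scale / 2 per value."""
    return [lo + c * scale for c in codes]
```

At 8 bits per value this shrinks an fp32 replay buffer by 4x, which is how a fixed budget (such as the under-64MB figure above) can hold far more replay samples.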
arXiv Detail & Related papers (2021-10-20T11:01:23Z)
- S3ML: A Secure Serving System for Machine Learning Inference [15.994551402176189]
We present S3ML, a secure serving system for machine learning inference.
S3ML runs machine learning models in Intel SGX enclaves to protect users' privacy.
arXiv Detail & Related papers (2020-10-13T07:41:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.