Confidential Machine Learning within Graphcore IPUs
- URL: http://arxiv.org/abs/2205.09005v2
- Date: Fri, 20 May 2022 12:07:04 GMT
- Title: Confidential Machine Learning within Graphcore IPUs
- Authors: Kapil Vaswani, Stavros Volos, Cédric Fournet, Antonio Nino Diaz, Ken
Gordon, Balaji Vembu, Sam Webster, David Chisnall, Saurabh Kulkarni, Graham
Cunningham, Richard Osborne, Dan Wilkinson
- Abstract summary: IPU Trusted Extensions (ITX) are experimental hardware extensions included in Graphcore's GC200 IPU, taped out at TSMC's 7nm technology node.
ITX enables the execution of AI workloads with strong confidentiality and integrity guarantees at low performance overheads.
- Score: 1.8657490510210906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present IPU Trusted Extensions (ITX), a set of experimental hardware
extensions that enable trusted execution environments in Graphcore's AI
accelerators.
ITX enables the execution of AI workloads with strong confidentiality and
integrity guarantees at low performance overheads. ITX isolates workloads from
untrusted hosts, and ensures their data and models remain encrypted at all
times except within the IPU. ITX includes a hardware root-of-trust that
provides attestation capabilities and orchestrates trusted execution, and
on-chip programmable cryptographic engines for authenticated encryption of code
and data at PCIe bandwidth. We also present software for ITX in the form of
compiler and runtime extensions that support multi-party training without
requiring a CPU-based TEE.
Experimental support for ITX is included in Graphcore's GC200 IPU taped out
at TSMC's 7nm technology node. Its evaluation on a development board using
standard DNN training workloads suggests that ITX adds less than 5% performance
overhead, and delivers up to 17x better performance compared to CPU-based
confidential computing systems relying on AMD SEV-SNP.
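Since data and models stay encrypted everywhere except inside the IPU, the host only ever handles ciphertext. As a minimal, hypothetical host-side sketch (the function names and the DMA hook are illustrative, not Graphcore's API), sealing a blob with AES-GCM before transfer might look like the following, with the key assumed to be established after attestation:

```python
# Hypothetical host-side sketch: authenticated encryption (AES-GCM) of a
# model/data blob before it is DMA'd to the accelerator, mirroring the
# encrypt-before-transfer flow that ITX's on-chip engines decrypt and verify.
# Function names and the send_to_ipu hook are illustrative, not Graphcore APIs.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def seal_for_accelerator(key: bytes, blob: bytes, stream_id: int) -> tuple[bytes, bytes]:
    """Encrypt and authenticate a blob; returns (nonce, ciphertext||tag)."""
    nonce = os.urandom(12)                      # 96-bit nonce, unique per message
    aad = stream_id.to_bytes(4, "little")       # bind ciphertext to its DMA stream
    ct = AESGCM(key).encrypt(nonce, blob, aad)  # ciphertext with appended 16-byte GCM tag
    return nonce, ct

key = AESGCM.generate_key(bit_length=256)       # assumed to come from attestation/key exchange
nonce, sealed = seal_for_accelerator(key, b"model weights ...", stream_id=7)
# send_to_ipu(7, nonce, sealed)                 # hypothetical DMA submission hook
```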
Related papers
- Fastrack: Fast IO for Secure ML using GPU TEEs [7.758531952461963]
GPU-based Trusted Execution Environments (TEEs) offer secure, high-performance execution for ML workloads.
However, CPU-to-GPU communication overheads significantly hinder performance.
This paper analyzes Nvidia H100 TEE protocols and identifies three key overheads.
We propose Fastrack, which optimizes the protocol with 1) direct GPU TEE communication, 2) parallelized authentication, and 3) decryption overlapped with PCIe transmission, as sketched below.
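To make optimization 3 concrete, here is a minimal double-buffering sketch; fetch_chunk and decrypt are placeholders standing in for PCIe DMA and authenticated decryption, which the paper overlaps to hide I/O latency (the real system pipelines hardware transfers, not Python threads):

```python
# Minimal sketch of overlapping transfer with decryption, in the spirit of
# Fastrack's optimization 3. fetch_chunk and decrypt are toy stand-ins.
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(i: int) -> bytes:          # placeholder for a PCIe transfer
    return bytes([i % 256]) * 4096

def decrypt(chunk: bytes) -> bytes:        # placeholder for authenticated decryption
    return bytes(b ^ 0x5A for b in chunk)

def pipelined(n_chunks: int) -> list[bytes]:
    out = []
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(fetch_chunk, 0)             # prefetch chunk 0
        for i in range(n_chunks):
            chunk = pending.result()
            if i + 1 < n_chunks:
                pending = io.submit(fetch_chunk, i + 1) # next transfer in flight...
            out.append(decrypt(chunk))                  # ...while this chunk decrypts
    return out

print(len(pipelined(8)), "chunks decrypted")
```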
arXiv Detail & Related papers (2024-10-20T01:00:33Z)
- Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads [1.8633238548765558]
Generative AI based on large language models (LLMs) has come to dominate cloud workloads.
Specialized hardware accelerators, such as GPUs, NPUs, and TPUs, play a key role in AI adoption due to their superior performance over general-purpose CPUs.
The AI models and the data are often highly sensitive and come from mutually distrusting parties.
We propose Ascend-CC, a confidential computing architecture based on discrete NPU devices that requires no trust in the host system.
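The "no trust in the host" model generally means a data owner releases keys only after verifying an attestation of the device, so the host only relays opaque blobs. A hypothetical sketch of such a gate follows; the report format, HMAC-based signature, and measurement value are illustrative, not Ascend-CC's actual protocol:

```python
# Toy attestation-gated key release: verify a device measurement report
# before handing out a data-encryption key. All names/formats are made up.
import hashlib, hmac, os

EXPECTED_MEASUREMENT = hashlib.sha256(b"approved firmware v1").hexdigest()

def verify_report(report: dict, verification_key: bytes) -> bool:
    # recompute the MAC over the reported measurement and compare in constant time
    mac = hmac.new(verification_key, report["measurement"].encode(), "sha256").hexdigest()
    return hmac.compare_digest(mac, report["signature"]) and \
           report["measurement"] == EXPECTED_MEASUREMENT

def release_key_if_trusted(report: dict, verification_key: bytes) -> bytes:
    if not verify_report(report, verification_key):
        raise PermissionError("attestation failed; key withheld from host")
    return os.urandom(32)   # data-encryption key; would travel over a secure channel

vk = b"device-vendor-verification-key"
good = {"measurement": EXPECTED_MEASUREMENT,
        "signature": hmac.new(vk, EXPECTED_MEASUREMENT.encode(), "sha256").hexdigest()}
print(len(release_key_if_trusted(good, vk)), "byte key released")
```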
arXiv Detail & Related papers (2024-07-16T16:17:28Z)
- Assessing the Performance of OpenTitan as Cryptographic Accelerator in Secure Open-Hardware System-on-Chips [4.635794094881707]
OpenTitan is an open-source silicon root-of-trust designed to be deployed in a wide range of systems.
However, the benefits of using OpenTitan as a secure accelerator have not been accurately and quantitatively established.
This paper addresses this gap by thoroughly analysing strengths and inefficiencies when offloading cryptographic workloads to OpenTitan.
arXiv Detail & Related papers (2024-02-16T01:35:40Z)
- HasTEE+ : Confidential Cloud Computing and Analytics with Haskell [50.994023665559496]
Confidential computing enables the protection of confidential code and data in a co-tenanted cloud deployment using specialized hardware isolation units called Trusted Execution Environments (TEEs).
TEEs offer low-level C/C++-based toolchains that are susceptible to inherent memory safety vulnerabilities and lack language constructs to monitor explicit and implicit information-flow leaks.
We address the above with HasTEE+, a domain-specific language (DSL) embedded in Haskell that enables programming TEEs in a high-level language with strong type safety.
arXiv Detail & Related papers (2024-01-17T00:56:23Z)
- FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems [61.335229621081346]
Federated Learning (FL) has become a viable technique for realizing privacy-enhancing distributed deep learning on the network edge.
In this paper, we propose FLEdge, which complements existing FL benchmarks by enabling a systematic evaluation of client capabilities.
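As a reference point for what such benchmarks exercise, here is a minimal federated-averaging step in a toy NumPy sketch; FLEdge's actual workloads and aggregation pipeline are more involved:

```python
# Minimal federated averaging (FedAvg): the server combines client model
# weights, weighting each client by its local sample count. Toy data only.
import numpy as np

def fedavg(client_weights: list[np.ndarray], n_samples: list[int]) -> np.ndarray:
    total = sum(n_samples)
    return sum(w * (n / total) for w, n in zip(client_weights, n_samples))

clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 50, 50]
print(fedavg(clients, sizes))   # server model after one aggregation round
```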
arXiv Detail & Related papers (2023-06-08T13:11:20Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
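The decomposition can be illustrated with a toy NumPy sketch: a small tile-level routine plays the role of a TPP, and a plain loop nest over tiles plays the role of the declarative outer loops. Real TPPs are hardware-optimized microkernels; this only shows the structure:

```python
# Sketch of the two-step decomposition: a tile-level primitive (one GEMM tile)
# plus high-level loops over blocks. NumPy stands in for hardware-tuned TPPs.
import numpy as np

def tpp_gemm_tile(C_blk, A_blk, B_blk):
    C_blk += A_blk @ B_blk                     # the primitive: one tile of GEMM

def blocked_gemm(A, B, T=64):
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    # declarative-style loop nest over tiles; order/parallelism is a schedule choice
    for i in range(0, M, T):
        for j in range(0, N, T):
            for k in range(0, K, T):
                tpp_gemm_tile(C[i:i+T, j:j+T], A[i:i+T, k:k+T], B[k:k+T, j:j+T])
    return C

A, B = np.random.rand(128, 128), np.random.rand(128, 128)
assert np.allclose(blocked_gemm(A, B), A @ B)
```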
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
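A hypothetical sketch of the general idea: concatenate DNN descriptors with explicit hardware features and fit a simple regressor. The features and data below are made up; MAPLE-X's actual features and model differ:

```python
# Toy latency predictor with explicit hardware priors: stack DNN descriptors
# and device features, then fit a least-squares model. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
dnn_feats = rng.random((200, 4))     # e.g., FLOPs, params, depth, width (toy)
hw_feats = rng.random((200, 3))      # e.g., clock, cores, memory bandwidth (toy)
X = np.hstack([dnn_feats, hw_feats, np.ones((200, 1))])   # bias column
true_w = np.array([3.0, 1.0, 0.5, 0.2, 2.0, -1.0, 0.8])
y = X[:, :-1] @ true_w + 0.01 * rng.standard_normal(200)  # synthetic latencies

w, *_ = np.linalg.lstsq(X, y, rcond=None)     # fit the latency model
print("mean abs error (toy units):", np.abs(X @ w - y).mean())
```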
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
- Building Your Own Trusted Execution Environments Using FPGA [16.206300249987354]
BYOTee (Build Your Own Trusted Execution Environments) is an easy-to-use infrastructure for building multiple equally secure enclaves.
BYOTee creates enclaves with customized hardware TCBs, which include softcore CPUs, block RAMs, and peripheral connections, in FPGA on demand.
arXiv Detail & Related papers (2022-03-08T17:22:52Z)
- MAPLE: Microprocessor A Priori for Latency Estimation [81.91509153539566]
Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption.
Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process.
We propose MAPLE (Microprocessor A Priori for Latency Estimation), which does not rely on transfer learning or domain adaptation.
arXiv Detail & Related papers (2021-11-30T03:52:15Z)
- Perun: Secure Multi-Stakeholder Machine Learning Framework with GPU Support [1.5362025549031049]
Perun is a framework for confidential multi-stakeholder machine learning.
It executes ML training on hardware accelerators (e.g., GPU) while providing security guarantees.
During the ML training on CIFAR-10 and real-world medical datasets, Perun achieved a 161x to 1560x speedup.
arXiv Detail & Related papers (2021-03-31T08:31:07Z)
- Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures [56.69373580921888]
We focus on Recommender Systems, which account for most of the AI cycles in cloud computing centers.
By enabling training to run on the latest CPU hardware and software tailored for HPC, we achieve a more than two-orders-of-magnitude improvement in performance.
arXiv Detail & Related papers (2020-05-10T14:40:16Z)