Confidential Machine Learning within Graphcore IPUs
- URL: http://arxiv.org/abs/2205.09005v2
- Date: Fri, 20 May 2022 12:07:04 GMT
- Title: Confidential Machine Learning within Graphcore IPUs
- Authors: Kapil Vaswani, Stavros Volos, Cédric Fournet, Antonio Nino Diaz, Ken
Gordon, Balaji Vembu, Sam Webster, David Chisnall, Saurabh Kulkarni, Graham
Cunningham, Richard Osborne, Dan Wilkinson
- Abstract summary: IPU Trusted Extensions (ITX) are experimental hardware extensions included in Graphcore's GC200 IPU, taped out at TSMC's 7nm technology node.
ITX enables the execution of AI workloads with strong confidentiality and integrity guarantees at low performance overheads.
- Score: 1.8657490510210906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present IPU Trusted Extensions (ITX), a set of experimental hardware
extensions that enable trusted execution environments in Graphcore's AI
accelerators.
ITX enables the execution of AI workloads with strong confidentiality and
integrity guarantees at low performance overheads. ITX isolates workloads from
untrusted hosts, and ensures their data and models remain encrypted at all
times except within the IPU. ITX includes a hardware root-of-trust that
provides attestation capabilities and orchestrates trusted execution, and
on-chip programmable cryptographic engines for authenticated encryption of code
and data at PCIe bandwidth. We also present software for ITX in the form of
compiler and runtime extensions that support multi-party training without
requiring a CPU-based TEE.
Experimental support for ITX is included in Graphcore's GC200 IPU taped out
at TSMC's 7nm technology node. Its evaluation on a development board using
standard DNN training workloads suggests that ITX adds less than 5% performance
overhead, and delivers up to 17x better performance compared to CPU-based
confidential computing systems relying on AMD SEV-SNP.
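Since data and models stay encrypted everywhere except inside the IPU, the host only ever handles ciphertext. As a minimal, hypothetical host-side sketch (the function names and the DMA hook are illustrative, not Graphcore's API), sealing a blob with AES-GCM before transfer might look like the following, with the key assumed to be established after attestation:

```python
# Hypothetical host-side sketch: authenticated encryption (AES-GCM) of a
# model/data blob before it is DMA'd to the accelerator, mirroring the
# encrypt-before-transfer flow that ITX's on-chip engines decrypt and verify.
# Function names and the send_to_ipu hook are illustrative, not Graphcore APIs.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def seal_for_accelerator(key: bytes, blob: bytes, stream_id: int) -> tuple[bytes, bytes]:
    """Encrypt and authenticate a blob; returns (nonce, ciphertext||tag)."""
    nonce = os.urandom(12)                      # 96-bit nonce, unique per message
    aad = stream_id.to_bytes(4, "little")       # bind ciphertext to its DMA stream
    ct = AESGCM(key).encrypt(nonce, blob, aad)  # ciphertext with appended 16-byte GCM tag
    return nonce, ct

key = AESGCM.generate_key(bit_length=256)       # assumed to come from attestation/key exchange
nonce, sealed = seal_for_accelerator(key, b"model weights ...", stream_id=7)
# send_to_ipu(7, nonce, sealed)                 # hypothetical DMA submission hook
```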
Related papers
- Fastrack: Fast IO for Secure ML using GPU TEEs [7.758531952461963]
GPU-based Trusted Execution Environments (TEEs) offer secure, high-performance execution for ML workloads.
However, CPU-to-GPU communication overheads significantly hinder performance.
This paper analyzes Nvidia H100 TEE protocols and identifies three key overheads.
We propose Fastrack, which optimizes the protocol with 1) direct GPU TEE communication, 2) parallelized authentication, and 3) decryption overlapped with PCIe transmission, as sketched below.
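To make optimization 3 concrete, here is a minimal double-buffering sketch; fetch_chunk and decrypt are placeholders standing in for PCIe DMA and authenticated decryption, which the paper overlaps to hide I/O latency (the real system pipelines hardware transfers, not Python threads):

```python
# Minimal sketch of overlapping transfer with decryption, in the spirit of
# Fastrack's optimization 3. fetch_chunk and decrypt are toy stand-ins.
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(i: int) -> bytes:          # placeholder for a PCIe transfer
    return bytes([i % 256]) * 4096

def decrypt(chunk: bytes) -> bytes:        # placeholder for authenticated decryption
    return bytes(b ^ 0x5A for b in chunk)

def pipelined(n_chunks: int) -> list[bytes]:
    out = []
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(fetch_chunk, 0)             # prefetch chunk 0
        for i in range(n_chunks):
            chunk = pending.result()
            if i + 1 < n_chunks:
                pending = io.submit(fetch_chunk, i + 1) # next transfer in flight...
            out.append(decrypt(chunk))                  # ...while this chunk decrypts
    return out

print(len(pipelined(8)), "chunks decrypted")
```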
arXiv Detail & Related papers (2024-10-20T01:00:33Z)
- Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads [1.8633238548765558]
Generative AI based on large language models (LLMs) has come to dominate cloud workloads.
Specialized hardware accelerators, such as GPUs, NPUs, and TPUs, play a key role in AI adoption due to their superior performance over general-purpose CPUs.
The AI models and the data are often highly sensitive and come from mutually distrusting parties.
We propose Ascend-CC, a confidential computing architecture based on discrete NPU devices that requires no trust in the host system.
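The "no trust in the host" model generally means a data owner releases keys only after verifying an attestation of the device, so the host only relays opaque blobs. A hypothetical sketch of such a gate follows; the report format, HMAC-based signature, and measurement value are illustrative, not Ascend-CC's actual protocol:

```python
# Toy attestation-gated key release: verify a device measurement report
# before handing out a data-encryption key. All names/formats are made up.
import hashlib, hmac, os

EXPECTED_MEASUREMENT = hashlib.sha256(b"approved firmware v1").hexdigest()

def verify_report(report: dict, verification_key: bytes) -> bool:
    # recompute the MAC over the reported measurement and compare in constant time
    mac = hmac.new(verification_key, report["measurement"].encode(), "sha256").hexdigest()
    return hmac.compare_digest(mac, report["signature"]) and \
           report["measurement"] == EXPECTED_MEASUREMENT

def release_key_if_trusted(report: dict, verification_key: bytes) -> bytes:
    if not verify_report(report, verification_key):
        raise PermissionError("attestation failed; key withheld from host")
    return os.urandom(32)   # data-encryption key; would travel over a secure channel

vk = b"device-vendor-verification-key"
good = {"measurement": EXPECTED_MEASUREMENT,
        "signature": hmac.new(vk, EXPECTED_MEASUREMENT.encode(), "sha256").hexdigest()}
print(len(release_key_if_trusted(good, vk)), "byte key released")
```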
arXiv Detail & Related papers (2024-07-16T16:17:28Z)
- Assessing the Performance of OpenTitan as Cryptographic Accelerator in Secure Open-Hardware System-on-Chips [4.635794094881707]
OpenTitan is an open-source silicon root-of-trust designed to be deployed in a wide range of systems.
However, the benefits of using OpenTitan as a secure accelerator have not been accurately and quantitatively established.
This paper addresses this gap by thoroughly analysing strengths and inefficiencies when offloading cryptographic workloads to OpenTitan.
arXiv Detail & Related papers (2024-02-16T01:35:40Z)
- HasTEE+ : Confidential Cloud Computing and Analytics with Haskell [50.994023665559496]
Confidential computing enables the protection of confidential code and data in a co-tenanted cloud deployment using specialized hardware isolation units called Trusted Execution Environments (TEEs).
TEEs offer low-level C/C++-based toolchains that are susceptible to inherent memory safety vulnerabilities and lack language constructs to monitor explicit and implicit information-flow leaks.
We address the above with HasTEE+, a domain-specific language (DSL) embedded in Haskell that enables programming TEEs in a high-level language with strong type safety.
arXiv Detail & Related papers (2024-01-17T00:56:23Z)
- FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems [61.335229621081346]
Federated Learning (FL) has become a viable technique for realizing privacy-enhancing distributed deep learning on the network edge.
In this paper, we propose FLEdge, which complements existing FL benchmarks by enabling a systematic evaluation of client capabilities.
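As a reference point for what such benchmarks exercise, here is a minimal federated-averaging step in a toy NumPy sketch; FLEdge's actual workloads and aggregation pipeline are more involved:

```python
# Minimal federated averaging (FedAvg): the server combines client model
# weights, weighting each client by its local sample count. Toy data only.
import numpy as np

def fedavg(client_weights: list[np.ndarray], n_samples: list[int]) -> np.ndarray:
    total = sum(n_samples)
    return sum(w * (n / total) for w, n in zip(client_weights, n_samples))

clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 50, 50]
print(fedavg(clients, sizes))   # server model after one aggregation round
```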
arXiv Detail & Related papers (2023-06-08T13:11:20Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
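The decomposition can be illustrated with a toy NumPy sketch: a small tile-level routine plays the role of a TPP, and a plain loop nest over tiles plays the role of the declarative outer loops. Real TPPs are hardware-optimized microkernels; this only shows the structure:

```python
# Sketch of the two-step decomposition: a tile-level primitive (one GEMM tile)
# plus high-level loops over blocks. NumPy stands in for hardware-tuned TPPs.
import numpy as np

def tpp_gemm_tile(C_blk, A_blk, B_blk):
    C_blk += A_blk @ B_blk                     # the primitive: one tile of GEMM

def blocked_gemm(A, B, T=64):
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    # declarative-style loop nest over tiles; order/parallelism is a schedule choice
    for i in range(0, M, T):
        for j in range(0, N, T):
            for k in range(0, K, T):
                tpp_gemm_tile(C[i:i+T, j:j+T], A[i:i+T, k:k+T], B[k:k+T, j:j+T])
    return C

A, B = np.random.rand(128, 128), np.random.rand(128, 128)
assert np.allclose(blocked_gemm(A, B), A @ B)
```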
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
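A hypothetical sketch of the general idea: concatenate DNN descriptors with explicit hardware features and fit a simple regressor. The features and data below are made up; MAPLE-X's actual features and model differ:

```python
# Toy latency predictor with explicit hardware priors: stack DNN descriptors
# and device features, then fit a least-squares model. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
dnn_feats = rng.random((200, 4))     # e.g., FLOPs, params, depth, width (toy)
hw_feats = rng.random((200, 3))      # e.g., clock, cores, memory bandwidth (toy)
X = np.hstack([dnn_feats, hw_feats, np.ones((200, 1))])   # bias column
true_w = np.array([3.0, 1.0, 0.5, 0.2, 2.0, -1.0, 0.8])
y = X[:, :-1] @ true_w + 0.01 * rng.standard_normal(200)  # synthetic latencies

w, *_ = np.linalg.lstsq(X, y, rcond=None)     # fit the latency model
print("mean abs error (toy units):", np.abs(X @ w - y).mean())
```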
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
- Building Your Own Trusted Execution Environments Using FPGA [16.206300249987354]
BYOTee (Build Your Own Trusted Execution Environments) is an easy-to-use infrastructure for building multiple equally secure enclaves.
BYOTee creates enclaves with customized hardware TCBs, which include softcore CPUs, block RAMs, and peripheral connections, in FPGA on demand.
arXiv Detail & Related papers (2022-03-08T17:22:52Z)
- MAPLE: Microprocessor A Priori for Latency Estimation [81.91509153539566]
Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption.
Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process.
We propose MAPLE (Microprocessor A Priori for Latency Estimation), which does not rely on transfer learning or domain adaptation.
arXiv Detail & Related papers (2021-11-30T03:52:15Z)
- Perun: Secure Multi-Stakeholder Machine Learning Framework with GPU Support [1.5362025549031049]
Perun is a framework for confidential multi-stakeholder machine learning.
It executes ML training on hardware accelerators (e.g., GPU) while providing security guarantees.
During the ML training on CIFAR-10 and real-world medical datasets, Perun achieved a 161x to 1560x speedup.
arXiv Detail & Related papers (2021-03-31T08:31:07Z)
- Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures [56.69373580921888]
We focus on Recommender Systems, which account for most of the AI cycles in cloud computing centers.
By enabling training to run on the latest CPU hardware and software tailored for HPC, we achieve a more than two-orders-of-magnitude improvement in performance.
arXiv Detail & Related papers (2020-05-10T14:40:16Z)