CryptGPU: Fast Privacy-Preserving Machine Learning on the GPU
- URL: http://arxiv.org/abs/2104.10949v1
- Date: Thu, 22 Apr 2021 09:21:40 GMT
- Title: CryptGPU: Fast Privacy-Preserving Machine Learning on the GPU
- Authors: Sijun Tan, Brian Knott, Yuan Tian, and David J. Wu
- Abstract summary: CryptGPU is a system for privacy-preserving machine learning that implements all operations on the GPU.
We introduce a new interface to embed cryptographic operations over secret-shared values into floating-point operations.
We show that our protocols achieve a 2x to 8x improvement in private inference and a 6x to 36x improvement for private training.
- Score: 8.633428365391666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce CryptGPU, a system for privacy-preserving machine learning that
implements all operations on the GPU (graphics processing unit). Just as GPUs
played a pivotal role in the success of modern deep learning, they are also
essential for realizing scalable privacy-preserving deep learning. In this
work, we start by introducing a new interface to losslessly embed cryptographic
operations over secret-shared values (in a discrete domain) into floating-point
operations that can be processed by highly-optimized CUDA kernels for linear
algebra. We then identify a sequence of "GPU-friendly" cryptographic protocols
to enable privacy-preserving evaluation of both linear and non-linear
operations on the GPU. Our microbenchmarks indicate that our private GPU-based
convolution protocol is over 150x faster than the analogous CPU-based protocol;
for non-linear operations like the ReLU activation function, our GPU-based
protocol is around 10x faster than its CPU analog.
With CryptGPU, we support private inference and private training on
convolutional neural networks with over 60 million parameters as well as handle
large datasets like ImageNet. Compared to the previous state-of-the-art, when
considering large models and datasets, our protocols achieve a 2x to 8x
improvement in private inference and a 6x to 36x improvement for private
training. Our work not only showcases the viability of performing secure
multiparty computation (MPC) entirely on the GPU to enable fast
privacy-preserving machine learning, but also highlights the importance of
designing new MPC primitives that can take full advantage of the GPU's
computing capabilities.
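The abstract's key idea, losslessly embedding cryptographic operations over a discrete ring into floating-point arithmetic, can be sketched as follows: split each 64-bit ring element into 16-bit limbs stored as float64, so every limb product and the small sums over the inner dimension stay exact within the 52-bit double-precision mantissa, then recombine the limb products modulo 2^64. This is a minimal CPU-side NumPy sketch standing in for the CUDA kernels; the function names and the specific limb layout are illustrative and may differ from CryptGPU's actual implementation.

```python
import numpy as np

WORD_BITS = 64
LIMB_BITS = 16
NUM_LIMBS = WORD_BITS // LIMB_BITS  # 4 limbs of 16 bits each
MASK = (1 << LIMB_BITS) - 1

def to_limbs(x):
    """Decompose a uint64 array into NUM_LIMBS float64 arrays of 16-bit limbs."""
    return np.stack([((x >> np.uint64(LIMB_BITS * i)) & np.uint64(MASK)).astype(np.float64)
                     for i in range(NUM_LIMBS)])

def matmul_mod_2_64(a, b):
    """Exact matrix product over Z_{2^64} computed with float64 arithmetic only.

    Each limb product is at most (2^16 - 1)^2, so sums over inner dimensions
    up to ~2^20 remain exactly representable in a float64 mantissa.
    """
    a_limbs, b_limbs = to_limbs(a), to_limbs(b)
    out = np.zeros((a.shape[0], b.shape[1]), dtype=np.uint64)
    for i in range(NUM_LIMBS):
        for j in range(NUM_LIMBS):
            if (i + j) * LIMB_BITS >= WORD_BITS:
                continue  # this limb product vanishes modulo 2^64
            prod = a_limbs[i] @ b_limbs[j]  # exact float64 matmul
            # Recombine: shift back into position; uint64 arithmetic wraps mod 2^64.
            out += prod.astype(np.uint64) << np.uint64((i + j) * LIMB_BITS)
    return out
```

Because the float64 matmuls are exact, the recombined result agrees bit-for-bit with integer arithmetic modulo 2^64, which is what lets highly optimized floating-point CUDA kernels process secret shares losslessly.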
Related papers
- Benchmarking GPUs on SVBRDF Extractor Model [0.0]
In this work, we differentiate the performance of different GPUs on neural network models that operate on larger input images (256x256).
arXiv Detail & Related papers (2023-10-19T17:09:06Z) - FusionAI: Decentralized Training and Deploying LLMs with Massive
Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the vast untapped potential of consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, and heterogeneity across peers and devices.
arXiv Detail & Related papers (2023-09-03T13:27:56Z) - EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens [57.354304637367555]
We present EVEREST, a surprisingly efficient MVA approach for video representation learning.
It finds tokens containing rich motion features and discards uninformative ones during both pre-training and fine-tuning.
Our method significantly reduces the computation and memory requirements of MVA.
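The token-selection idea described above can be sketched with a frame-difference norm standing in for EVEREST's motion-richness score; the scoring rule, keep ratio, and function name here are illustrative, not the paper's exact criterion.

```python
import numpy as np

def select_motion_tokens(tokens, keep_ratio=0.5):
    """Keep the tokens with the largest temporal change and drop the rest.

    tokens: (T, N, D) array of patch embeddings for T frames of N tokens.
    Returns the kept (frame, token) indices and the kept embeddings.
    """
    # Motion score: L2 difference of each token against the previous frame.
    diff = np.linalg.norm(tokens[1:] - tokens[:-1], axis=-1)  # (T-1, N)
    flat = diff.reshape(-1)
    k = max(1, int(keep_ratio * flat.size))
    keep = np.argpartition(flat, -k)[-k:]  # indices of the top-k scores
    t_idx, n_idx = np.unravel_index(keep, diff.shape)
    # t_idx indexes frame differences, so the kept frame is t_idx + 1.
    return (t_idx + 1, n_idx), tokens[t_idx + 1, n_idx]
```

Discarding low-scoring tokens before the encoder is what shrinks the compute and memory footprint during both pre-training and fine-tuning.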
arXiv Detail & Related papers (2022-11-19T09:57:01Z) - PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine [68.8204255655161]
Support Vector Machines (SVMs) are widely used in machine learning.
However, even modern and optimized implementations do not scale well for large non-trivial dense data sets on cutting-edge hardware.
PLSSVM can be used as a drop-in replacement for LIBSVM.
arXiv Detail & Related papers (2022-02-25T13:24:23Z) - AxoNN: An asynchronous, message-driven parallel framework for
extreme-scale deep learning [1.5301777464637454]
AxoNN is a parallel deep learning framework that exploits asynchrony and message-driven execution to schedule neural network operations on each GPU.
By using the CPU memory as a scratch space for offloading data periodically during training, AxoNN is able to reduce GPU memory consumption by a factor of four.
arXiv Detail & Related papers (2021-10-25T14:43:36Z) - Providing Meaningful Data Summarizations Using Examplar-based Clustering
in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how the resulting summaries help steer this specific process to cut costs and reduce the production of defective parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z) - Efficient and Generic 1D Dilated Convolution Layer for Deep Learning [52.899995651639436]
We introduce our efficient implementation of a generic 1D convolution layer covering a wide range of parameters.
It is optimized for x86 CPU architectures, in particular, for architectures containing Intel AVX-512 and AVX-512 BFloat16 instructions.
We demonstrate the performance of our optimized 1D convolution layer by utilizing it in the end-to-end neural network training with real genomics datasets.
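The layer described above can be illustrated with a naive reference implementation of a 1D dilated convolution in plain NumPy, without the AVX-512 vectorization the paper targets; the function name and 'valid'-padding choice are illustrative.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Naive 1D dilated convolution (cross-correlation) with 'valid' padding.

    x: (L,) input signal; w: (K,) kernel. The effective receptive field is
    (K - 1) * dilation + 1, so the output has L - (K - 1) * dilation samples.
    """
    K = w.shape[0]
    span = (K - 1) * dilation + 1
    out_len = x.shape[0] - span + 1
    out = np.empty(out_len)
    for i in range(out_len):
        # Gather every dilation-th sample under the kernel window.
        out[i] = np.dot(x[i : i + span : dilation], w)
    return out
```

With dilation=1 this reduces to an ordinary valid cross-correlation; larger dilations widen the receptive field without adding kernel weights, which is why such layers appear in genomics models over very long sequences.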
arXiv Detail & Related papers (2021-04-16T09:54:30Z) - Large Graph Convolutional Network Training with GPU-Oriented Data
Communication Architecture [19.2129567657739]
Graph Convolutional Networks (GCNs) are increasingly adopted in large-scale graph-based recommender systems.
Current GCN training systems keep the feature table in host memory and rely on the CPU to collect sparse features.
This approach, however, puts tremendous pressure on host memory bandwidth and the CPU.
We propose a novel GPU-oriented data communication approach for GCN training, where GPU threads directly access sparse features in host memory.
arXiv Detail & Related papers (2021-03-04T21:00:17Z) - Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
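A toy version of the Nyström-style approximation such solvers build on restricts the kernel solution to m randomly chosen centers; here a small dense direct solve stands in for the paper's GPU-accelerated preconditioned iterations, and all names and parameters are illustrative.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def nystrom_krr(X, y, m=40, lam=1e-3, sigma=0.2, seed=0):
    """Kernel ridge regression restricted to m Nystrom centers.

    Solves min_a ||K_nm a - y||^2 + lam * n * a^T K_mm a, which needs only
    an n x m (not n x n) kernel block, the key to handling many points.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)]
    K_nm = gaussian_kernel(X, centers, sigma)        # (n, m)
    K_mm = gaussian_kernel(centers, centers, sigma)  # (m, m)
    n = len(X)
    A = K_nm.T @ K_nm + lam * n * K_mm + 1e-8 * np.eye(m)  # jitter for stability
    alpha = np.linalg.solve(A, K_nm.T @ y)

    def predict(Xt):
        return gaussian_kernel(Xt, centers, sigma) @ alpha
    return predict
```

The m-center restriction is what turns the cubic-in-n cost of exact kernel methods into something a single GPU can batch through, at a controlled loss in accuracy.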
arXiv Detail & Related papers (2020-06-18T08:16:25Z) - ARIANN: Low-Interaction Privacy-Preserving Deep Learning via Function
Secret Sharing [2.6228228854413356]
AriaNN is a low-interaction privacy-preserving framework for private neural network training and inference on sensitive data.
We design primitives for the building blocks of neural networks such as ReLU, MaxPool and BatchNorm.
We implement our framework as an extension to support n-party private federated learning.
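The additive secret sharing that frameworks like AriaNN build on can be sketched as follows; the ring size and names are illustrative, and note that the non-linear primitives (ReLU, MaxPool, comparisons) require their function secret sharing constructions, which this toy sketch omits.

```python
import numpy as np

MOD = 1 << 32  # shares live in the ring Z_{2^32}

def share(x, rng):
    """Split a uint64 array into two additive shares mod 2^32.

    Each share is uniformly random on its own, so neither party
    learns anything about x from its share alone.
    """
    r = rng.integers(0, MOD, size=x.shape, dtype=np.uint64)
    return r, (x - r) % MOD

def reconstruct(s0, s1):
    """Recombine the shares: x = s0 + s1 (mod 2^32)."""
    return (s0 + s1) % MOD

def mul_public(shares, c):
    """Multiply a shared value by a public constant: each party scales
    its own share locally, with no interaction."""
    s0, s1 = shares
    return (s0 * c) % MOD, (s1 * c) % MOD
```

Linear operations stay local like this; it is the comparisons inside ReLU and MaxPool that normally force interaction, which is exactly where AriaNN's low-interaction primitives come in.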
arXiv Detail & Related papers (2020-06-08T13:40:27Z) - Out-of-Core GPU Gradient Boosting [0.0]
We show that much larger datasets can fit on a given GPU, without degrading model accuracy or training time.
This is the first out-of-core GPU implementation of gradient boosting.
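The out-of-core idea, streaming the dataset from disk so only one fixed-size chunk is ever resident in memory, can be sketched with a chunked pass over per-bin gradient sums, the core statistic behind gradient-boosting split finding; the file layout and names are illustrative, not the paper's compressed GPU format.

```python
import numpy as np

def chunked_histogram(path, n_rows, n_bins=16, chunk_rows=1024):
    """Accumulate per-bin gradient sums while streaming rows from disk.

    The file holds float32 rows of (feature_bin, gradient) pairs; np.memmap
    lets us slice it chunk by chunk without loading the whole dataset.
    """
    data = np.memmap(path, dtype=np.float32, mode="r", shape=(n_rows, 2))
    hist = np.zeros(n_bins)
    for start in range(0, n_rows, chunk_rows):
        chunk = np.asarray(data[start : start + chunk_rows])  # one chunk in RAM
        bins = chunk[:, 0].astype(np.intp)
        np.add.at(hist, bins, chunk[:, 1])  # unbuffered per-bin accumulation
    return hist
```

Because histogram accumulation is associative, the chunked result matches the in-memory one, so the dataset size is bounded by disk, not device memory.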
arXiv Detail & Related papers (2020-05-19T00:41:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.