Tabula: Efficiently Computing Nonlinear Activation Functions for Secure Neural Network Inference
- URL: http://arxiv.org/abs/2203.02833v2
- Date: Sun, 16 Jun 2024 23:24:01 GMT
- Title: Tabula: Efficiently Computing Nonlinear Activation Functions for Secure Neural Network Inference
- Authors: Maximilian Lam, Michael Mitzenmacher, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks
- Abstract summary: Multiparty approaches to secure neural network inference commonly rely on garbled circuits.
We propose Tabula, an algorithm based on secure lookup tables.
Compared to garbled circuits with 8-bit quantized inputs, Tabula with 8-bit activations uses between $280$-$560\times$ less communication.
- Score: 18.363580113885174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multiparty computation approaches to secure neural network inference commonly rely on garbled circuits for securely executing nonlinear activation functions. However, garbled circuits require excessive communication between server and client, impose significant storage overheads, and incur large runtime penalties. To reduce these costs, we propose an alternative to garbled circuits: Tabula, an algorithm based on secure lookup tables. During an offline phase, our approach precomputes lookup tables that contain the results of all possible nonlinear function calls. Because these tables incur exponential storage costs in the number of operands and the precision of the input values, we use quantization to reduce these storage costs to make this approach practical. This enables an online phase where securely computing the result of a nonlinear function requires just a single round of communication, with communication cost equal to twice the number of bits of the input to the nonlinear function. In practice, our approach costs 2 bytes of communication per nonlinear function call in the online phase. Compared to garbled circuits with 8-bit quantized inputs, when computing individual nonlinear functions during the online phase, experiments show Tabula with 8-bit activations uses between $280$-$560 \times$ less communication, is over $100\times$ faster, and uses a comparable (within a factor of 2) amount of storage; compared against other state-of-the-art protocols Tabula achieves greater than $40\times$ communication reduction. This leads to significant performance gains over garbled circuits with quantized inputs during the online phase of secure inference of neural networks: Tabula reduces end-to-end inference communication by up to $9 \times$ and achieves an end-to-end inference speedup of up to $50 \times$, while imposing comparable storage and offline preprocessing costs.
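To make the table-lookup idea concrete, below is a minimal, non-secure Python sketch under assumed parameters (8-bit activations, a fixed-point scale of 16, ReLU as the nonlinearity; the function and constant names are illustrative, not from the paper). The actual Tabula protocol additionally secret-shares and masks these tables during the offline phase so that the online lookup reveals nothing about the activation input, with online communication of roughly twice the input bit width per call (2 bytes for 8-bit activations).

```python
import numpy as np

# Minimal, non-secure sketch of the lookup-table idea behind Tabula.
# Names, bit width, scale, and the ReLU choice are assumptions for
# illustration; the real protocol secret-shares and masks the table.

BITS = 8        # assumed activation bit width
SCALE = 16.0    # assumed fixed-point quantization scale

def build_table(fn, bits=BITS, scale=SCALE):
    """Offline phase: tabulate fn over all 2**bits possible quantized inputs."""
    codes = np.arange(2 ** bits, dtype=np.int64)
    # Interpret the codes as signed fixed-point values.
    signed = np.where(codes < 2 ** (bits - 1), codes, codes - 2 ** bits)
    return np.round(fn(signed / scale) * scale).astype(np.int64)

def lookup(table, code, bits=BITS):
    """Online phase: one table read per nonlinear function call; in the
    secure protocol this costs about 2 * bits of communication per call."""
    return table[code & (2 ** bits - 1)]

relu_table = build_table(lambda x: np.maximum(x, 0.0))
x_code = 200                        # 8-bit code encoding a negative value
print(lookup(relu_table, x_code))   # -> 0, i.e. ReLU of a negative input
```

The exponential table size noted in the abstract is visible here: a single-operand 8-bit table has $2^8$ entries, which quantization keeps small enough to precompute for every nonlinear function call in the network.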
Related papers
- Fast Flux-Activated Leakage Reduction for Superconducting Quantum
Circuits [84.60542868688235]
Leakage out of the computational subspace arises from the multi-level structure of qubit implementations.
We present a resource-efficient universal leakage reduction unit for superconducting qubits using parametric flux modulation.
We demonstrate that using the leakage reduction unit in repeated weight-two stabilizer measurements reduces the total number of detected errors in a scalable fashion.
arXiv Detail & Related papers (2023-09-13T16:21:32Z) - Circuit Cutting with Non-Maximally Entangled States [59.11160990637615]
Distributed quantum computing combines the computational power of multiple devices to overcome the limitations of individual devices.
Circuit cutting techniques enable the distribution of quantum computations through classical communication.
Quantum teleportation allows the distribution of quantum computations without an exponential increase in shots.
We propose a novel circuit cutting technique that leverages non-maximally entangled qubit pairs.
arXiv Detail & Related papers (2023-06-21T08:03:34Z) - Low-Latency Online Multiplier with Reduced Activities and Minimized
Interconnect for Inner Product Arrays [0.8078491757252693]
This paper proposes a low-latency multiplier based on online (left-to-right) arithmetic.
Online arithmetic enables overlapping successive operations regardless of data dependency.
The serial nature of the online algorithm and the gradual increment/decrement of active slices minimize the interconnects and signal activities.
arXiv Detail & Related papers (2023-04-06T01:22:27Z) - Unsupervised Optimal Power Flow Using Graph Neural Networks [172.33624307594158]
We use a graph neural network to learn a nonlinear parametrization between the power demanded and the corresponding allocation.
We show through simulations that the use of GNNs in this unsupervised learning context leads to solutions comparable to standard solvers.
arXiv Detail & Related papers (2022-10-17T17:30:09Z) - Erasure qubits: Overcoming the $T_1$ limit in superconducting circuits [105.54048699217668]
The amplitude damping time, $T_1$, has long stood as the major factor limiting quantum fidelity in superconducting circuits.
We propose a scheme for overcoming the conventional $T_1$ limit on fidelity by designing qubits in such a way that amplitude damping errors can be detected and converted into erasure errors.
arXiv Detail & Related papers (2022-08-10T17:39:21Z) - Circuit knitting with classical communication [1.8311368766923968]
We study a method of circuit knitting based on quasiprobability simulation of nonlocal gates with operations that act locally on the subcircuits.
We show that for circuits containing $n$ nonlocal CNOT gates connecting two circuit parts, the simulation overhead can be reduced from $O(9^n)$ to $O(4^n)$ if one allows for classical information exchange.
arXiv Detail & Related papers (2022-04-29T18:00:11Z) - Multiplier with Reduced Activities and Minimized Interconnect for Inner
Product Arrays [0.8078491757252693]
We present a pipelined multiplier with reduced activities and minimized interconnect based on online digit-serial arithmetic.
For $8$, $16$, $24$, and $32$-bit precision, the proposed low-power pipelined design shows up to $38\%$ and $44\%$ reductions in power and area, respectively.
arXiv Detail & Related papers (2022-04-11T05:45:43Z) - ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training [65.68511423300812]
We propose ProgFed, a progressive training framework for efficient and effective federated learning.
ProgFed inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.
Our results show that ProgFed converges at the same rate as standard training on full models.
arXiv Detail & Related papers (2021-10-11T14:45:00Z) - Scaling the Convex Barrier with Sparse Dual Algorithms [141.4085318878354]
We present two novel dual algorithms for tight and efficient neural network bounding.
Both methods recover the strengths of the new relaxation: tightness and a linear separation oracle.
We can obtain better bounds than off-the-shelf solvers in only a fraction of their running time.
arXiv Detail & Related papers (2021-01-14T19:45:17Z) - TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network
Training and Inference [3.238873941995477]
TensorDash is a hardware-level technique for enabling data-parallel MAC units to take advantage of sparsity in their input operand streams.
When used to compose a hardware accelerator for deep learning, TensorDash can speed up the training process while also increasing energy efficiency.
arXiv Detail & Related papers (2020-09-01T23:39:35Z) - A sparse code increases the speed and efficiency of neuro-dynamic
programming for optimal control tasks with correlated inputs [0.0]
A sparse code is used to represent natural images in an optimal control task solved with neuro-dynamic programming.
A 2.25 times over-complete sparse code is shown to at least double memory capacity compared with a complete sparse code using the same input.
This is used in sequential learning to store a potentially large number of optimal control tasks in the network.
arXiv Detail & Related papers (2020-06-22T01:58:11Z)