Related papers: CompactTag: Minimizing Computation Overheads in Actively-Secure MPC for Deep Neural Networks

URL: http://arxiv.org/abs/2311.04406v1
Date: Wed, 8 Nov 2023 00:18:08 GMT
Title: CompactTag: Minimizing Computation Overheads in Actively-Secure MPC for Deep Neural Networks
Authors: Yongqin Wang, Pratik Sarkar, Nishat Koti, Arpita Patra, Murali Annavaram
Abstract summary: We introduce CompactTag, a lightweight algorithm for generating MAC tags specifically tailored for linear layers in machine learning (ML) applications. CompactTag speeds up this tag computation bottleneck by up to 23x, resulting in up to 1.47x total online phase runtime speedups for various ML workloads.
Score: 16.39761637882153
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Secure Multiparty Computation (MPC) protocols enable secure evaluation of a circuit by several parties, even in the presence of an adversary who maliciously corrupts all but one of the parties. These MPC protocols are constructed using the well-known secret-sharing-based paradigm (SPDZ and SPDZ2k), where the protocols ensure security against a malicious adversary by computing Message Authentication Code (MAC) tags on the input shares and then evaluating the circuit with these input shares and tags. However, this tag computation adds a significant runtime overhead, particularly for machine learning (ML) applications with numerous linear computation layers such as convolutions and fully connected layers. To alleviate the tag computation overhead, we introduce CompactTag, a lightweight algorithm for generating MAC tags specifically tailored for linear layers in ML. Linear layer operations in ML, including convolutions, can be transformed into Toeplitz matrix multiplications. For the multiplication of two matrices with dimensions T1 x T2 and T2 x T3 respectively, SPDZ2k required O(T1 x T2 x T3) local multiplications for the tag computation. In contrast, CompactTag only requires O(T1 x T2 + T1 x T3 + T2 x T3) local multiplications, resulting in a substantial performance boost for various ML models. We empirically compared our protocol to the SPDZ2k protocol for various ML circuits, including ResNet Training-Inference, Transformer Training-Inference, and VGG16 Training-Inference. SPDZ2k dedicated around 30% of its online runtime for tag computation. CompactTag speeds up this tag computation bottleneck by up to 23x, resulting in up to 1.47x total online phase runtime speedups for various ML workloads.
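As a rough illustration of the figures quoted in the abstract, the sketch below compares the two stated asymptotic multiplication counts for one matrix product and relates the reported 30% tag share and 23x tag speedup to an overall speedup via Amdahl's law. It is a back-of-the-envelope calculation, not the CompactTag protocol: the function names and the example dimensions are hypothetical, and constant factors hidden by the O-notation are ignored.

```python
# Back-of-the-envelope sketch; not the CompactTag protocol itself.
# Function names and example dimensions are hypothetical.

def spdz2k_tag_mults(t1: int, t2: int, t3: int) -> int:
    """Leading term of the O(T1 x T2 x T3) count quoted for SPDZ2k."""
    return t1 * t2 * t3


def compacttag_tag_mults(t1: int, t2: int, t3: int) -> int:
    """Leading terms of the O(T1 x T2 + T1 x T3 + T2 x T3) count quoted for CompactTag."""
    return t1 * t2 + t1 * t3 + t2 * t3


def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the runtime is accelerated by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


if __name__ == "__main__":
    # Hypothetical layer shape: (T1 x T2) times (T2 x T3).
    t1, t2, t3 = 128, 256, 512
    cubic = spdz2k_tag_mults(t1, t2, t3)
    quadratic = compacttag_tag_mults(t1, t2, t3)
    print(f"SPDZ2k-style count:     {cubic}")
    print(f"CompactTag-style count: {quadratic}")
    print(f"ratio:                  {cubic / quadratic:.1f}x")

    # "Around 30%" of online runtime spent on tags, sped up by "up to 23x".
    print(f"Amdahl estimate:        {amdahl_speedup(0.30, 23.0):.2f}x")
```

Because "around 30%" and "up to 23x" are approximate figures, the Amdahl estimate lands near, rather than exactly on, the reported 1.47x. For very small matrices the quadratic expression can exceed the cubic one, so the advantage shows up only at realistic layer sizes.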

Related papers

DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing [4.589472292598182]
Fine-tuning large language models (LLMs) remains resource-intensive due to their sheer scale. We present DistZO2, a memory-efficient framework for distributed zeroth-order fine-tuning of LLMs.
arXiv Detail & Related papers (2025-07-03T22:53:34Z)
Enhancing MOTION2NX for Efficient, Scalable and Secure Image Inference using Convolutional Neural Networks [4.407841002228536]
We use the ABY2.0 SMPC protocol implemented on the C++-based MOTION2NX framework for a secure convolutional neural network (CNN) inference application with semi-honest security. We also present a novel splitting algorithm that divides the computations at each CNN layer into multiple chunks.
arXiv Detail & Related papers (2024-08-29T09:50:21Z)
Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask [74.64216073678617]
AMD performs parallel NAR inference within contiguous blocks of output labels concealed using attention masks. A beam search algorithm is designed to leverage a dynamic fusion of CTC, AR Decoder, and AMD probabilities. Experiments on the LibriSpeech-100hr corpus suggest the tripartite Decoder incorporating the AMD module produces a maximum decoding speed-up ratio of 1.73x.
arXiv Detail & Related papers (2024-06-14T13:42:38Z)
Efficient Transformer Encoders for Mask2Former-style models [57.54752243522298]
ECO-M2F is a strategy to self-select the number of hidden layers in the encoder conditioned on the input image. The proposed approach reduces expected encoder computational cost while maintaining performance. It is flexible in architecture configurations, and can be extended beyond the segmentation task to object detection.
arXiv Detail & Related papers (2024-04-23T17:26:34Z)
Extreme Compression of Large Language Models via Additive Quantization [59.3122859349777]
Our algorithm, called AQLM, generalizes the classic Additive Quantization (AQ) approach for information retrieval. We provide fast GPU and CPU implementations of AQLM for token generation, which enable us to match or outperform optimized FP16 implementations for speed.
arXiv Detail & Related papers (2024-01-11T18:54:44Z)
Secure and Efficient Two-party Quantum Scalar Product Protocol With Application to Privacy-preserving Matrix Multiplication [2.770988618353868]
Two-party quantum scalar product (S2SP) is a promising research area within secure multiparty computation (SMC). Existing quantum S2SP protocols are not efficient enough, and their complexity is usually close to exponential. In this paper, a novel secure two-party quantum scalar product (S2QSP) protocol based on Fourier states is proposed to achieve higher efficiency.
arXiv Detail & Related papers (2023-09-23T14:33:46Z)
AdaMTL: Adaptive Input-dependent Inference for Efficient Multi-Task Learning [1.4963011898406864]
We introduce AdaMTL, an adaptive framework that learns task-aware inference policies for multi-task learning models. AdaMTL reduces the computational complexity by 43% while improving the accuracy by 1.32% compared to single-task models. When deployed on Vuzix M4000 smart glasses, AdaMTL reduces the inference latency and the energy consumption by up to 21.8% and 37.5%, respectively.
arXiv Detail & Related papers (2023-04-17T20:17:44Z)
MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine Learning Inference [5.7203077366666015]
We show that it is possible to carefully orchestrate the computation and communication steps to overlap. We propose MPC-Pipe, an efficient MPC system for both training and inference of ML workloads.
arXiv Detail & Related papers (2022-09-27T19:16:26Z)
Lightweight and Progressively-Scalable Networks for Semantic Segmentation [100.63114424262234]
Multi-scale learning frameworks have been regarded as a capable class of models to boost semantic segmentation. In this paper, we thoroughly analyze the design of convolutional blocks and the ways of interaction across multiple scales. We devise Lightweight and Progressively-Scalable Networks (LPS-Net) that novelly expand the network complexity in a greedy manner.
arXiv Detail & Related papers (2022-07-27T16:00:28Z)
Block-Recurrent Transformers [49.07682696216708]
We introduce the Block-Recurrent Transformer, which applies a transformer layer in a recurrent fashion along a sequence. Our recurrent cell operates on blocks of tokens rather than single tokens, and leverages parallel computation within a block in order to make efficient use of accelerator hardware.
arXiv Detail & Related papers (2022-03-11T23:44:33Z)
HD-cos Networks: Efficient Neural Architectures for Secure Multi-Party Computation [26.67099154998755]
Multi-party computation (MPC) is a branch of cryptography where multiple non-colluding parties execute a protocol to securely compute a function. We study training and inference of neural networks under the MPC setup. We show that both of the approaches enjoy strong theoretical motivations and efficient computation under the MPC setup.
arXiv Detail & Related papers (2021-10-28T21:15:11Z)
Taurus: A Data Plane Architecture for Per-Packet ML [59.1343317736213]
We present the design and implementation of Taurus, a data plane for line-rate inference. Our evaluation of a Taurus switch ASIC shows that Taurus operates orders of magnitude faster than a server-based control plane.
arXiv Detail & Related papers (2020-02-12T09:18:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.