Enabling Homomorphically Encrypted Inference for Large DNN Models
- URL: http://arxiv.org/abs/2103.16139v1
- Date: Tue, 30 Mar 2021 07:53:34 GMT
- Title: Enabling Homomorphically Encrypted Inference for Large DNN Models
- Authors: Guillermo Lloret-Talavera, Marc Jorda, Harald Servat, Fabian Boemer,
Chetan Chauhan, Shigeki Tomishima, Nilesh N. Shah, Antonio J. Peña
- Abstract summary: Homomorphic encryption (HE) enables inference using encrypted data but it incurs 100x--10,000x memory and runtime overheads.
Secure deep neural network (DNN) inference using HE is currently limited by computing and memory resources.
We explore the feasibility of leveraging hybrid memory systems comprised of DRAM and persistent memory.
- Score: 1.0679692136113117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of machine learning services in the last few years has
raised data privacy concerns. Homomorphic encryption (HE) enables inference
using encrypted data but it incurs 100x--10,000x memory and runtime overheads.
Secure deep neural network (DNN) inference using HE is currently limited by
computing and memory resources, with frameworks requiring hundreds of gigabytes
of DRAM to evaluate small models. To overcome these limitations, in this paper
we explore the feasibility of leveraging hybrid memory systems comprised of
DRAM and persistent memory. In particular, we explore the recently-released
Intel Optane PMem technology and the Intel HE-Transformer nGraph to run large
neural networks such as MobileNetV2 (in its largest variant) and ResNet-50 for
the first time in the literature. We present an in-depth analysis of the
efficiency of the executions with different hardware and software
configurations. Our results show that DNN inference using HE exhibits access
patterns that are friendly to this memory configuration, yielding efficient
executions.
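The core idea behind HE inference is that a server computes directly on ciphertexts without ever decrypting the client's data. As a minimal sketch of that idea, the toy Paillier scheme below (with tiny, insecure fixed primes, for illustration only) supports the two operations an encrypted linear layer needs: adding two ciphertexts and multiplying a ciphertext by a plaintext weight. Note this is not the scheme used by the paper, whose HE-Transformer framework relies on far more capable lattice-based schemes; the variable names and parameters here are illustrative assumptions.

```python
import math
import random

# Toy Paillier cryptosystem (tiny fixed primes, NOT secure -- illustration only).
p, q = 1009, 1013
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # modular inverse of lambda mod n

def encrypt(m):
    """Encrypt integer m (0 <= m < n)."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

def add_enc(c1, c2):
    """Homomorphic addition: Dec(add_enc(E(a), E(b))) == a + b."""
    return (c1 * c2) % n2

def mul_plain(c, w):
    """Multiply an encrypted value by a plaintext constant w."""
    return pow(c, w, n2)

# Encrypted linear layer: the server computes a dot product with its
# plaintext weights over client-encrypted inputs, never seeing the inputs.
x = [3, 1, 4]  # client's private input
w = [2, 5, 7]  # server's plaintext weights
cx = [encrypt(v) for v in x]
acc = encrypt(0)
for ci, wi in zip(cx, w):
    acc = add_enc(acc, mul_plain(ci, wi))
print(decrypt(acc))  # 39 == 3*2 + 1*5 + 4*7
```

Even in this toy form, every encrypted operand is a full big-integer modular exponentiation, which hints at why real HE inference inflates memory and runtime by orders of magnitude.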
Related papers
- Memory-Optimized Once-For-All Network [5.008189006630566]
Memory-limited OFA (MOOFA) supernet is designed to maximize memory usage across different configurations.
Our code is available at https://github.com/MaximeGirard/memory-optimized-once-for-all.
arXiv Detail & Related papers (2024-09-05T20:06:33Z) - Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z) - Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model [55.116403765330084]
Current AIGC methods, such as score-based diffusion, are still deficient in terms of speed and efficiency.
We propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion.
We experimentally validate our solution with 180 nm resistive memory in-memory computing macros.
arXiv Detail & Related papers (2024-04-08T16:34:35Z) - SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond
the Memory Budget [18.63754969602021]
Deep neural networks (DNNs) on edge artificial intelligence (AI) devices enable various autonomous mobile computing applications.
Existing solutions, such as model compression or cloud offloading, reduce the memory footprint of DNN inference.
We develop SwapNet, an efficient block swapping ecosystem for edge AI devices.
arXiv Detail & Related papers (2024-01-30T05:29:49Z) - Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques.
PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
arXiv Detail & Related papers (2024-01-24T03:03:17Z) - Pruning random resistive memory for optimizing analogue AI [54.21621702814583]
AI models present unprecedented challenges to energy consumption and environmental sustainability.
One promising solution is to revisit analogue computing, a technique that predates digital computing.
Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning.
arXiv Detail & Related papers (2023-11-13T08:59:01Z) - SmartSAGE: Training Large-scale Graph Neural Networks using In-Storage
Processing Architectures [0.7792020418343023]
Graph neural networks (GNNs) can extract features by learning both the representation of each object (i.e., graph node) and the relationships across different objects.
Despite their strengths, utilizing these algorithms in a production environment faces several challenges, as the number of graph nodes and edges reaches a scale of several billions to hundreds of billions.
In this work, we first conduct a detailed characterization on a state-of-the-art, large-scale GNN training algorithm, GraphSAGE.
Based on the characterization, we then explore the feasibility of utilizing capacity-optimized NVM for storing
arXiv Detail & Related papers (2022-05-10T07:25:30Z) - Towards Memory-Efficient Neural Networks via Multi-Level in situ
Generation [10.563649948220371]
Deep neural networks (DNN) have shown superior performance in a variety of tasks.
As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices.
We propose a general and unified framework to trade expensive memory transactions with ultra-fast on-chip computations.
arXiv Detail & Related papers (2021-08-25T18:50:24Z) - PIM-DRAM:Accelerating Machine Learning Workloads using Processing in
Memory based on DRAM Technology [2.6168147530506958]
We propose a processing-in-memory (PIM) multiplication primitive to accelerate matrix vector operations in ML workloads.
We show that the proposed architecture, mapping, and data flow can provide up to 23x and 6.5x benefits over a GPU.
arXiv Detail & Related papers (2021-05-08T16:39:24Z) - Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.