Secure and Storage-Efficient Deep Learning Models for Edge AI Using Automatic Weight Generation
- URL: http://arxiv.org/abs/2507.06380v1
- Date: Tue, 08 Jul 2025 20:33:02 GMT
- Title: Secure and Storage-Efficient Deep Learning Models for Edge AI Using Automatic Weight Generation
- Authors: Habibur Rahaman, Atri Chatterjee, Swarup Bhunia
- Abstract summary: WINGs is a novel framework that dynamically generates layer weights in a fully connected neural network (FC). It compresses the weights in convolutional neural networks (CNNs) during inference, significantly reducing memory requirements without sacrificing accuracy. The sensitivity-aware design also offers an added level of security, as any bit-flip attack on weights in compressed layers has an amplified and readily detectable effect on accuracy.
- Score: 5.097354139604596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex neural networks require substantial memory to store their many synaptic weights. This work introduces WINGs (Automatic Weight Generator for Secure and Storage-Efficient Deep Learning Models), a novel framework that dynamically generates layer weights in fully connected (FC) neural networks and compresses the weights of convolutional neural networks (CNNs) during inference, significantly reducing memory requirements without sacrificing accuracy. The WINGs framework uses principal component analysis (PCA) for dimensionality reduction and lightweight support vector regression (SVR) models to predict layer weights in FC networks, removing the need to store full weight matrices and achieving substantial memory savings. It also preferentially compresses the weights in low-sensitivity layers of CNNs using PCA and SVR guided by sensitivity analysis. The sensitivity-aware design offers an added level of security, as any bit-flip attack on weights in compressed layers has an amplified and readily detectable effect on accuracy. WINGs achieves 53x compression for FC layers, 28x for AlexNet on the MNIST dataset, and 18x for AlexNet on the CIFAR-10 dataset, with 1-2% accuracy loss. This significant reduction in memory yields higher throughput and lower energy for DNN inference, making the approach attractive for resource-constrained edge applications.
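As a concrete illustration of the PCA-plus-SVR mechanism described in the abstract, the sketch below fits a PCA basis over the rows of a stand-in FC weight matrix and trains one lightweight SVR per principal component to regenerate the weights at inference time. The regression input (a normalized row index), the component count, and all shapes are illustrative assumptions; the paper's actual features and training procedure are not reproduced here.

```python
# Minimal sketch of PCA + SVR weight generation (assumed details, not the
# authors' implementation): store only a small PCA basis and per-component
# SVR parameters, then regenerate the FC weight matrix at inference time.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVR

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 784)).astype(np.float32)  # stand-in FC weights

k = 16                               # principal components kept (assumed)
pca = PCA(n_components=k).fit(W)     # basis over weight rows
coeffs = pca.transform(W)            # (256, k) low-dimensional codes

# One lightweight SVR per component; using the normalized row index as the
# regression input is an illustrative choice, not fixed by the abstract.
x = (np.arange(W.shape[0]) / W.shape[0]).reshape(-1, 1)
svrs = [SVR(kernel="rbf", C=10.0).fit(x, coeffs[:, j]) for j in range(k)]

# "Inference": regenerate weights instead of loading a full matrix.
W_hat = pca.inverse_transform(np.column_stack([s.predict(x) for s in svrs]))
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.3f}")
# The error is high for a random W; trained weight matrices carry the
# low-rank structure that makes this kind of compression pay off.
```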
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z) - BiTAT: Neural Network Binarization with Task-dependent Aggregated
Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weights/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
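For readers unfamiliar with the transform QAT methods operate on, here is a generic uniform fake-quantization sketch. It shows the high-to-low precision mapping itself, not BiTAT's task-dependent aggregated transformation; in actual QAT this mapping runs in the forward pass with a straight-through gradient estimator.

```python
# Generic uniform fake-quantization (not BiTAT's method): map float weights
# onto a (2**bits)-level grid and back, as done in a QAT forward pass.
import numpy as np

def fake_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    qmax = 2 ** (bits - 1) - 1            # symmetric signed integer range
    scale = np.abs(w).max() / qmax        # per-tensor scale
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

w = np.random.default_rng(1).standard_normal(1000)
for bits in (8, 4, 2):
    err = np.abs(w - fake_quantize(w, bits)).mean()
    print(f"{bits}-bit: mean abs error {err:.4f}")

# Extreme 1-bit quantization keeps only the sign plus a single scale, which
# is where the severe degeneration noted above comes from.
w_bin = np.sign(w) * np.abs(w).mean()
print(f"1-bit: mean abs error {np.abs(w - w_bin).mean():.4f}")
```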
arXiv Detail & Related papers (2022-07-04T13:25:49Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its software, full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and the compressing bits by soft policy iterations.
With a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Compact Multi-level Sparse Neural Networks with Input Independent
Dynamic Rerouting [33.35713740886292]
Sparse deep neural networks can substantially reduce the complexity and memory consumption of the models.
To meet real-life deployment challenges, we propose training a single sparse model that supports multiple sparsity levels.
In this way, one can dynamically select the appropriate sparsity level during inference, while the storage cost is capped by the least sparse sub-model.
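A minimal sketch of the multi-level idea, under the assumption that the levels are realized as nested magnitude-based masks (the paper's training procedure is not shown):

```python
# Sketch of multi-level sparsity via nested magnitude masks (an assumed
# realization): denser masks contain the sparser ones, so storing the least
# sparse sub-model caps the storage cost.
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal((64, 64))

order = np.argsort(np.abs(w), axis=None)        # weights, ascending magnitude
masks = {}
for lvl in (0.5, 0.75, 0.9):                    # fraction pruned per level
    m = np.ones(w.size, dtype=bool)
    m[order[: int(lvl * w.size)]] = False       # prune the smallest weights
    masks[lvl] = m.reshape(w.shape)

# Nesting: every weight kept at 90% sparsity is also kept at 50% sparsity.
assert np.all(masks[0.9] <= masks[0.5])

# At inference, pick a level to trade accuracy for compute/memory.
w_active = w * masks[0.75]
print({lvl: int(m.sum()) for lvl, m in masks.items()})
```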
arXiv Detail & Related papers (2021-12-21T01:35:51Z) - Nonlinear Tensor Ring Network [39.89070144585793]
State-of-the-art deep neural networks (DNNs) are widely applied to real-world applications and achieve significant performance on cognitive problems.
By converting redundant models into compact ones, compression techniques appear to be a practical solution for reducing storage and memory consumption.
In this paper, we develop a nonlinear tensor ring network (NTRN) in which both fully-connected and convolutional layers are compressed.
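To make the tensor-ring part concrete, here is a small sketch of linear TR reconstruction: the layer tensor is stored as a ring of 3-way cores and rebuilt on the fly. NTRN's nonlinearities between cores are omitted, and the shapes and rank are illustrative.

```python
# Sketch of tensor-ring (TR) reconstruction: a weight tensor stored as a ring
# of small 3-way cores. This shows only the linear TR part, not NTRN itself.
import numpy as np

def tr_reconstruct(cores):
    """cores[i] has shape (r_i, n_i, r_{i+1}), with the last rank wrapping."""
    out = cores[0]                               # (r0, n0, r1)
    for g in cores[1:]:
        out = np.einsum("a...b,bnc->a...nc", out, g)
    return np.einsum("a...a->...", out)          # trace closes the ring

rng = np.random.default_rng(3)
shape, rank = (4, 8, 8, 4), 3                    # 4x8x8x4 tensor, TR-rank 3
cores = [rng.standard_normal((rank, n, rank)) * 0.3 for n in shape]

full = tr_reconstruct(cores)
stored = sum(c.size for c in cores)
print(full.shape, f"stored params: {stored} vs dense: {full.size}")
```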
arXiv Detail & Related papers (2021-11-12T02:02:55Z) - DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and
Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
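A toy sketch of the slicing idea, with an assumed hand-rolled difficulty gate standing in for the paper's learned controller:

```python
# Sketch of dynamic weight slicing: run only the first k units of a layer,
# with k chosen per input. Illustrative of DS-Net, not its actual gating.
import numpy as np

rng = np.random.default_rng(4)
W = rng.standard_normal((128, 64))          # full dense layer: 128 units
b = rng.standard_normal(128)

def gated_forward(x):
    # Stand-in "difficulty" estimate: higher-energy inputs get more width.
    difficulty = float(np.mean(x ** 2))
    k = 32 if difficulty < 0.75 else 64 if difficulty < 1.5 else 128
    # Slice a contiguous prefix of the weights; the layout stays dense.
    return np.maximum(W[:k] @ x + b[:k], 0.0), k

for scale in (0.5, 1.0, 2.0):
    y, k = gated_forward(scale * rng.standard_normal(64))
    print(f"input scale {scale}: used {k}/128 units")
```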
arXiv Detail & Related papers (2021-09-21T09:57:21Z) - Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
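The pipeline behind such storage formats can be sketched as prune, quantize, then source-code; here zlib stands in for the paper's coder, and the pruning threshold and codebook size are arbitrary choices:

```python
# Sketch of a prune -> quantize -> entropy-code storage pipeline (assumed
# stand-in, not the paper's format).
import numpy as np, zlib

rng = np.random.default_rng(5)
w = rng.standard_normal(4096).astype(np.float32)

# 1) Prune: drop small-magnitude weights, keep a bit-mask.
keep = np.abs(w) > 1.0
values = w[keep]

# 2) Quantize survivors to a 16-entry (4-bit) codebook via quantile bins.
edges = np.quantile(values, np.linspace(0, 1, 17))
codes = np.digitize(values, edges[1:-1]).astype(np.uint8)  # indices 0..15

# 3) Source-code the mask and codes (one code per byte here; a real format
#    would pack 4-bit codes and also store the 16 codebook centers).
blob = zlib.compress(np.packbits(keep).tobytes() + codes.tobytes(), 9)
print(f"{w.nbytes} B dense -> {len(blob)} B compressed")
```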
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - Lightweight Compression of Intermediate Neural Network Features for
Collaborative Intelligence [32.03465747357384]
In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device.
This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN.
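A minimal sketch of the split-DNN setting: clip, coarsely quantize, and entropy-code the intermediate activations before transmission. zlib is a stand-in codec; the paper's actual technique and parameters differ.

```python
# Sketch of lightweight feature compression at a DNN split point (assumed
# parameters; not the paper's codec).
import numpy as np, zlib

rng = np.random.default_rng(6)
features = np.maximum(rng.standard_normal((32, 14, 14)), 0)  # post-ReLU

def encode(f, bits=4, clip=4.0):
    f = np.clip(f, 0.0, clip)                         # bound the range
    q = np.round(f / clip * (2**bits - 1)).astype(np.uint8)
    return zlib.compress(q.tobytes(), 9)

def decode(blob, shape, bits=4, clip=4.0):
    q = np.frombuffer(zlib.decompress(blob), dtype=np.uint8).reshape(shape)
    return q.astype(np.float32) * clip / (2**bits - 1)

blob = encode(features)
restored = decode(blob, features.shape)
print(f"{features.nbytes} B -> {len(blob)} B, "
      f"max err {np.abs(features - restored).max():.3f}")
```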
arXiv Detail & Related papers (2021-05-15T00:10:12Z) - Lightweight compression of neural network feature tensors for
collaborative intelligence [32.03465747357384]
In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a relatively low-complexity device such as a mobile phone or edge device.
This paper presents a novel lightweight compression technique designed specifically to code the activations of a split DNN layer.
arXiv Detail & Related papers (2021-05-12T23:41:35Z) - Hessian Aware Quantization of Spiking Neural Networks [1.90365714903665]
Neuromorphic architectures allow massively parallel computation with variable and local bit-precisions.
Current gradient based methods of SNN training use a complex neuron model with multiple state variables.
We present a simplified neuron model that reduces the number of state variables by 4-fold while still being compatible with gradient based training.
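As a point of reference for what a low-state-count neuron looks like, here is a single-state leaky integrate-and-fire update (an illustration, not the paper's model; surrogate-gradient details are omitted):

```python
# Single-state leaky integrate-and-fire (LIF) neuron: membrane potential is
# the only state variable. Parameters are illustrative.
import numpy as np

def lif_run(inputs, tau=0.9, v_th=1.0):
    v, spikes = 0.0, []
    for x in inputs:
        v = tau * v + x            # leak, then integrate the input current
        s = float(v >= v_th)       # spike when the threshold is crossed
        v -= s * v_th              # soft reset by subtraction
        spikes.append(s)
    return spikes

rng = np.random.default_rng(7)
print(lif_run(rng.uniform(0, 0.6, size=12)))
```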
arXiv Detail & Related papers (2021-04-29T05:27:34Z)