Gated Compression Layers for Efficient Always-On Models
- URL: http://arxiv.org/abs/2303.08970v1
- Date: Wed, 15 Mar 2023 22:46:22 GMT
- Title: Gated Compression Layers for Efficient Always-On Models
- Authors: Haiguang Li, Trausti Thormundsson, Ivan Poupyrev, Nicholas Gillian
- Abstract summary: We propose a novel Gated Compression layer that can be applied to transform existing neural network architectures into Gated Neural Networks.
We provide results across five public image and audio datasets that demonstrate the proposed Gated Compression layer effectively stops up to 96% of negative samples and compresses 97% of positive samples while maintaining or improving model accuracy.
- Score: 1.5612040984769857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mobile and embedded machine learning developers frequently have to compromise
between two inferior on-device deployment strategies: sacrifice accuracy and
aggressively shrink their models to run on dedicated low-power cores; or
sacrifice battery by running larger models on more powerful compute cores such
as neural processing units or the main application processor. In this paper, we
propose a novel Gated Compression layer that can be applied to transform
existing neural network architectures into Gated Neural Networks. Gated Neural
Networks have several properties well suited to on-device use cases: they
significantly reduce power, boost accuracy, and take advantage of heterogeneous
compute cores. We provide results across five public image and audio datasets
demonstrating that the proposed Gated Compression layer stops up to 96% of
negative samples and compresses 97% of positive samples, while maintaining or
improving model accuracy.
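The abstract describes the gating mechanism only at a high level. Below is a minimal, illustrative sketch of how a gate plus a compression bottleneck could sit between a small always-on feature extractor and a larger downstream model; all class names, layer sizes, and the fixed gate threshold are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class GatedCompression(nn.Module):
    """Illustrative gate + compression block (not the authors' code).

    The gate scores each sample; samples scoring below `threshold` are
    treated as negatives and stopped early, so the larger downstream
    model never runs for them. Samples that pass are projected to a
    narrower feature vector before being handed to the next compute core.
    """

    def __init__(self, in_features, compressed_features, threshold=0.5):
        super().__init__()
        self.gate = nn.Linear(in_features, 1)                        # per-sample relevance score
        self.compress = nn.Linear(in_features, compressed_features)  # bottleneck
        self.threshold = threshold

    def forward(self, x):
        score = torch.sigmoid(self.gate(x)).squeeze(-1)  # shape: (batch,)
        keep = score >= self.threshold                    # which samples continue
        return self.compress(x[keep]), keep, score

# Toy usage: a small always-on front end, the gate, then a larger model.
early = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
gc = GatedCompression(in_features=32, compressed_features=8)
big_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

x = torch.randn(128, 64)
compressed, keep, score = gc(early(x))
logits = big_model(compressed)  # runs only on samples the gate let through
```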
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Enhancing User Experience in On-Device Machine Learning with Gated Compression Layers [0.0]
On-device machine learning (ODML) enables powerful edge applications, but power consumption remains a key challenge for resource-constrained devices.
This work focuses on the use of Gated Compression (GC) layers to enhance ODML model performance while conserving power.
GC layers dynamically regulate data flow by selectively gating activations of neurons within the neural network and effectively filtering out non-essential inputs.
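The per-neuron gating described here can be pictured with a small sketch: each activation is scaled by a learned sigmoid gate, and a sparsity penalty on the gate values pushes most of them toward zero so that non-essential activations are filtered out. This is an assumed rendering of the idea, not the GC implementation from the paper.

```python
import torch
import torch.nn as nn

class ActivationGate(nn.Module):
    """Illustrative per-neuron gate (assumed form, not the paper's code)."""

    def __init__(self, num_features):
        super().__init__()
        self.gate_logits = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        g = torch.sigmoid(self.gate_logits)  # one gate in (0, 1) per neuron
        return x * g                         # non-essential activations shrink toward zero

    def sparsity_penalty(self):
        # Add this term to the training loss to push most gates toward zero.
        return torch.sigmoid(self.gate_logits).sum()
```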
arXiv Detail & Related papers (2024-05-02T21:18:06Z)
- NCTV: Neural Clamping Toolkit and Visualization for Neural Network Calibration [66.22668336495175]
Models developed without consideration for neural network calibration will not gain trust from humans.
We introduce the Neural Clamping Toolkit, the first open-source framework designed to help developers employ state-of-the-art model-agnostic calibrated models.
arXiv Detail & Related papers (2022-11-29T15:03:05Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
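These two sentences compress a full paper; as a rough illustration of the dictionary idea (store a small per-cell index into a shared codebook instead of a full feature vector per grid cell), here is a small NumPy sketch. The grid size, codebook size, and nearest-neighbour assignment are assumptions; the paper learns the indices and codebook end to end with a vector-quantized auto-decoder, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = rng.standard_normal((64, 64, 16)).astype(np.float32)   # dense feature grid
codebook = rng.standard_normal((256, 16)).astype(np.float32)  # 256 shared codes

# "Compress": map every cell to its nearest codebook entry (one byte per cell).
flat = grid.reshape(-1, 16)
dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
indices = dists.argmin(axis=1).astype(np.uint8)

# "Decompress": a simple lookup recovers an approximate grid.
reconstructed = codebook[indices].reshape(grid.shape)

print(grid.nbytes, "bytes ->", indices.nbytes + codebook.nbytes, "bytes")
```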
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- Self-Compression in Bayesian Neural Networks [0.9176056742068814]
We propose a new insight into network compression through the Bayesian framework.
We show that Bayesian neural networks automatically discover redundancy in model parameters, thus enabling self-compression.
Our experimental results show that the network architecture can be successfully compressed by deleting parameters identified by the network itself.
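One common way to act on redundancy exposed by a weight posterior is to delete weights whose posterior signal-to-noise ratio is low. The sketch below assumes that reading (the paper's specific deletion criterion may differ), and the SNR threshold is a made-up placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
post_mean = rng.standard_normal(10_000)              # posterior means of the weights
post_std = np.abs(rng.standard_normal(10_000)) + 0.1  # posterior standard deviations

snr = np.abs(post_mean) / post_std   # low SNR: the posterior says the weight is redundant
keep_mask = snr > 1.0                # hypothetical threshold
compressed_weights = np.where(keep_mask, post_mean, 0.0)

print(f"kept {keep_mask.mean():.1%} of parameters")
```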
arXiv Detail & Related papers (2021-11-10T21:19:40Z)
- Communication-Efficient Separable Neural Network for Distributed Inference on Edge Devices [2.28438857884398]
We propose a novel method of exploiting model parallelism to separate a neural network for distributed inferences.
Under proper specifications of devices and configurations of models, our experiments show that the inference of large neural networks on edge clusters can be distributed and accelerated.
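As context for what separating a network for distributed inference can look like, here is a minimal channel-split sketch of one fully connected layer across two hypothetical devices. The paper's separation method and its communication scheme are more involved; treat this only as the basic model-parallelism idea.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 128))
W = rng.standard_normal((128, 64))

W_dev0, W_dev1 = W[:, :32], W[:, 32:]   # each device stores half the output columns
out_dev0 = x @ W_dev0                   # computed on device 0
out_dev1 = x @ W_dev1                   # computed on device 1
out = np.concatenate([out_dev0, out_dev1], axis=1)  # only partial results are exchanged

assert np.allclose(out, x @ W)          # same result as the undivided layer
```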
arXiv Detail & Related papers (2021-11-03T19:30:28Z)
- Compression-aware Projection with Greedy Dimension Reduction for Convolutional Neural Network Activations [3.6188659868203388]
We propose a compression-aware projection system to improve the trade-off between classification accuracy and compression ratio.
Our test results show that the proposed methods reduce memory access by 2.91x to 5.97x with a negligible accuracy drop on MobileNetV2/ResNet18/VGG16.
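The projection itself is not specified in the summary above; a generic low-rank stand-in (an SVD basis with a fixed rank) is sketched below only to show where the memory-access savings come from. The paper's compression-aware, greedily selected projection is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.standard_normal((256, 64))   # activations of one layer

# Basis from an SVD of sample activations; the paper instead selects its
# projection jointly with the network, which this sketch does not attempt.
_, _, vt = np.linalg.svd(acts, full_matrices=False)
basis = vt[:16]                         # keep 16 of 64 dimensions

stored = acts @ basis.T                 # what actually goes to memory
restored = stored @ basis               # approximation read back later
print("stored fraction:", stored.size / acts.size)
```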
arXiv Detail & Related papers (2021-10-17T14:02:02Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
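To make "a compressible subspace containing a fine-grained spectrum of models" concrete, the sketch below assumes the simplest case: a line segment between two weight vectors, with magnitude pruning applied at an arbitrary sparsity chosen at inference time. The real method trains the endpoints so every sampled point performs well; the numbers here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
w_a = rng.standard_normal(1_000)   # one end of the subspace
w_b = rng.standard_normal(1_000)   # the other end

def sample_model(alpha, sparsity):
    w = (1 - alpha) * w_a + alpha * w_b      # point in the subspace
    k = int(sparsity * w.size)               # how many weights to zero out
    idx = np.argsort(np.abs(w))[:k]          # smallest-magnitude weights
    w = w.copy()
    w[idx] = 0.0
    return w

fast_model = sample_model(alpha=0.9, sparsity=0.8)      # aggressive compression
accurate_model = sample_model(alpha=0.1, sparsity=0.0)  # full-capacity model
```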
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification [8.062534763028808]
We take a step towards improving the efficiency of graph neural network (GNN) models by observing that these models are heavily limited by the representational power of their first, feature-extracting layer.
We find that it is possible to radically simplify these models so long as the feature extraction layer is retained with minimal degradation to model performance.
Our approach reduces memory consumption by 20x and latency by up to 9.9x for graph layers in models such as DGCNN; overall, we achieve speed-ups of up to 4.5x and peak memory reductions of
arXiv Detail & Related papers (2021-08-13T17:04:54Z)
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
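The summary above gives only the headline numbers; the core idea (fit a simple distribution to gradient magnitudes and derive the sparsification threshold analytically instead of sorting for an exact top-k) can be sketched as below. The single-stage exponential fit and the synthetic Laplace-distributed gradients are simplifying assumptions; SIDCo itself uses multi-stage fitting inside a distributed training loop.

```python
import numpy as np

rng = np.random.default_rng(0)
grad = rng.laplace(0.0, 1.0, 1_000_000)   # magnitudes of Laplace gradients are exponential
target_ratio = 0.001                      # transmit only 0.1% of the elements

# For |g| ~ Exp(scale): P(|g| > t) = exp(-t / scale)  =>  t = -scale * ln(ratio)
scale = np.abs(grad).mean()               # MLE of the exponential scale
threshold = -scale * np.log(target_ratio)

mask = np.abs(grad) > threshold           # elements actually sent
print(f"sent {mask.mean():.4%} of gradients (target {target_ratio:.4%})")
```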
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features of the original full-precision networks to high-dimensional quantization features.
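Reading the sentence above literally, the sketch below widens the feature space before quantizing, so the extra dimensions can absorb the precision lost to 1-bit values. This is an assumed, simplified rendering (a random projection and sign quantization) rather than the paper's trained projection.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 64))    # full-precision features
proj = rng.standard_normal((64, 256))   # widening projection (learned in practice)

wide = feats @ proj                     # 64 -> 256 dimensions
quantized = np.sign(wide)               # 1-bit (binary) features
```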
arXiv Detail & Related papers (2020-02-03T04:11:13Z)