Till the Layers Collapse: Compressing a Deep Neural Network through the Lenses of Batch Normalization Layers
- URL: http://arxiv.org/abs/2412.15077v1
- Date: Thu, 19 Dec 2024 17:26:07 GMT
- Title: Till the Layers Collapse: Compressing a Deep Neural Network through the Lenses of Batch Normalization Layers
- Authors: Zhu Liao, Nour Hezbri, Victor Quétu, Van-Tam Nguyen, Enzo Tartaglione
- Abstract summary: We introduce a method called Till the Layers Collapse (TLC), which compresses deep neural networks through the lenses of batch normalization layers.
We validate our method on popular models such as Swin-T, MobileNet-V2, and RoBERTa, across both image classification and natural language processing (NLP) tasks.
- Score: 5.008189006630566
- Abstract: Today, deep neural networks are widely used since they can handle a variety of complex tasks. Their generality makes them very powerful tools in modern technology. However, deep neural networks are often overparameterized, and using these large models consumes substantial computational resources. In this paper, we introduce a method called \textbf{T}ill the \textbf{L}ayers \textbf{C}ollapse (TLC), which compresses deep neural networks through the lenses of batch normalization layers. By reducing the depth of these networks, our method decreases deep neural networks' computational requirements and overall latency. We validate our method on popular models such as Swin-T, MobileNet-V2, and RoBERTa, across both image classification and natural language processing (NLP) tasks.
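Since the abstract does not spell out the selection rule, here is a minimal sketch of one plausible reading: score each block by its batch-normalization scale parameters and flag the lowest-scoring blocks as removal candidates. The scoring rule and names below are assumptions, not the paper's exact procedure.

```python
import torch.nn as nn

# Illustrative assumption: score each block by the mean |gamma| of its
# batch-norm layer; blocks with near-zero scales barely contribute and
# are candidates for removal. Not the paper's exact criterion.
def bn_scores(model: nn.Module) -> dict:
    return {
        name: m.weight.abs().mean().item()
        for name, m in model.named_modules()
        if isinstance(m, nn.BatchNorm2d)
    }

# Toy model: two conv blocks, each with its own batch norm.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 16, 3), nn.BatchNorm2d(16), nn.ReLU(),
)
print(sorted(bn_scores(model).items(), key=lambda kv: kv[1]))  # lowest first
```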
Related papers
- NEPENTHE: Entropy-Based Pruning as a Neural Network Depth's Reducer [5.373015313199385]
We propose an eNtropy-basEd Pruning as a nEural Network depTH's rEducer to alleviate deep neural networks' computational burden.
We validate our approach on popular architectures such as MobileNet and Swin-T.
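A minimal sketch in the spirit of this abstract, assuming "entropy" refers to how unpredictably a layer's ReLU units switch on and off: a layer whose units are almost always on or off behaves almost linearly and is a natural removal candidate. The estimator here is illustrative, not the paper's.

```python
import torch

# Binary entropy of per-unit on-rates; near zero means the layer acts
# (almost) deterministically linear.
def binary_entropy(p: torch.Tensor) -> torch.Tensor:
    p = p.clamp(1e-8, 1 - 1e-8)
    return -(p * p.log2() + (1 - p) * (1 - p).log2())

@torch.no_grad()
def layer_entropy(pre_activations: torch.Tensor) -> float:
    on_rate = (pre_activations > 0).float().mean(dim=0)  # per-unit on-rate
    return binary_entropy(on_rate).mean().item()

print(layer_entropy(torch.randn(256, 64)))  # toy batch of pre-activations
```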
arXiv Detail & Related papers (2024-04-24T09:12:04Z)
- Fully Spiking Actor Network with Intra-layer Connections for Reinforcement Learning [51.386945803485084]
We focus on the task where the agent needs to learn multi-dimensional deterministic policies for control.
Most existing spike-based RL methods take the firing rate as the output of SNNs, and convert it to represent continuous action space (i.e., the deterministic policy) through a fully-connected layer.
To develop a fully spiking actor network without any floating-point matrix operations, we draw inspiration from the non-spiking interneurons found in insects.
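A minimal sketch of the decoding scheme described above: spike trains are averaged into firing rates, and a fully-connected layer maps the rates to a multi-dimensional continuous action. The shapes and the tanh squashing are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy spike train: T time steps, n_neurons output neurons.
T, n_neurons, action_dim = 16, 128, 4
spikes = (torch.rand(T, n_neurons) < 0.2).float()

firing_rate = spikes.mean(dim=0)            # rate code in [0, 1]
decoder = nn.Linear(n_neurons, action_dim)  # the fully-connected read-out
action = torch.tanh(decoder(firing_rate))   # deterministic continuous action
print(action)
```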
arXiv Detail & Related papers (2024-01-09T07:31:34Z)
- Tensor Decomposition for Model Reduction in Neural Networks: A Review [13.96938227911258]
Modern neural networks have revolutionized the fields of computer vision (CV) and Natural Language Processing (NLP).
They are widely used for solving complex CV tasks and NLP tasks such as image classification, image generation, and machine translation.
This paper reviews six tensor decomposition methods and illustrates their ability to compress model parameters.
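A minimal sketch of the simplest member of this family: truncated SVD of a dense weight matrix, replacing one layer with two low-rank factors (the review also covers richer decompositions such as CP, Tucker, and tensor-train).

```python
import torch

W = torch.randn(512, 1024)  # a dense layer's weight matrix
rank = 64
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * S[:rank]  # (512, 64)
B = Vh[:rank, :]            # (64, 1024)

print((W - A @ B).norm() / W.norm())  # relative reconstruction error
# Parameter count: 512*1024 = 524288 -> 64*(512+1024) = 98304, ~5.3x smaller.
```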
arXiv Detail & Related papers (2023-04-26T13:12:00Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
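For readers unfamiliar with the object being studied, a minimal sketch of an empirical NTK entry for a one-hidden-layer ReLU network with a large bias; the limiting kernel in the paper is the infinite-width analogue of this finite-width quantity, and the architecture below is an assumption.

```python
import torch

torch.manual_seed(0)
d, m = 3, 64
W = torch.randn(m, d, requires_grad=True)
b = (2.0 * torch.randn(m)).requires_grad_()      # large bias -> sparse activation
v = (torch.randn(m) / m ** 0.5).requires_grad_()

def f(x):  # one-hidden-layer ReLU network, scalar output
    return v @ torch.relu(W @ x + b)

def grad_vector(x):  # gradient of f(x) w.r.t. all parameters, flattened
    grads = torch.autograd.grad(f(x), (W, b, v))
    return torch.cat([g.reshape(-1) for g in grads])

x1, x2 = torch.randn(d), torch.randn(d)
print(grad_vector(x1) @ grad_vector(x2))  # empirical NTK entry Theta(x1, x2)
```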
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers [83.74380713308605]
We develop a new type of transformation that is fully compatible with a variant of ReLU: the Leaky ReLU.
We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets.
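A minimal sketch of the ingredient named above: swapping every ReLU in a shortcut-free network for a Leaky ReLU. In the paper the negative slope would be chosen to shape the network's kernel; the value below is an arbitrary placeholder, and the helper is hypothetical.

```python
import torch.nn as nn

# Hypothetical helper: replace every ReLU in a model with a Leaky ReLU.
def to_leaky(model: nn.Module, slope: float = 0.3) -> nn.Module:
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, nn.LeakyReLU(negative_slope=slope))
        else:
            to_leaky(child, slope)
    return model

net = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8), nn.ReLU())
print(to_leaky(net))
```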
arXiv Detail & Related papers (2022-03-15T17:49:08Z)
- Layer Folding: Neural Network Depth Reduction using Activation Linearization [0.0]
Modern devices exhibit a high level of parallelism, but real-time latency is still highly dependent on networks' depth.
We propose a method that learns whether non-linear activations can be removed, allowing to fold consecutive linear layers into one.
We apply our method to networks pre-trained on CIFAR-10 and CIFAR-100 and find that they can all be transformed into shallower forms that share a similar depth.
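A minimal sketch of the folding step itself: once the non-linearity between two linear layers is removed, they compose into a single linear layer. The function name is illustrative, not the paper's API.

```python
import torch
import torch.nn as nn

# b(a(x)) = Wb(Wa x + ba) + bb = (Wb Wa) x + (Wb ba + bb),
# so two linear layers with no activation between them fold into one.
def fold_linear(a: nn.Linear, b: nn.Linear) -> nn.Linear:
    folded = nn.Linear(a.in_features, b.out_features)
    with torch.no_grad():
        folded.weight.copy_(b.weight @ a.weight)
        folded.bias.copy_(b.weight @ a.bias + b.bias)
    return folded

a, b = nn.Linear(8, 16), nn.Linear(16, 4)
x = torch.randn(2, 8)
print(torch.allclose(fold_linear(a, b)(x), b(a(x)), atol=1e-6))  # True
```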
arXiv Detail & Related papers (2021-06-17T08:22:46Z)
- Overcoming Catastrophic Forgetting in Graph Neural Networks [50.900153089330175]
Catastrophic forgetting refers to the tendency of a neural network to "forget" previously learned knowledge upon learning new tasks.
We propose a novel scheme dedicated to overcoming this problem and hence strengthening continual learning in graph neural networks (GNNs).
At the heart of our approach is a generic module, termed topology-aware weight preserving (TWP).
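A minimal sketch of the general pattern behind weight-preserving modules of this kind: penalize changes to parameters deemed important for previous tasks. TWP's topology-aware importance scores are specific to GNNs and are omitted here; `importance` below is a placeholder, not the paper's measure.

```python
import torch

# Quadratic penalty on drifting away from old-task parameters, weighted by
# per-parameter importance scores (placeholder values here).
def preserve_loss(params, old_params, importance):
    return sum((imp * (p - p_old).pow(2)).sum()
               for p, p_old, imp in zip(params, old_params, importance))

params = [torch.randn(3, 3, requires_grad=True)]
old_params = [p.detach().clone() for p in params]
importance = [torch.rand(3, 3)]
print(preserve_loss(params, old_params, importance))  # added to the task loss
```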
arXiv Detail & Related papers (2020-12-10T22:30:25Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
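A minimal sketch of the rate-coding idea underlying ANN-to-SNN conversion: an integrate-and-fire neuron whose average spike count over T steps approximates a ReLU. The paper's progressive tandem learning adds layer-wise finetuning on top of such a conversion, which is not shown here.

```python
import torch

# Integrate-and-fire neuron with soft reset: over T steps, its firing rate
# approximates relu(current) / threshold.
def if_rate(current: torch.Tensor, T: int = 100, threshold: float = 1.0):
    v = torch.zeros_like(current)
    spikes = torch.zeros_like(current)
    for _ in range(T):
        v = v + current
        fired = (v >= threshold).float()
        spikes = spikes + fired
        v = v - fired * threshold  # soft reset keeps the residual charge
    return spikes / T

x = torch.tensor([-0.5, 0.2, 0.8])
print(if_rate(x), torch.relu(x))  # rates closely track the ReLU values
```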
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
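A minimal sketch of iterative magnitude-based mask discovery, the family this hybrid builds on: repeatedly prune the smallest surviving weights until a target sparsity is reached. ESPN's actual criterion and schedule differ.

```python
import torch

# Gradually raise sparsity over several prune steps; in practice the network
# is retrained between steps, which is omitted here.
def iterative_mask(w: torch.Tensor, target_sparsity=0.99, steps=10):
    mask = torch.ones_like(w)
    for step in range(1, steps + 1):
        sparsity = target_sparsity * step / steps
        k = max(1, int(w.numel() * (1 - sparsity)))  # weights to keep
        thresh = (w * mask).abs().flatten().topk(k).values.min()
        mask = ((w * mask).abs() >= thresh).float()
    return mask

w = torch.randn(1000)
mask = iterative_mask(w)
print(1 - mask.mean().item())  # achieved sparsity, ~0.99
```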
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
- An Overview of Neural Network Compression [2.550900579709111]
In recent years there has been a resurgence in model compression techniques, particularly for deep convolutional neural networks and self-attention based networks such as the Transformer.
This paper provides a timely overview of both old and current compression techniques for deep neural networks, including pruning, quantization, tensor decomposition, knowledge distillation and combinations thereof.
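A minimal sketch of one technique from this list: uniform post-training quantization of a weight tensor to 8-bit integers. Real pipelines also calibrate activations and fuse operations, which is beyond this illustration.

```python
import torch

# Symmetric uniform quantization of a weight tensor to int8.
def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

w = torch.randn(4, 4)
q, scale = quantize_int8(w)
print((w - q.float() * scale).abs().max())  # worst-case rounding error
```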
arXiv Detail & Related papers (2020-06-05T20:28:56Z)
- Lossless Compression of Deep Neural Networks [17.753357839478575]
Deep neural networks have been successful in many predictive modeling tasks, such as image and language recognition.
It is challenging to deploy these networks under limited computational resources, such as in mobile devices.
We introduce an algorithm that removes units and layers of a neural network while not changing the output that is produced.
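A minimal sketch of one lossless simplification in this spirit: a hidden ReLU unit that provably never activates on a bounded input domain can be deleted without changing the output. The box-bound check below assumes inputs in [0, 1]; the paper's verification is more general.

```python
import torch
import torch.nn as nn

# If max(w.x + b) over x in [0,1]^d is <= 0, the ReLU unit never fires and
# can be removed without changing the network's output on that domain.
@torch.no_grad()
def always_inactive(layer: nn.Linear, unit: int) -> bool:
    w, b = layer.weight[unit], layer.bias[unit]
    max_pre_act = w.clamp(min=0).sum() + b  # maximize each term independently
    return (max_pre_act <= 0).item()

layer = nn.Linear(8, 16)
print([u for u in range(16) if always_inactive(layer, u)])  # removable units
```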
arXiv Detail & Related papers (2020-01-01T15:04:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.