Learning Activation Functions for Sparse Neural Networks
- URL: http://arxiv.org/abs/2305.10964v2
- Date: Mon, 5 Jun 2023 09:52:19 GMT
- Title: Learning Activation Functions for Sparse Neural Networks
- Authors: Mohammad Loni, Aditya Mohan, Mehdi Asadi, Marius Lindauer
- Abstract summary: Sparse Neural Networks (SNNs) can potentially demonstrate similar performance to their dense counterparts.
However, the accuracy drop incurred by SNNs, especially at high pruning ratios, can be an issue in critical deployment conditions.
We focus on learning a novel way to tune activation functions for sparse networks.
- Score: 12.234742322758418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse Neural Networks (SNNs) can potentially demonstrate similar performance
to their dense counterparts while saving significant energy and memory at
inference. However, the accuracy drop incurred by SNNs, especially at high
pruning ratios, can be an issue in critical deployment conditions. While recent
works mitigate this issue through sophisticated pruning techniques, we shift
our focus to an overlooked factor: hyperparameters and activation functions.
Our analyses have shown that the accuracy drop can additionally be attributed
to (i) using ReLU universally as the default activation function, and
(ii) fine-tuning SNNs with the same hyperparameters as their dense counterparts.
Thus, we focus on learning a novel way to tune activation functions for sparse
networks and combining these with a separate hyperparameter optimization (HPO)
regime for sparse networks. By conducting experiments on popular DNN models
(LeNet-5, VGG-16, ResNet-18, and EfficientNet-B0) trained on MNIST, CIFAR-10,
and ImageNet-16 datasets, we show that the novel combination of these two
approaches, dubbed Sparse Activation Function Search (SAFS), results in up to
15.53%, 8.88%, and 6.33% absolute accuracy improvements for
LeNet-5, VGG-16, and ResNet-18 over the default training protocols, especially
at high pruning ratios. Our code can be found at https://github.com/automl/SAFS
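As a rough illustration of the recipe the abstract describes, the sketch below magnitude-prunes a toy PyTorch model, grid-searches per-layer activation functions for the resulting sparse network, and then tunes the fine-tuning learning rate separately from the dense schedule. This is a minimal sketch under assumed choices (toy MLP, random data, a small candidate set, plain grid search), not the authors' implementation; see https://github.com/automl/SAFS for the real code.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Assumed candidate set; the paper searches over a richer space of activations.
CANDIDATES = {"relu": nn.ReLU, "elu": nn.ELU, "silu": nn.SiLU,
              "leaky_relu": nn.LeakyReLU, "tanh": nn.Tanh}

class TinyNet(nn.Module):
    """Toy stand-in for the LeNet/VGG/ResNet models used in the paper."""
    def __init__(self, act1="relu", act2="relu"):
        super().__init__()
        self.fc1, self.act1 = nn.Linear(32, 64), CANDIDATES[act1]()
        self.fc2, self.act2 = nn.Linear(64, 64), CANDIDATES[act2]()
        self.head = nn.Linear(64, 10)

    def forward(self, x):
        return self.head(self.act2(self.fc2(self.act1(self.fc1(x)))))

def build_sparse(act1, act2, ratio=0.9, seed=0):
    """Create a model with the given activations and magnitude-prune it."""
    torch.manual_seed(seed)                      # identical init for every trial
    model = TinyNet(act1, act2)
    for m in model.modules():
        if isinstance(m, nn.Linear):
            prune.l1_unstructured(m, name="weight", amount=ratio)
    return model

def finetune_and_eval(model, x, y, lr=1e-2, steps=100):
    """Short fine-tuning run on toy data; returns training accuracy."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return (model(x).argmax(dim=1) == y).float().mean().item()

torch.manual_seed(0)
x, y = torch.randn(256, 32), torch.randint(0, 10, (256,))

# Stage 1: search per-layer activation functions for the sparse network
# (plain grid search here; SAFS uses a more sophisticated search strategy).
scores = {(a1, a2): finetune_and_eval(build_sparse(a1, a2), x, y)
          for a1 in CANDIDATES for a2 in CANDIDATES}
best_a1, best_a2 = max(scores, key=scores.get)

# Stage 2: tune the fine-tuning hyperparameters of the sparse network
# separately from the dense schedule (here only the learning rate).
best_lr = max([1e-1, 1e-2, 1e-3],
              key=lambda lr: finetune_and_eval(build_sparse(best_a1, best_a2),
                                               x, y, lr=lr))
print(f"chosen activations=({best_a1}, {best_a2}), lr={best_lr}")
```

In the paper, this two-stage procedure is applied to LeNet-5, VGG-16, ResNet-18, and EfficientNet-B0 at high pruning ratios; the grid search above merely stands in for SAFS's activation-function search and hyperparameter optimization machinery.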
Related papers
- Unveiling the Power of Sparse Neural Networks for Feature Selection [60.50319755984697]
Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection.
We show that feature selection with SNNs trained with dynamic sparse training (DST) algorithms can achieve, on average, more than 50% memory and 55% FLOPs reduction.
arXiv Detail & Related papers (2024-08-08T16:48:33Z) - Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions that our training procedure, including its gradient updates and regularizers, can reach, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Efficient CNN Architecture Design Guided by Visualization [13.074652653088584]
VGNetG-1.0MP achieves 67.7% top-1 accuracy with 0.99M parameters and 69.2% top-1 accuracy with 1.14M parameters on ImageNet classification dataset.
Our VGNetF-1.5MP achieves 64.4% (-3.2%) top-1 accuracy and 66.2% (-1.4%) top-1 accuracy with additional Gaussian kernels.
arXiv Detail & Related papers (2022-07-21T06:22:15Z) - Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - Elastic-Link for Binarized Neural Network [9.83865304744923]
"Elastic-Link" (EL) module enrich information flow within a BNN by adaptively adding real-valued input features to the subsequent convolutional output features.
EL produces a significant improvement on the challenging large-scale ImageNet dataset.
With the integration of ReActNet, it yields a new state-of-the-art result of 71.9% top-1 accuracy.
arXiv Detail & Related papers (2021-12-19T13:49:29Z) - MicroNet: Improving Image Recognition with Extremely Low FLOPs [82.54764264255505]
We find two factors, sparse connectivity and dynamic activation function, are effective to improve the accuracy.
We present a new dynamic activation function, named Dynamic Shift Max, to improve the non-linearity.
We arrive at a family of networks, named MicroNet, that achieves significant performance gains over the state of the art in the low FLOP regime.
arXiv Detail & Related papers (2021-08-12T17:59:41Z) - Adder Neural Networks [75.54239599016535]
We present adder networks (AdderNets) to trade the massive multiplications in deep neural networks for much cheaper additions.
In AdderNets, we take the $\ell_p$-norm distance between filters and input features as the output response.
We show that the proposed AdderNets can achieve 75.7% Top-1 accuracy and 92.3% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
arXiv Detail & Related papers (2021-05-29T04:02:51Z) - Toward Compact Deep Neural Networks via Energy-Aware Pruning [2.578242050187029]
We propose a novel energy-aware pruning method that quantifies the importance of each filter in the network using the nuclear norm (NN).
We achieve competitive results of 40.4%/49.8% FLOPs reduction and 45.9%/52.9% parameter reduction while retaining 94.13%/94.61% Top-1 accuracy with ResNet-56/110 on CIFAR-10.
arXiv Detail & Related papers (2021-03-19T15:33:16Z) - A Framework For Pruning Deep Neural Networks Using Energy-Based Models [45.4796383952516]
A typical deep neural network (DNN) has a large number of trainable parameters.
We propose a framework for pruning DNNs based on a population-based global optimization method.
Experiments on ResNets, AlexNet, and SqueezeNet show a pruning rate of more than 50% of the trainable parameters.
arXiv Detail & Related papers (2021-02-25T21:44:19Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z) - Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks [9.409651543514615]
This work introduces convolutional layers with pre-defined sparse 2D kernels that have support sets that repeat periodically within and across filters.
Due to the efficient storage of our periodic sparse kernels, the parameter savings can translate into considerable improvements in energy efficiency.
arXiv Detail & Related papers (2020-01-29T07:10:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.