Truly Sparse Neural Networks at Scale
- URL: http://arxiv.org/abs/2102.01732v1
- Date: Tue, 2 Feb 2021 20:06:47 GMT
- Title: Truly Sparse Neural Networks at Scale
- Authors: Selima Curci, Decebal Constantin Mocanu, Mykola Pechenizkiy
- Abstract summary: We train the largest neural network ever trained in terms of representational power -- reaching the bat brain size.
Our approach has state-of-the-art performance while opening the path for an environmentally friendly artificial intelligence era.
- Score: 2.2860412844991655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, sparse training methods have started to become established
as a de facto approach for training and inference efficiency in artificial
neural networks. Yet, this efficiency holds only in theory: in practice,
everyone uses a binary mask over dense weights to simulate sparsity, since
typical deep learning software and hardware are optimized for dense matrix
operations. In this paper, we take an orthogonal approach and show that we can
train truly sparse neural networks to harvest their full potential. To achieve
this goal, we introduce three novel contributions, specially designed for
sparse neural networks: (1) a parallel training algorithm and its corresponding
sparse implementation from scratch, (2) an activation function with
non-trainable parameters to favour gradient flow, and (3) an importance metric
for hidden neurons to eliminate redundancies. Altogether, we are able to break
the record and train the largest neural network ever trained in terms of
representational power -- reaching bat brain size. The results show that our
approach achieves state-of-the-art performance while opening the path toward an
environmentally friendly artificial intelligence era.
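The paper's sparse implementation is written from scratch and is not reproduced here; as a minimal sketch of the core idea, though, "truly sparse" means storing and multiplying only the nonzero weights (for example in CSR format), rather than masking a dense matrix. The shapes and density below are illustrative, not the paper's:

```python
import numpy as np
from scipy import sparse

def sparse_forward(W, x):
    """Forward pass through a truly sparse layer: W is stored in CSR
    format, so only the nonzero weights participate in the product."""
    return W @ x

rng = np.random.default_rng(0)
# A 512x1024 layer at ~1% density: ~5K stored weights instead of 524K,
# versus a masked-dense layer that still stores (and multiplies) all 524K.
W = sparse.random(512, 1024, density=0.01, random_state=rng, format="csr")
x = rng.standard_normal(1024)
y = sparse_forward(W, x)
```

The memory and compute savings come from the storage format itself, which is exactly what a binary mask over a dense matrix cannot provide.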
Related papers
- Simple and Effective Transfer Learning for Neuro-Symbolic Integration [50.592338727912946]
Neuro-Symbolic Integration (NeSy) combines neural approaches with symbolic reasoning.
Most of these methods exploit a neural network to map perceptions to symbols and a logical reasoner to predict the output of the downstream task.
They suffer from several issues, including slow convergence, learning difficulties with complex perception tasks, and convergence to local minima.
This paper proposes a simple yet effective method to ameliorate these problems.
arXiv Detail & Related papers (2024-02-21T15:51:01Z)
- Spiking mode-based neural networks [2.5690340428649328]
Spiking neural networks play an important role in brain-like neuromorphic computations and in studying working mechanisms of neural circuits.
One drawback of training a large scale spiking neural network is that updating all weights is quite expensive.
We propose a spiking mode-based training protocol, where the recurrent weight matrix is factorized, Hopfield-like, into a product of three matrices.
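As a rough sketch of the mode-based idea (the paper's exact factorization and training rule may differ), the recurrent matrix can be expressed through fixed input and output modes with a small set of trainable mode scores, so that updates touch far fewer parameters than the full matrix:

```python
import numpy as np

N, r = 200, 10                     # neurons, number of modes (r << N)
rng = np.random.default_rng(2)
U = rng.standard_normal((N, r))    # fixed input modes
V = rng.standard_normal((r, N))    # fixed output modes
s = rng.standard_normal(r)         # trainable mode scores

# The recurrent matrix is a product of three matrices and is never
# updated directly; training only adjusts the r mode scores.
W = U @ np.diag(s) @ V

dense_params = N * N               # cost of training W directly
mode_params = s.size               # cost of training the mode scores
```

Here updating 10 scores replaces updating 40,000 recurrent weights, which is the expense the protocol is designed to avoid.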
arXiv Detail & Related papers (2023-10-23T06:54:17Z)
- NeuralFastLAS: Fast Logic-Based Learning from Raw Data [54.938128496934695]
Symbolic rule learners generate interpretable solutions; however, they require the input to be encoded symbolically.
Neuro-symbolic approaches overcome this issue by mapping raw data to latent symbolic concepts using a neural network.
We introduce NeuralFastLAS, a scalable and fast end-to-end approach that trains a neural network jointly with a symbolic learner.
arXiv Detail & Related papers (2023-10-08T12:33:42Z)
- Taming Binarized Neural Networks and Mixed-Integer Programs [2.7624021966289596]
We show that binarized neural networks admit a tame representation.
This makes it possible to use the framework of Bolte et al. for implicit differentiation.
This approach could also be used for a broader class of mixed-integer programs.
arXiv Detail & Related papers (2023-10-05T21:04:16Z)
- Enhanced quantum state preparation via stochastic prediction of neural network [0.8287206589886881]
In this paper, we explore an intriguing avenue for enhancing algorithm effectiveness by exploiting the knowledge blindness of neural networks.
Our approach centers around a machine learning algorithm for preparing arbitrary quantum states in a semiconductor double quantum dot system.
By leveraging the predictions generated by the neural network, we are able to guide the optimization process to escape local optima.
arXiv Detail & Related papers (2023-07-27T09:11:53Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
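The ensembling step at the heart of this idea is simple: the sparse subnetworks ("tickets") collected during one dynamic sparse training run are combined by averaging their predictive distributions. The sketch below shows only that combination step, with random stand-in logits in place of actual tickets:

```python
import numpy as np

def ensemble_predict(logits_list):
    """Average the softmax outputs of several tickets (sparse
    subnetworks snapshotted during a single training run)."""
    probs = []
    for logits in logits_list:
        # Numerically stable softmax per row.
        z = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs.append(z / z.sum(axis=1, keepdims=True))
    return np.mean(probs, axis=0)

rng = np.random.default_rng(0)
tickets = [rng.standard_normal((4, 3)) for _ in range(3)]  # stand-in logits
p = ensemble_predict(tickets)                              # (4, 3) probabilities
```

Because every ticket is sparse and all come from one run, the ensemble costs roughly one dense training budget, which is where the "for free" claim comes from.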
arXiv Detail & Related papers (2021-06-28T10:48:20Z)
- Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed [27.38015169185521]
We show theoretically that two-layer neural networks (2LNN) with only a few hidden neurons can beat the performance of kernel learning.
We show how over-parametrising the neural network leads to faster convergence, but does not improve its final performance.
arXiv Detail & Related papers (2021-02-23T15:10:15Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Lossless Compression of Deep Neural Networks [17.753357839478575]
Deep neural networks have been successful in many predictive modeling tasks, such as image and language recognition.
It is challenging to deploy these networks under limited computational resources, such as in mobile devices.
We introduce an algorithm that removes units and layers of a neural network while not changing the output that is produced.
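The paper's algorithm is not reproduced here, but the simplest lossless case illustrates the principle: a hidden unit whose outgoing weights are all zero contributes nothing to the output, so removing it leaves the network's function unchanged. A toy two-layer ReLU network makes this concrete (all shapes and values below are illustrative):

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Two-layer MLP: ReLU hidden layer followed by a linear output."""
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

rng = np.random.default_rng(1)
W1 = rng.standard_normal((4, 3)); b1 = rng.standard_normal(3)
W2 = rng.standard_normal((3, 2)); b2 = rng.standard_normal(2)
W2[1, :] = 0.0          # hidden unit 1 is dead: zero outgoing weights

x = rng.standard_normal((5, 4))
y_full = forward(x, W1, b1, W2, b2)

# Remove the dead unit: drop its column in W1/b1 and its row in W2.
keep = [0, 2]
y_pruned = forward(x, W1[:, keep], b1[keep], W2[keep, :], b2)
assert np.allclose(y_full, y_pruned)  # outputs are bitwise-equivalent in effect
```

The paper's contribution is detecting such removable units and layers in general, not just the zero-weight case sketched here.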
arXiv Detail & Related papers (2020-01-01T15:04:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.