On the training of sparse and dense deep neural networks: less
parameters, same performance
- URL: http://arxiv.org/abs/2106.09021v1
- Date: Thu, 17 Jun 2021 14:54:23 GMT
- Title: On the training of sparse and dense deep neural networks: less
parameters, same performance
- Authors: Lorenzo Chicchi, Lorenzo Giambagli, Lorenzo Buffoni, Timoteo Carletti,
Marco Ciavarella, Duccio Fanelli
- Abstract summary: We propose a variant of the spectral learning method introduced in Giambagli et al., Nat. Comm. 2021.
The eigenvalues act as veritable knobs which can be freely tuned so as to (i) enhance, or alternatively silence, the contribution of the input nodes.
Each spectral parameter reflects back on the whole set of inter-node weights, an attribute which we exploit to yield sparse networks with remarkable classification abilities.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks can be trained in reciprocal space, by acting on the
eigenvalues and eigenvectors of suitable transfer operators in direct space.
Adjusting the eigenvalues, while freezing the eigenvectors, yields a
substantial compression of the parameter space, which then scales, by definition,
with the number of computing neurons. The classification scores, as measured by
the displayed accuracy, are however inferior to those attained when the learning
is carried out in direct space for an identical architecture, employing the full
set of trainable parameters (with a quadratic dependence on the size of
neighboring layers). In this Letter, we propose a variant of the spectral
learning method introduced in Giambagli et al., Nat. Comm. 2021, which leverages
two sets of eigenvalues for each mapping between adjacent layers. The
eigenvalues act as veritable knobs which can be freely tuned so as to (i)
enhance, or alternatively silence, the contribution of the input nodes, and
(ii) modulate the excitability of the receiving nodes through a mechanism which
we interpret as the artificial analogue of homeostatic plasticity. The number
of trainable parameters is still a linear function of the network size, but the
performance of the trained device gets much closer to that obtained via
conventional algorithms, which however come at a considerably heavier
computational cost. The residual gap between conventional and spectral training
can eventually be closed by employing a suitable decomposition of the
non-trivial block of the eigenvector matrix. Each spectral parameter reflects
back on the whole set of inter-node weights, an attribute which we exploit to
yield sparse networks with remarkable classification abilities, as compared to
their homologues trained by conventional means.
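
To make the parameter counting concrete, below is a minimal PyTorch sketch of a layer trained only through two eigenvalue sets. It assumes the effective inter-node weights take the spectral form w_ij = (lambda_in_j - lambda_out_i) * phi_ij with the eigenvector block phi kept frozen; the class name SpectralLayer and all initialization choices are illustrative and not taken from the authors' code.

```python
# Hedged sketch of a layer with two trainable eigenvalue sets (assumed form:
# w_ij = (lambda_in[j] - lambda_out[i]) * phi[i, j], phi frozen).
import torch
import torch.nn as nn


class SpectralLayer(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, n_in: int, n_out: int):
        super().__init__()
        # Frozen eigenvector block: registered as a buffer, never updated.
        self.register_buffer("phi", torch.randn(n_out, n_in) / n_in ** 0.5)
        # Trainable eigenvalues: one set per input node, one per receiving node.
        self.lambda_in = nn.Parameter(torch.randn(n_in))    # gates input-node contributions
        self.lambda_out = nn.Parameter(torch.randn(n_out))  # modulates receiving-node excitability
        self.bias = nn.Parameter(torch.zeros(n_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weights: w_ij = (lambda_in[j] - lambda_out[i]) * phi[i, j]
        w = (self.lambda_in.unsqueeze(0) - self.lambda_out.unsqueeze(1)) * self.phi
        return x @ w.t() + self.bias


# Trainable parameters per layer: n_in + 2 * n_out (linear in the layer sizes),
# versus n_in * n_out + n_out for a conventional dense layer.
layer = SpectralLayer(784, 128)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 784 + 128 + 128
```

Under this assumed parametrization, driving lambda_in[j] towards lambda_out[i] suppresses the corresponding weights, and setting a single input eigenvalue appropriately silences that node across all of its outgoing connections at once, which is one way a handful of spectral parameters can induce sparsity in the full weight matrix.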
Related papers
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z)
- ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z)
- Operator relaxation and the optimal depth of classical shadows [0.0]
We study the sample complexity of learning the expectation value of Pauli operators via "shallow shadows".
We show that the shadow norm is expressed in terms of properties of the Heisenberg time evolution of operators under the randomizing circuit.
arXiv Detail & Related papers (2022-12-22T18:46:46Z)
- Exploring the role of parameters in variational quantum algorithms [59.20947681019466]
We introduce a quantum-control-inspired method for the characterization of variational quantum circuits using the rank of the dynamical Lie algebra.
A promising connection is found between the Lie rank, the accuracy of calculated energies, and the requisite depth to attain target states via a given circuit architecture.
arXiv Detail & Related papers (2022-09-28T20:24:53Z)
- A research framework for writing differentiable PDE discretizations in JAX [3.4389358108344257]
Differentiable simulators are an emerging concept with applications in several fields, from reinforcement learning to optimal control.
We propose a library of differentiable operators and discretizations, by representing operators as mappings between families of continuous functions, parametrized by finite vectors.
We demonstrate the approach on an acoustic optimization problem, where the Helmholtz equation is discretized using Fourier spectral methods, and differentiability is demonstrated using gradient descent to optimize the speed of sound of an acoustic lens.
arXiv Detail & Related papers (2021-11-09T15:58:44Z)
- Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning [58.14930566993063]
We present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks.
We introduce the first provable learning algorithm for linear 2-RNNs defined over sequences of continuous input vectors.
arXiv Detail & Related papers (2020-10-19T15:28:00Z)
- Training Invertible Linear Layers through Rank-One Perturbations [0.0]
This work presents a novel approach for training invertible linear layers.
In lieu of directly optimizing the network parameters, we train rank-one perturbations and add them to the actual weight matrices infrequently.
We show how such invertible blocks improve the mixing, and thus the mode separation, of the resulting normalizing flows.
arXiv Detail & Related papers (2020-10-14T12:43:47Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Machine learning in spectral domain [4.724825031148412]
Tuning the eigenvalues corresponds in fact to performing a global training of the neural network.
Spectral learning bound to the eigenvalues could also be employed for pre-training deep neural networks.
arXiv Detail & Related papers (2020-05-29T07:55:37Z)
- Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation [72.40827239394565]
We propose to compute features only at sparsely sampled locations.
We then densely reconstruct the feature map with an efficient procedure.
The presented network is experimentally shown to save substantial computation while maintaining accuracy over a variety of computer vision tasks.
arXiv Detail & Related papers (2020-03-19T15:36:31Z)
- Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks.
To alleviate the cost of modelling every feature interaction explicitly, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
arXiv Detail & Related papers (2020-01-27T22:38:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.