On the training of sparse and dense deep neural networks: less
parameters, same performance
- URL: http://arxiv.org/abs/2106.09021v1
- Date: Thu, 17 Jun 2021 14:54:23 GMT
- Title: On the training of sparse and dense deep neural networks: less
parameters, same performance
- Authors: Lorenzo Chicchi, Lorenzo Giambagli, Lorenzo Buffoni, Timoteo Carletti,
Marco Ciavarella, Duccio Fanelli
- Abstract summary: We propose a variant of the spectral learning method introduced in Giambagli et al., Nat. Comm. 2021.
The eigenvalues act as veritable knobs which can be freely tuned so as to (i) enhance, or alternatively silence, the contribution of the input nodes.
Each spectral parameter reflects back on the whole set of inter-node weights, an attribute which we exploit to yield sparse networks with remarkable classification abilities.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks can be trained in reciprocal space, by acting on the
eigenvalues and eigenvectors of suitable transfer operators in direct space.
Adjusting the eigenvalues, while freezing the eigenvectors, yields a
substantial compression of the parameter space, which then scales, by definition,
with the number of computing neurons. The classification scores, as measured by
the displayed accuracy, are however inferior to those attained when the learning
is carried out in direct space for an identical architecture, employing the full
set of trainable parameters (with a quadratic dependence on the size of
neighboring layers). In this Letter, we propose a variant of the spectral
learning method introduced in Giambagli et al., Nat. Comm. 2021, which leverages
two sets of eigenvalues for each mapping between adjacent layers. The
eigenvalues act as veritable knobs which can be freely tuned so as to (i)
enhance, or alternatively silence, the contribution of the input nodes, and
(ii) modulate the excitability of the receiving nodes through a mechanism which
we interpret as the artificial analogue of homeostatic plasticity. The number
of trainable parameters is still a linear function of the network size, but the
performance of the trained device gets much closer to that obtained via
conventional algorithms, which however come at a considerably heavier
computational cost. The residual gap between conventional and spectral training
can eventually be closed by employing a suitable decomposition of the
non-trivial block of the eigenvector matrix. Each spectral parameter reflects
back on the whole set of inter-node weights, an attribute which we exploit to
yield sparse networks with remarkable classification abilities, as compared to
their homologues trained by conventional means.
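
To make the parameter counting concrete, below is a minimal PyTorch sketch of a layer trained only through two eigenvalue sets. It assumes the effective inter-node weights take the spectral form w_ij = (lambda_in_j - lambda_out_i) * phi_ij with the eigenvector block phi kept frozen; the class name SpectralLayer and all initialization choices are illustrative and not taken from the authors' code.

```python
# Hedged sketch of a layer with two trainable eigenvalue sets (assumed form:
# w_ij = (lambda_in[j] - lambda_out[i]) * phi[i, j], phi frozen).
import torch
import torch.nn as nn


class SpectralLayer(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, n_in: int, n_out: int):
        super().__init__()
        # Frozen eigenvector block: registered as a buffer, never updated.
        self.register_buffer("phi", torch.randn(n_out, n_in) / n_in ** 0.5)
        # Trainable eigenvalues: one set per input node, one per receiving node.
        self.lambda_in = nn.Parameter(torch.randn(n_in))    # gates input-node contributions
        self.lambda_out = nn.Parameter(torch.randn(n_out))  # modulates receiving-node excitability
        self.bias = nn.Parameter(torch.zeros(n_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weights: w_ij = (lambda_in[j] - lambda_out[i]) * phi[i, j]
        w = (self.lambda_in.unsqueeze(0) - self.lambda_out.unsqueeze(1)) * self.phi
        return x @ w.t() + self.bias


# Trainable parameters per layer: n_in + 2 * n_out (linear in the layer sizes),
# versus n_in * n_out + n_out for a conventional dense layer.
layer = SpectralLayer(784, 128)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 784 + 128 + 128
```

Under this assumed parametrization, driving lambda_in[j] towards lambda_out[i] suppresses the corresponding weights, and setting a single input eigenvalue appropriately silences that node across all of its outgoing connections at once, which is one way a handful of spectral parameters can induce sparsity in the full weight matrix.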
Related papers
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z)
- ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z)
- Operator relaxation and the optimal depth of classical shadows [0.0]
We study the sample complexity of learning the expectation value of Pauli operators via "shallow shadows".
We show that the shadow norm is expressed in terms of properties of the Heisenberg time evolution of operators under the randomizing circuit.
arXiv Detail & Related papers (2022-12-22T18:46:46Z)
- Exploring the role of parameters in variational quantum algorithms [59.20947681019466]
We introduce a quantum-control-inspired method for the characterization of variational quantum circuits using the rank of the dynamical Lie algebra.
A promising connection is found between the Lie rank, the accuracy of calculated energies, and the requisite depth to attain target states via a given circuit architecture.
arXiv Detail & Related papers (2022-09-28T20:24:53Z)
- A research framework for writing differentiable PDE discretizations in JAX [3.4389358108344257]
Differentiable simulators are an emerging concept with applications in several fields, from reinforcement learning to optimal control.
We propose a library of differentiable operators and discretizations, by representing operators as mappings between families of continuous functions, parametrized by finite vectors.
We demonstrate the approach on an acoustic optimization problem, where the Helmholtz equation is discretized using Fourier spectral methods, and differentiability is demonstrated using gradient descent to optimize the speed of sound of an acoustic lens.
arXiv Detail & Related papers (2021-11-09T15:58:44Z)
- Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning [58.14930566993063]
We present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks.
We introduce the first provable learning algorithm for linear 2-RNNs defined over sequences of continuous input vectors.
arXiv Detail & Related papers (2020-10-19T15:28:00Z)
- Training Invertible Linear Layers through Rank-One Perturbations [0.0]
This work presents a novel approach for training invertible linear layers.
In lieu of directly optimizing the network parameters, we train rank-one perturbations and add them to the actual weight matrices infrequently.
We show how such invertible blocks improve the mixing, and thus the mode separation, of the resulting normalizing flows.
arXiv Detail & Related papers (2020-10-14T12:43:47Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Machine learning in spectral domain [4.724825031148412]
Tuning the eigenvalues corresponds in fact to performing a global training of the neural network.
Spectral learning bound to the eigenvalues could also be employed for pre-training deep neural networks.
arXiv Detail & Related papers (2020-05-29T07:55:37Z)
- Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation [72.40827239394565]
We propose to compute features only at sparsely sampled locations.
We then densely reconstruct the feature map with an efficient procedure.
The presented network is experimentally shown to save substantial computation while maintaining accuracy over a variety of computer vision tasks.
arXiv Detail & Related papers (2020-03-19T15:36:31Z)
- Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks.
To alleviate the cost of modelling every feature interaction explicitly, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
arXiv Detail & Related papers (2020-01-27T22:38:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.