Experiments on Properties of Hidden Structures of Sparse Neural Networks
- URL: http://arxiv.org/abs/2107.12917v1
- Date: Tue, 27 Jul 2021 16:18:13 GMT
- Title: Experiments on Properties of Hidden Structures of Sparse Neural Networks
- Authors: Julian Stier, Harshil Darji, Michael Granitzer
- Abstract summary: We show how sparsity can be achieved through priors, pruning, and during learning, and answer questions on the relationship between the structure of Neural Networks and their performance.
Within our experiments, we show how magnitude class-blinded pruning achieves 97.5% accuracy on MNIST with 80% compression and re-training, which is 0.5 points more than without compression.
Further, performance prediction for Recurrent Networks learning the Reber grammar shows an $R^2$ of up to 0.81 given only structural information.
- Score: 1.227734309612871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sparsity in the structure of Neural Networks can lead to less energy
consumption, less memory usage, faster computation times on convenient
hardware, and automated machine learning. If sparsity gives rise to certain
kinds of structure, it can explain automatically obtained features during
learning.
We provide insights into experiments in which we show how sparsity can be
achieved through prior initialization, pruning, and during learning, and answer
questions on the relationship between the structure of Neural Networks and
their performance. This includes the first work of inducing priors from network
theory into Recurrent Neural Networks and an architectural performance
prediction during a Neural Architecture Search. Within our experiments, we show
how magnitude class-blinded pruning achieves 97.5% accuracy on MNIST with 80%
compression and re-training, which is 0.5 points more than without compression;
that magnitude class-uniform pruning is significantly inferior to it; and how a
genetic search enhanced with performance prediction achieves 82.4% on CIFAR10.
Further, performance prediction for Recurrent Networks learning the Reber
grammar shows an $R^2$ of up to 0.81 given only structural information.
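The class-blinded scheme evaluated above ranks weights from all layers together against one global magnitude threshold, rather than pruning every layer by the same fraction. A minimal numpy sketch of that global step (the function name, layer shapes, and rate are illustrative; the re-training phase described in the abstract is omitted):

```python
import numpy as np

def class_blinded_prune(layers, compression=0.8):
    """Globally prune the smallest-magnitude weights across all layers.

    'Class-blinded' means the ranking ignores which layer a weight
    belongs to: a single magnitude threshold is applied everywhere,
    unlike class-uniform pruning, which removes the same fraction
    from every layer.
    """
    all_mags = np.concatenate([np.abs(w).ravel() for w in layers])
    threshold = np.quantile(all_mags, compression)  # 80% compression -> 80th percentile
    masks = [np.abs(w) > threshold for w in layers]
    return [w * m for w, m in zip(layers, masks)], masks

rng = np.random.default_rng(0)
layers = [rng.normal(size=(784, 300)), rng.normal(size=(300, 10))]
pruned, masks = class_blinded_prune(layers, compression=0.8)
kept = sum(m.sum() for m in masks) / sum(m.size for m in masks)
print(f"fraction of weights kept: {kept:.2f}")
```

Because the threshold is global, layers whose weights are smaller on average end up sparser than others; re-training the surviving weights then recovers, or as reported above slightly improves, accuracy.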
Related papers
- ONNX-Net: Towards Universal Representations and Instant Performance Prediction for Neural Architectures [60.14199724905456]
ONNX-Bench is a benchmark consisting of a collection of neural networks in a unified format based on ONNX files. ONNX-Net represents any neural architecture using natural language descriptions acting as an input to a performance predictor. Experiments show strong zero-shot performance across disparate search spaces using only a small amount of pretraining samples.
arXiv Detail & Related papers (2025-10-06T15:43:36Z) - Auto-Compressing Networks [51.221103189527014]
We introduce Auto-Compressing Networks (ACNs), an architectural variant where long feedforward connections from each layer replace traditional short residual connections. We show that ACNs exhibit enhanced noise robustness compared to residual networks, superior performance in low-data settings, and mitigate catastrophic forgetting. These findings establish ACNs as a practical approach to developing efficient neural architectures.
arXiv Detail & Related papers (2025-06-11T13:26:09Z) - Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks trained with stochastic gradient descent (SGD) using the tensor program framework.
We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values.
This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
arXiv Detail & Related papers (2025-03-12T17:33:13Z) - Neural Network Pruning by Gradient Descent [7.427858344638741]
We introduce a novel and straightforward neural network pruning framework that incorporates the Gumbel-Softmax technique.
We demonstrate its exceptional compression capability, maintaining high accuracy on the MNIST dataset with only 0.15% of the original network parameters.
We believe our method opens a promising new avenue for deep learning pruning and the creation of interpretable machine learning systems.
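The Gumbel-Softmax technique referenced here turns discrete keep/drop decisions into differentiable ones, so a pruning mask can be learned by gradient descent alongside the weights. A hedged numpy sketch of the binary special case (a Gumbel-sigmoid, or binary Concrete, relaxation; names and temperature are illustrative, not the paper's exact formulation):

```python
import numpy as np

def gumbel_softmax_mask(logits, temperature=0.5, rng=None):
    """Differentiable approximation of a binary keep/drop mask.

    Each weight has a learnable logit; adding logistic noise (the
    difference of two Gumbel samples) and applying a tempered sigmoid
    yields a soft mask in [0, 1] that sharpens toward {0, 1} as the
    temperature decreases.
    """
    if rng is None:
        rng = np.random.default_rng()
    u = rng.uniform(1e-9, 1 - 1e-9, size=logits.shape)
    noise = np.log(u) - np.log(1 - u)
    return 1.0 / (1.0 + np.exp(-(logits + noise) / temperature))

rng = np.random.default_rng(0)
logits = rng.normal(size=(300, 10))  # keep/drop scores, one per weight
soft_mask = gumbel_softmax_mask(logits, temperature=0.1, rng=rng)
frac_binary = np.mean((soft_mask < 0.01) | (soft_mask > 0.99))
print(f"fraction of mask entries that are nearly binary: {frac_binary:.2f}")
```

As the temperature anneals toward zero the soft mask hardens, so the surviving parameter count is determined jointly with training rather than by a post-hoc threshold.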
arXiv Detail & Related papers (2023-11-21T11:12:03Z) - Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z) - Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Dynamical systems' based neural networks [0.7874708385247353]
We build neural networks using a suitable, structure-preserving, numerical time-discretisation.
The structure of the neural network is then inferred from the properties of the ODE vector field.
We present two universal approximation results and demonstrate how to impose some particular properties on the neural networks.
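A common instance of this construction, given here as an illustrative assumption rather than the paper's exact network, treats each layer as one explicit-Euler step of an ODE $x' = \tanh(Ax)$ with an antisymmetric matrix $A$, a structure-preserving prior meant to encourage stable, non-expansive dynamics:

```python
import numpy as np

def antisymmetric(W):
    """Project W onto the antisymmetric matrices (A = -A^T), the
    structural prior imposed on the ODE vector field."""
    return (W - W.T) / 2.0

def ode_net(x, weights, h=0.1):
    """Forward pass as explicit-Euler time-stepping of x' = tanh(A x).
    Each 'layer' is one Euler step of the discretized ODE."""
    for W in weights:
        A = antisymmetric(W)
        x = x + h * np.tanh(A @ x)
    return x

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 8)) for _ in range(20)]
x0 = rng.normal(size=8)
x_out = ode_net(x0, weights)
print(np.linalg.norm(x0), np.linalg.norm(x_out))
```

Depth then corresponds to the integration horizon, and properties imposed on the vector field (here antisymmetry) are inherited by the discretized layers.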
arXiv Detail & Related papers (2022-10-05T16:30:35Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
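The rank in question can be observed directly. An illustrative numpy experiment (not from the paper) that tracks the numerical rank of a feature batch as it flows through a stack of random ReLU layers:

```python
import numpy as np

def numerical_rank(X, tol=1e-3):
    """Count singular values above tol times the largest one."""
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 64))            # batch of 256 features, width 64
ranks = [numerical_rank(X)]
for _ in range(30):                        # deep stack of random ReLU layers
    W = rng.normal(size=(64, 64)) / np.sqrt(64)
    X = np.maximum(X @ W, 0.0)
    X = X / (np.linalg.norm(X) + 1e-12)    # rescale to avoid under/overflow
    ranks.append(numerical_rank(X))
print("rank per layer:", ranks)
```

The singular-value spectrum spreads out multiplicatively with depth, so the numerical rank of the features shrinks even though every weight matrix is full rank, which is the phenomenon the paper sets out to explain.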
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - An Experimental Study of the Impact of Pre-training on the Pruning of a Convolutional Neural Network [0.0]
In recent years, deep neural networks have known a wide success in various application domains.
Deep neural networks usually involve a large number of parameters, which correspond to the weights of the network.
The pruning methods notably attempt to reduce the size of the parameter set, by identifying and removing the irrelevant weights.
arXiv Detail & Related papers (2021-12-15T16:02:15Z) - Efficient Neural Architecture Search with Performance Prediction [0.0]
We use a neural architecture search to find the best network architecture for the task at hand.
Existing NAS algorithms generally evaluate the fitness of a new architecture by fully training it from scratch.
An end-to-end offline performance predictor is proposed to accelerate the evaluation of sampled architectures.
arXiv Detail & Related papers (2021-08-04T05:44:16Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
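An N:M sparse layer keeps at most N nonzero weights in every group of M consecutive ones (2:4 is the pattern accelerated by recent sparse tensor cores). A minimal numpy sketch of projecting dense weights onto that constraint by magnitude; the function name and shapes are illustrative:

```python
import numpy as np

def nm_sparsify(W, n=2, m=4):
    """Enforce N:M sparsity: in every group of m consecutive weights,
    keep only the n of largest magnitude and zero out the rest."""
    flat = W.reshape(-1, m)
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (flat * mask).reshape(W.shape)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))             # width must be divisible by m
W_sparse = nm_sparsify(W, n=2, m=4)
groups = W_sparse.reshape(-1, 4)
print("max nonzeros per group:", np.count_nonzero(groups, axis=1).max())
```

Training from scratch, as the paper proposes, would reapply such a projection (or a differentiable surrogate) during optimization rather than once after training.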
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z) - Memory capacity of neural networks with threshold and ReLU activations [2.5889737226898437]
Mildly overparametrized neural networks are often able to memorize the training data with $100\%$ accuracy.
We prove that this phenomenon holds for general multilayered perceptrons.
arXiv Detail & Related papers (2020-01-20T01:54:21Z) - Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.