Dense for the Price of Sparse: Improved Performance of Sparsely
Initialized Networks via a Subspace Offset
- URL: http://arxiv.org/abs/2102.07655v1
- Date: Fri, 12 Feb 2021 00:05:02 GMT
- Title: Dense for the Price of Sparse: Improved Performance of Sparsely
Initialized Networks via a Subspace Offset
- Authors: Ilan Price, Jared Tanner
- Abstract summary: We introduce a new `DCT plus Sparse' layer architecture, which maintains information propagation and trainability even with as little as 0.01% trainable kernel parameters remaining.
Switching from standard sparse layers to DCT plus Sparse layers does not increase the storage footprint of a network and incurs only a small additional computational overhead.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: That neural networks may be pruned to high sparsities and retain high
accuracy is well established. Recent research efforts focus on pruning
immediately after initialization so as to allow the computational savings
afforded by sparsity to extend to the training process. In this work, we
introduce a new `DCT plus Sparse' layer architecture, which maintains
information propagation and trainability even with as little as 0.01% trainable
kernel parameters remaining. We show that standard training of networks built
with these layers, and pruned at initialization, achieves state-of-the-art
accuracy for extreme sparsities on a variety of benchmark network architectures
and datasets. Moreover, these results are achieved using only simple heuristics
to determine the locations of the trainable parameters in the network, and thus
without having to initially store or compute with the full, unpruned network,
as is required by competing prune-at-initialization algorithms. Switching from
standard sparse layers to DCT plus Sparse layers does not increase the storage
footprint of a network and incurs only a small additional computational
overhead.
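The abstract does not spell out the layer construction in code, but it describes each layer's weight as a fixed dense DCT offset plus a sparse matrix of trainable entries at heuristically chosen locations. The NumPy sketch below illustrates that structure for a toy fully connected layer; the class name, the random placement of the trainable entries, and the exact density are illustrative assumptions, and the paper itself targets convolutional kernels rather than this dense toy case.
```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n; used as a fixed, untrained offset."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # input index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = 1.0 / np.sqrt(n)  # rescale first row for orthonormality
    return C

class DCTPlusSparseLayer:
    """Toy fully connected 'DCT plus Sparse' layer: weight W = C + S, where C is a
    fixed DCT matrix (no trainable parameters) and S is a sparse matrix whose few
    nonzero entries are the only trainable parameters."""

    def __init__(self, n, density=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        self.C = dct_matrix(n)                             # fixed dense offset
        nnz = max(1, int(density * n * n))                 # e.g. 0.01% of entries trainable
        flat = rng.choice(n * n, size=nnz, replace=False)  # heuristic (random) placement
        self.rows, self.cols = np.unravel_index(flat, (n, n))
        self.values = 0.01 * rng.standard_normal(nnz)      # trainable nonzeros of S

    def forward(self, x):
        """y = (C + S) x for a length-n vector x: a dense information path (C x)
        plus a sparse learned correction (S x) computed from the stored nonzeros."""
        y = self.C @ x
        np.add.at(y, self.rows, self.values * x[self.cols])
        return y
```
Because C is a fixed transform, only the nonzero values of S and their indices need to be stored or updated, which is consistent with the claim that the storage footprint matches that of a standard sparse layer.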
Related papers
- NIDS Neural Networks Using Sliding Time Window Data Processing with Trainable Activations and its Generalization Capability [0.0]
This paper presents neural networks for network intrusion detection systems (NIDS) that operate on flow data preprocessed with a time window.
It requires only eleven features which do not rely on deep packet inspection and can be found in most NIDS datasets and easily obtained from conventional flow collectors.
The reported training accuracy exceeds 99% for the proposed method with as few as twenty neural network input features.
arXiv Detail & Related papers (2024-10-24T11:36:19Z)
- Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning [14.792099973449794]
We propose an algorithm to align the training dynamics of the sparse network with those of the dense one.
We show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account.
Path eXclusion (PX) is able to find lottery tickets even at high sparsity levels.
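The PX algorithm itself is not described here in enough detail to reproduce, so the sketch below is a stand-in: a generic SNIP-style foresight-pruning step for a linear model, scoring each weight by |w * dL/dw| on a batch at initialization and keeping only the top-scoring fraction. The function name, squared-error loss, and `keep_fraction` are illustrative assumptions, and this is not the NTK-spectrum-based criterion the paper proposes.
```python
import numpy as np

def snip_style_mask(W, X, y, keep_fraction=0.05):
    """Generic prune-at-initialization sketch (SNIP-style, not PX): score each
    weight of a linear model by |w * dL/dw| on one batch and keep the top
    `keep_fraction` of weights as trainable."""
    # Squared-error loss L = 0.5 * ||X W - y||^2 has gradient dL/dW = X^T (X W - y).
    grad = X.T @ (X @ W - y)
    saliency = np.abs(W * grad)
    k = max(1, int(keep_fraction * saliency.size))
    cutoff = np.partition(saliency.ravel(), -k)[-k]  # k-th largest saliency
    return saliency >= cutoff                        # boolean mask of weights to keep
```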
arXiv Detail & Related papers (2024-06-03T22:19:42Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to the magnitude scale on the fly.
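As a rough illustration of the soft-shrinkage idea described in this summary, the NumPy sketch below shrinks the smallest-magnitude fraction of weights by a small factor at each iteration instead of hard-zeroing them. The function name and the `prune_fraction`/`shrink` parameters are illustrative assumptions, not the paper's exact schedule.
```python
import numpy as np

def iterative_soft_shrinkage(weights, prune_fraction=0.2, shrink=0.1, n_iters=5):
    """Hedged sketch of iterative soft shrinkage on a weight tensor: at every
    iteration the smallest-magnitude `prune_fraction` of weights is multiplied
    by `shrink` (a small, magnitude-proportional tweak) rather than set to zero,
    so 'unimportant' weights can still recover before the structure is fixed."""
    w = weights.copy()
    for _ in range(n_iters):
        flat_mag = np.abs(w).ravel()
        k = int(prune_fraction * flat_mag.size)
        if k == 0:
            break
        cutoff = np.partition(flat_mag, k)[k]  # magnitude below which weights are shrunk
        small = np.abs(w) < cutoff
        w[small] *= shrink                     # soft shrinkage instead of hard pruning
    return w
```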
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise, and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
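The SSC decomposition itself is not given here, but the special cases it generalizes are easy to quantify. The snippet below compares parameter counts for a standard convolution and a depthwise-plus-pointwise (depthwise separable) factorization, a well-known technique named here only to illustrate the kind of savings such structured layers target; the channel and kernel sizes are arbitrary example values.
```python
# Parameter counts for one conv layer with c_in = c_out = 64 channels, 3x3 kernels.
c_in, c_out, k = 64, 64, 3
full_conv = c_in * c_out * k * k                      # standard convolution: 36864
depthwise_separable = c_in * k * k + c_in * c_out     # depthwise + pointwise: 4672
print(full_conv, depthwise_separable)                 # roughly an 8x reduction
```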
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- Sparse tree-based initialization for neural networks [0.0]
Dedicated neural network (NN) architectures are designed to handle specific data types, such as CNNs for images or RNNs for text.
In this work, we propose a new sparse, tree-based initialization technique for (potentially deep) multilayer perceptrons (MLPs).
We show that our new initializer performs an implicit regularization during NN training, and that the first layers act as a sparse feature extractor.
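Based only on the title and summary above, one plausible reading is that a small tree ensemble is translated into a sparse first-layer initialization, with each hidden unit seeded from one split. The sketch below follows that assumption; the helper name and the one-nonzero-per-unit scheme are illustrative, not necessarily the paper's construction.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def tree_based_first_layer(X, y, n_trees=10, max_depth=3, seed=0):
    """Hedged sketch of a sparse, tree-based first-layer initialization: fit a
    small random forest, then give each hidden unit a single nonzero input
    weight (on a split feature) and a bias taken from the split threshold,
    so the first layer starts out as a sparse feature extractor."""
    forest = RandomForestClassifier(
        n_estimators=n_trees, max_depth=max_depth, random_state=seed).fit(X, y)

    feats, thresholds = [], []
    for est in forest.estimators_:
        t = est.tree_
        internal = t.feature >= 0                 # leaf nodes are marked with -2
        feats.extend(t.feature[internal])
        thresholds.extend(t.threshold[internal])

    n_hidden, n_in = len(feats), X.shape[1]
    W = np.zeros((n_hidden, n_in))                # sparse: one nonzero per hidden unit
    b = np.zeros(n_hidden)
    for h, (f, thr) in enumerate(zip(feats, thresholds)):
        W[h, f] = 1.0                             # weight on the chosen split feature
        b[h] = -thr                               # bias encodes the split threshold
    return W, b                                   # use as the MLP's first-layer init
```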
arXiv Detail & Related papers (2022-09-30T07:44:03Z)
- Neural Architecture Search for Efficient Uncalibrated Deep Photometric Stereo [105.05232615226602]
We leverage differentiable neural architecture search (NAS) strategy to find uncalibrated PS architecture automatically.
Experiments on the DiLiGenT dataset show that the performance of the automatically searched neural architectures compares favorably with state-of-the-art uncalibrated PS methods.
arXiv Detail & Related papers (2021-10-11T21:22:17Z)
- Semi-supervised Network Embedding with Differentiable Deep Quantisation [81.49184987430333]
We develop d-SNEQ, a differentiable quantisation method for network embedding.
d-SNEQ incorporates a rank loss to equip the learned quantisation codes with rich high-order information.
It is able to substantially compress the size of trained embeddings, thus reducing storage footprint and accelerating retrieval speed.
arXiv Detail & Related papers (2021-08-20T11:53:05Z)
- Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization [16.85167651136133]
We take a broader view of training sparse networks and consider the role of regularization, optimization and architecture choices on sparse models.
We show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime.
arXiv Detail & Related papers (2021-02-02T18:40:26Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.