Dense for the Price of Sparse: Improved Performance of Sparsely
Initialized Networks via a Subspace Offset
- URL: http://arxiv.org/abs/2102.07655v1
- Date: Fri, 12 Feb 2021 00:05:02 GMT
- Title: Dense for the Price of Sparse: Improved Performance of Sparsely
Initialized Networks via a Subspace Offset
- Authors: Ilan Price, Jared Tanner
- Abstract summary: We introduce a new `DCT plus Sparse' layer architecture, which maintains information propagation and trainability even with as little as 0.01% trainable kernel parameters remaining.
Switching from standard sparse layers to DCT plus Sparse layers does not increase the storage footprint of a network and incurs only a small additional computational overhead.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: That neural networks may be pruned to high sparsities and retain high
accuracy is well established. Recent research efforts focus on pruning
immediately after initialization so as to allow the computational savings
afforded by sparsity to extend to the training process. In this work, we
introduce a new `DCT plus Sparse' layer architecture, which maintains
information propagation and trainability even with as little as 0.01% trainable
kernel parameters remaining. We show that standard training of networks built
with these layers, and pruned at initialization, achieves state-of-the-art
accuracy for extreme sparsities on a variety of benchmark network architectures
and datasets. Moreover, these results are achieved using only simple heuristics
to determine the locations of the trainable parameters in the network, and thus
without having to initially store or compute with the full, unpruned network,
as is required by competing prune-at-initialization algorithms. Switching from
standard sparse layers to DCT plus Sparse layers does not increase the storage
footprint of a network and incurs only a small additional computational
overhead.
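The abstract does not spell out the layer construction in code, but it describes each layer's weight as a fixed dense DCT offset plus a sparse matrix of trainable entries at heuristically chosen locations. The NumPy sketch below illustrates that structure for a toy fully connected layer; the class name, the random placement of the trainable entries, and the exact density are illustrative assumptions, and the paper itself targets convolutional kernels rather than this dense toy case.
```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n; used as a fixed, untrained offset."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # input index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = 1.0 / np.sqrt(n)  # rescale first row for orthonormality
    return C

class DCTPlusSparseLayer:
    """Toy fully connected 'DCT plus Sparse' layer: weight W = C + S, where C is a
    fixed DCT matrix (no trainable parameters) and S is a sparse matrix whose few
    nonzero entries are the only trainable parameters."""

    def __init__(self, n, density=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        self.C = dct_matrix(n)                             # fixed dense offset
        nnz = max(1, int(density * n * n))                 # e.g. 0.01% of entries trainable
        flat = rng.choice(n * n, size=nnz, replace=False)  # heuristic (random) placement
        self.rows, self.cols = np.unravel_index(flat, (n, n))
        self.values = 0.01 * rng.standard_normal(nnz)      # trainable nonzeros of S

    def forward(self, x):
        """y = (C + S) x for a length-n vector x: a dense information path (C x)
        plus a sparse learned correction (S x) computed from the stored nonzeros."""
        y = self.C @ x
        np.add.at(y, self.rows, self.values * x[self.cols])
        return y
```
Because C is a fixed transform, only the nonzero values of S and their indices need to be stored or updated, which is consistent with the claim that the storage footprint matches that of a standard sparse layer.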
Related papers
- NIDS Neural Networks Using Sliding Time Window Data Processing with Trainable Activations and its Generalization Capability [0.0]
This paper presents neural networks for network intrusion detection systems (NIDS) that operate on flow data preprocessed with a time window.
It requires only eleven features which do not rely on deep packet inspection and can be found in most NIDS datasets and easily obtained from conventional flow collectors.
The reported training accuracy exceeds 99% for the proposed method with as few as twenty neural network input features.
arXiv Detail & Related papers (2024-10-24T11:36:19Z)
- Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning [14.792099973449794]
We propose an algorithm to align the training dynamics of the sparse network with those of the dense one.
We show how the usually neglected data-dependent component in the NTK's spectrum can be taken into account.
Path eXclusion (PX) is able to find lottery tickets even at high sparsity levels.
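The PX algorithm itself is not described here in enough detail to reproduce, so the sketch below is a stand-in: a generic SNIP-style foresight-pruning step for a linear model, scoring each weight by |w * dL/dw| on a batch at initialization and keeping only the top-scoring fraction. The function name, squared-error loss, and `keep_fraction` are illustrative assumptions, and this is not the NTK-spectrum-based criterion the paper proposes.
```python
import numpy as np

def snip_style_mask(W, X, y, keep_fraction=0.05):
    """Generic prune-at-initialization sketch (SNIP-style, not PX): score each
    weight of a linear model by |w * dL/dw| on one batch and keep the top
    `keep_fraction` of weights as trainable."""
    # Squared-error loss L = 0.5 * ||X W - y||^2 has gradient dL/dW = X^T (X W - y).
    grad = X.T @ (X @ W - y)
    saliency = np.abs(W * grad)
    k = max(1, int(keep_fraction * saliency.size))
    cutoff = np.partition(saliency.ravel(), -k)[-k]  # k-th largest saliency
    return saliency >= cutoff                        # boolean mask of weights to keep
```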
arXiv Detail & Related papers (2024-06-03T22:19:42Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to the magnitude scale on the fly.
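As a rough illustration of the soft-shrinkage idea described in this summary, the NumPy sketch below shrinks the smallest-magnitude fraction of weights by a small factor at each iteration instead of hard-zeroing them. The function name and the `prune_fraction`/`shrink` parameters are illustrative assumptions, not the paper's exact schedule.
```python
import numpy as np

def iterative_soft_shrinkage(weights, prune_fraction=0.2, shrink=0.1, n_iters=5):
    """Hedged sketch of iterative soft shrinkage on a weight tensor: at every
    iteration the smallest-magnitude `prune_fraction` of weights is multiplied
    by `shrink` (a small, magnitude-proportional tweak) rather than set to zero,
    so 'unimportant' weights can still recover before the structure is fixed."""
    w = weights.copy()
    for _ in range(n_iters):
        flat_mag = np.abs(w).ravel()
        k = int(prune_fraction * flat_mag.size)
        if k == 0:
            break
        cutoff = np.partition(flat_mag, k)[k]  # magnitude below which weights are shrunk
        small = np.abs(w) < cutoff
        w[small] *= shrink                     # soft shrinkage instead of hard pruning
    return w
```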
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise, and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
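The SSC decomposition itself is not given here, but the special cases it generalizes are easy to quantify. The snippet below compares parameter counts for a standard convolution and a depthwise-plus-pointwise (depthwise separable) factorization, a well-known technique named here only to illustrate the kind of savings such structured layers target; the channel and kernel sizes are arbitrary example values.
```python
# Parameter counts for one conv layer with c_in = c_out = 64 channels, 3x3 kernels.
c_in, c_out, k = 64, 64, 3
full_conv = c_in * c_out * k * k                      # standard convolution: 36864
depthwise_separable = c_in * k * k + c_in * c_out     # depthwise + pointwise: 4672
print(full_conv, depthwise_separable)                 # roughly an 8x reduction
```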
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- Sparse tree-based initialization for neural networks [0.0]
Dedicated neural network (NN) architectures are designed to handle specific data types, such as CNNs for images or RNNs for text.
In this work, we propose a new sparse, tree-based initialization technique for (potentially deep) multilayer perceptrons (MLPs).
We show that our new initializer performs an implicit regularization during NN training, and that the first layers act as a sparse feature extractor.
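Based only on the title and summary above, one plausible reading is that a small tree ensemble is translated into a sparse first-layer initialization, with each hidden unit seeded from one split. The sketch below follows that assumption; the helper name and the one-nonzero-per-unit scheme are illustrative, not necessarily the paper's construction.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def tree_based_first_layer(X, y, n_trees=10, max_depth=3, seed=0):
    """Hedged sketch of a sparse, tree-based first-layer initialization: fit a
    small random forest, then give each hidden unit a single nonzero input
    weight (on a split feature) and a bias taken from the split threshold,
    so the first layer starts out as a sparse feature extractor."""
    forest = RandomForestClassifier(
        n_estimators=n_trees, max_depth=max_depth, random_state=seed).fit(X, y)

    feats, thresholds = [], []
    for est in forest.estimators_:
        t = est.tree_
        internal = t.feature >= 0                 # leaf nodes are marked with -2
        feats.extend(t.feature[internal])
        thresholds.extend(t.threshold[internal])

    n_hidden, n_in = len(feats), X.shape[1]
    W = np.zeros((n_hidden, n_in))                # sparse: one nonzero per hidden unit
    b = np.zeros(n_hidden)
    for h, (f, thr) in enumerate(zip(feats, thresholds)):
        W[h, f] = 1.0                             # weight on the chosen split feature
        b[h] = -thr                               # bias encodes the split threshold
    return W, b                                   # use as the MLP's first-layer init
```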
arXiv Detail & Related papers (2022-09-30T07:44:03Z)
- Neural Architecture Search for Efficient Uncalibrated Deep Photometric Stereo [105.05232615226602]
We leverage differentiable neural architecture search (NAS) strategy to find uncalibrated PS architecture automatically.
Experiments on the DiLiGenT dataset show that the performance of the automatically searched neural architectures compares favorably with state-of-the-art uncalibrated PS methods.
arXiv Detail & Related papers (2021-10-11T21:22:17Z)
- Semi-supervised Network Embedding with Differentiable Deep Quantisation [81.49184987430333]
We develop d-SNEQ, a differentiable quantisation method for network embedding.
d-SNEQ incorporates a rank loss to equip the learned quantisation codes with rich high-order information.
It is able to substantially compress the size of trained embeddings, thus reducing storage footprint and accelerating retrieval speed.
arXiv Detail & Related papers (2021-08-20T11:53:05Z)
- Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization [16.85167651136133]
We take a broader view of training sparse networks and consider the role of regularization, optimization and architecture choices on sparse models.
We show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime.
arXiv Detail & Related papers (2021-02-02T18:40:26Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.