SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic
Networks
- URL: http://arxiv.org/abs/2207.10237v1
- Date: Thu, 21 Jul 2022 00:16:05 GMT
- Title: SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic
Networks
- Authors: Chien-Yu Lin, Anish Prabhu, Thomas Merth, Sachin Mehta, Anurag Ranjan,
Maxwell Horton, and Mohammad Rastegari
- Abstract summary: We present an empirical evaluation on methods for sharing parameters in isotropic networks.
We propose a weight sharing strategy to generate a family of models with better overall efficiency.
- Score: 25.465917853812538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent isotropic networks, such as ConvMixer and vision transformers, have
found significant success across visual recognition tasks, matching or
outperforming non-isotropic convolutional neural networks (CNNs). Isotropic
architectures are particularly well-suited to cross-layer weight sharing, an
effective neural network compression technique. In this paper, we perform an
empirical evaluation on methods for sharing parameters in isotropic networks
(SPIN). We present a framework to formalize major weight sharing design
decisions and perform a comprehensive empirical evaluation of this design
space. Guided by our experimental results, we propose a weight sharing strategy
to generate a family of models with better overall efficiency, in terms of
FLOPs and parameters versus accuracy, compared to traditional scaling methods
alone, for example compressing ConvMixer by 1.9x while improving accuracy on
ImageNet. Finally, we perform a qualitative study to further understand the
behavior of weight sharing in isotropic architectures. The code is available at
https://github.com/apple/ml-spin.
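To make the idea of cross-layer weight sharing in an isotropic network concrete, here is a minimal PyTorch sketch of a ConvMixer-style model in which a small pool of unique blocks is reused across the network's depth. The block structure, the dimensions (dim=256, depth=8, 4 unique blocks), and the "consecutive layers share a block" mapping are illustrative assumptions for this sketch, not the configuration studied in the paper; the authors' implementation is in the linked repository (https://github.com/apple/ml-spin).

```python
# Minimal sketch: cross-layer weight sharing in a ConvMixer-style isotropic
# network. All hyperparameters below are hypothetical, for illustration only.
import torch
import torch.nn as nn


class ConvMixerBlock(nn.Module):
    """Depthwise (spatial mixing) + pointwise (channel mixing) convolutions."""

    def __init__(self, dim, kernel_size=9):
        super().__init__()
        self.depthwise = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )

    def forward(self, x):
        x = x + self.depthwise(x)  # residual connection over spatial mixing
        return self.pointwise(x)


class SharedConvMixer(nn.Module):
    """ConvMixer-style model in which `depth` layers reuse `num_unique` blocks.

    Because every layer of an isotropic network has the same input/output
    shape, the same block can be applied at several depths, shrinking the
    parameter count of the mixing layers by roughly depth / num_unique
    while leaving FLOPs unchanged.
    """

    def __init__(self, dim=256, depth=8, num_unique=4,
                 patch_size=7, num_classes=1000):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, dim, patch_size, stride=patch_size),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )
        # Only `num_unique` blocks hold parameters ...
        self.blocks = nn.ModuleList(
            [ConvMixerBlock(dim) for _ in range(num_unique)]
        )
        # ... but they are applied `depth` times; here consecutive layers share.
        self.layer_to_block = [i * num_unique // depth for i in range(depth)]
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(dim, num_classes)
        )

    def forward(self, x):
        x = self.stem(x)
        for idx in self.layer_to_block:
            x = self.blocks[idx](x)
        return self.head(x)


if __name__ == "__main__":
    model = SharedConvMixer()
    print(sum(p.numel() for p in model.parameters()))   # shared-parameter count
    print(model(torch.randn(1, 3, 224, 224)).shape)     # torch.Size([1, 1000])
```

Which layers share weights, how many unique weight groups exist, and how layers are mapped onto them are the kinds of design decisions the paper's framework is intended to formalize; the sketch above fixes one arbitrary choice for each.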
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without incurring large computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z)
- INSightR-Net: Interpretable Neural Network for Regression using Similarity-based Comparisons to Prototypical Examples [2.4366811507669124]
Convolutional neural networks (CNNs) have shown exceptional performance for a range of medical imaging tasks.
In this work, we propose an inherently interpretable CNN for regression using similarity-based comparisons.
A prototype layer incorporated into the architecture enables visualization of the areas in the image that are most similar to learned prototypes.
The final prediction is then intuitively modeled as a mean of prototype labels, weighted by the similarities.
arXiv Detail & Related papers (2022-07-31T15:56:15Z)
- Improving Parametric Neural Networks for High-Energy Physics (and Beyond) [0.0]
We aim to deepen the understanding of Parametric Neural Networks (pNNs) in light of real-world usage.
We propose an alternative parametrization scheme, resulting in a new parametrized neural network architecture: the AffinePNN.
We extensively evaluate our models on the HEPMASS dataset, along with its imbalanced version (HEPMASS-IMB).
arXiv Detail & Related papers (2022-02-01T14:18:43Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Ensembles of Spiking Neural Networks [0.3007949058551534]
This paper demonstrates how to construct ensembles of spiking neural networks producing state-of-the-art results.
We achieve classification accuracies of 98.71%, 100.0%, and 99.09% on the MNIST, NMNIST, and DVS Gesture datasets, respectively.
We formalize spiking neural networks as GLM predictors, identifying a suitable representation for their target domain.
arXiv Detail & Related papers (2020-10-15T17:45:18Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relationship between a network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
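As a side note on the INSightR-Net entry above, its summary describes the final prediction as a mean of prototype labels weighted by similarities. The toy sketch below (plain NumPy, with made-up prototype labels and similarity scores) illustrates only that weighted-mean readout; it is not the paper's prototype layer.

```python
# Illustrative sketch of a similarity-weighted prototype prediction
# (hypothetical values; not the INSightR-Net implementation).
import numpy as np

prototype_labels = np.array([1.0, 2.0, 3.0, 4.0])  # labels of learned prototypes
similarities = np.array([0.1, 0.2, 0.5, 0.2])      # similarity of the input to each prototype

weights = similarities / similarities.sum()         # normalize similarities into weights
prediction = np.dot(weights, prototype_labels)      # similarity-weighted mean of prototype labels
print(prediction)                                   # 2.8
```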