SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic
Networks
- URL: http://arxiv.org/abs/2207.10237v1
- Date: Thu, 21 Jul 2022 00:16:05 GMT
- Title: SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic
Networks
- Authors: Chien-Yu Lin, Anish Prabhu, Thomas Merth, Sachin Mehta, Anurag Ranjan,
Maxwell Horton, and Mohammad Rastegari
- Abstract summary: We present an empirical evaluation on methods for sharing parameters in isotropic networks.
We propose a weight sharing strategy to generate a family of models with better overall efficiency.
- Score: 25.465917853812538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent isotropic networks, such as ConvMixer and vision transformers, have
found significant success across visual recognition tasks, matching or
outperforming non-isotropic convolutional neural networks (CNNs). Isotropic
architectures are particularly well-suited to cross-layer weight sharing, an
effective neural network compression technique. In this paper, we perform an
empirical evaluation on methods for sharing parameters in isotropic networks
(SPIN). We present a framework to formalize major weight sharing design
decisions and perform a comprehensive empirical evaluation of this design
space. Guided by our experimental results, we propose a weight sharing strategy
to generate a family of models with better overall efficiency, in terms of
FLOPs and parameters versus accuracy, compared to traditional scaling methods
alone, for example compressing ConvMixer by 1.9x while improving accuracy on
ImageNet. Finally, we perform a qualitative study to further understand the
behavior of weight sharing in isotropic architectures. The code is available at
https://github.com/apple/ml-spin.
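To make the idea of cross-layer weight sharing in an isotropic network concrete, here is a minimal PyTorch sketch of a ConvMixer-style model in which a small pool of unique blocks is reused across the network's depth. The block structure, the dimensions (dim=256, depth=8, 4 unique blocks), and the "consecutive layers share a block" mapping are illustrative assumptions for this sketch, not the configuration studied in the paper; the authors' implementation is in the linked repository (https://github.com/apple/ml-spin).

```python
# Minimal sketch: cross-layer weight sharing in a ConvMixer-style isotropic
# network. All hyperparameters below are hypothetical, for illustration only.
import torch
import torch.nn as nn


class ConvMixerBlock(nn.Module):
    """Depthwise (spatial mixing) + pointwise (channel mixing) convolutions."""

    def __init__(self, dim, kernel_size=9):
        super().__init__()
        self.depthwise = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )
        self.pointwise = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )

    def forward(self, x):
        x = x + self.depthwise(x)  # residual connection over spatial mixing
        return self.pointwise(x)


class SharedConvMixer(nn.Module):
    """ConvMixer-style model in which `depth` layers reuse `num_unique` blocks.

    Because every layer of an isotropic network has the same input/output
    shape, the same block can be applied at several depths, shrinking the
    parameter count of the mixing layers by roughly depth / num_unique
    while leaving FLOPs unchanged.
    """

    def __init__(self, dim=256, depth=8, num_unique=4,
                 patch_size=7, num_classes=1000):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, dim, patch_size, stride=patch_size),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )
        # Only `num_unique` blocks hold parameters ...
        self.blocks = nn.ModuleList(
            [ConvMixerBlock(dim) for _ in range(num_unique)]
        )
        # ... but they are applied `depth` times; here consecutive layers share.
        self.layer_to_block = [i * num_unique // depth for i in range(depth)]
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(dim, num_classes)
        )

    def forward(self, x):
        x = self.stem(x)
        for idx in self.layer_to_block:
            x = self.blocks[idx](x)
        return self.head(x)


if __name__ == "__main__":
    model = SharedConvMixer()
    print(sum(p.numel() for p in model.parameters()))   # shared-parameter count
    print(model(torch.randn(1, 3, 224, 224)).shape)     # torch.Size([1, 1000])
```

Which layers share weights, how many unique weight groups exist, and how layers are mapped onto them are the kinds of design decisions the paper's framework is intended to formalize; the sketch above fixes one arbitrary choice for each.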
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without incurring large computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z)
- INSightR-Net: Interpretable Neural Network for Regression using Similarity-based Comparisons to Prototypical Examples [2.4366811507669124]
Convolutional neural networks (CNNs) have shown exceptional performance for a range of medical imaging tasks.
In this work, we propose an inherently interpretable CNN for regression using similarity-based comparisons.
A prototype layer incorporated into the architecture enables visualization of the areas in the image that are most similar to learned prototypes.
The final prediction is then intuitively modeled as a mean of prototype labels, weighted by the similarities.
arXiv Detail & Related papers (2022-07-31T15:56:15Z)
- Improving Parametric Neural Networks for High-Energy Physics (and Beyond) [0.0]
We aim to deepen the understanding of Parametric Neural Networks (pNNs) in light of real-world usage.
We propose an alternative parametrization scheme, resulting in a new parametrized neural network architecture: the AffinePNN.
We extensively evaluate our models on the HEPMASS dataset, along with its imbalanced version (HEPMASS-IMB).
arXiv Detail & Related papers (2022-02-01T14:18:43Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Ensembles of Spiking Neural Networks [0.3007949058551534]
This paper demonstrates how to construct ensembles of spiking neural networks producing state-of-the-art results.
We achieve classification accuracies of 98.71%, 100.0%, and 99.09% on the MNIST, NMNIST, and DVS Gesture datasets, respectively.
We formalize spiking neural networks as GLM predictors, identifying a suitable representation for their target domain.
arXiv Detail & Related papers (2020-10-15T17:45:18Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relationship between a network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
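As a side note on the INSightR-Net entry above, its summary describes the final prediction as a mean of prototype labels weighted by similarities. The toy sketch below (plain NumPy, with made-up prototype labels and similarity scores) illustrates only that weighted-mean readout; it is not the paper's prototype layer.

```python
# Illustrative sketch of a similarity-weighted prototype prediction
# (hypothetical values; not the INSightR-Net implementation).
import numpy as np

prototype_labels = np.array([1.0, 2.0, 3.0, 4.0])  # labels of learned prototypes
similarities = np.array([0.1, 0.2, 0.5, 0.2])      # similarity of the input to each prototype

weights = similarities / similarities.sum()         # normalize similarities into weights
prediction = np.dot(weights, prototype_labels)      # similarity-weighted mean of prototype labels
print(prediction)                                   # 2.8
```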