Model Stitching: Looking For Functional Similarity Between Representations
- URL: http://arxiv.org/abs/2303.11277v2
- Date: Thu, 31 Aug 2023 22:56:43 GMT
- Title: Model Stitching: Looking For Functional Similarity Between Representations
- Authors: Adriano Hernandez, Rumen Dangovski, Peter Y. Lu, Marin Soljacic
- Abstract summary: We expand on previous work that used model stitching to compare representations of the same shapes learned by differently seeded and/or trained neural networks of the same architecture.
We reveal unexpected behavior of model stitching: convolution-based stitching of small ResNets can reach high accuracy when the stitched layers come later in the first (sender) network than in the second (receiver) network.
- Score: 5.657258033928475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model stitching (Lenc & Vedaldi 2015) is a compelling methodology to compare
different neural network representations, because it allows us to measure to
what degree they may be interchanged. We expand on previous work by Bansal,
Nakkiran & Barak, which used model stitching to compare representations of the
same shapes learned by differently seeded and/or trained neural networks of the
same architecture. Our contribution enables us to compare the representations
learned by layers with different shapes from neural networks with different
architectures. We subsequently reveal unexpected behavior of model stitching:
convolution-based stitching of small ResNets can reach high accuracy when the
stitched layers come later in the first (sender) network than in the second
(receiver) network, even if those layers are far apart.
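The stitching operation the abstract describes, joining the bottom layers of a sender network to the top layers of a receiver through a small trainable layer, reduces for convolutional features to a 1x1 convolution, i.e., a learned linear map applied pointwise over channels. A minimal NumPy sketch follows; the shapes, channel counts, and the `stitch_1x1` helper are illustrative assumptions, not the paper's actual networks or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def stitch_1x1(features, weight, bias):
    """Apply a 1x1-convolution stitching layer.

    features: (H, W, C_send) activations from the sender layer.
    weight:   (C_send, C_recv) trainable stitch weights.
    bias:     (C_recv,) trainable stitch bias.

    A 1x1 convolution is just this linear map applied at every spatial
    position, so it can connect layers with different channel counts.
    """
    return features @ weight + bias

# Hypothetical shapes: sender layer outputs 16 channels, receiver expects 32.
sender_feats = rng.standard_normal((8, 8, 16))   # frozen sender activations
w = rng.standard_normal((16, 32)) * 0.1          # only the stitch is trained
b = np.zeros(32)

recv_input = stitch_1x1(sender_feats, w, b)
print(recv_input.shape)  # (8, 8, 32)
```

When measuring stitching accuracy, both the sender-bottom and receiver-top weights stay frozen and only the stitch parameters are optimized on the task loss, so high accuracy indicates the two representations are functionally interchangeable up to a linear map.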
Related papers
- Neural Metamorphosis [72.88137795439407]
This paper introduces a new learning paradigm termed Neural Metamorphosis (NeuMeta), which aims to build self-morphable neural networks.
NeuMeta directly learns the continuous weight manifold of neural networks.
It sustains full-size performance even at a 75% compression rate.
arXiv Detail & Related papers (2024-10-10T14:49:58Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes model learning by paying closer attention to training samples with large differences in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Characterization of topological structures in different neural network architectures [0.0]
We develop methods for analyzing representations from different architectures and check how one should use them to obtain valid results.
We applied these methods for ResNet, VGG19, and ViT architectures and found substantial differences along with some similarities.
arXiv Detail & Related papers (2024-07-08T18:02:18Z)
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Multilayer Multiset Neuronal Networks -- MMNNs [55.2480439325792]
The present work describes multilayer multiset neuronal networks incorporating two or more layers of coincidence similarity neurons.
The work also explores the utilization of counter-prototype points, which are assigned to the image regions to be avoided.
arXiv Detail & Related papers (2023-08-28T12:55:13Z)
- Neural Representations Reveal Distinct Modes of Class Fitting in Residual Convolutional Networks [5.1271832547387115]
We leverage probabilistic models of neural representations to investigate how residual networks fit classes.
We find that classes in the investigated models are not fitted in a uniform way.
We show that the uncovered structure in neural representations correlates with the robustness of training examples and adversarial memorization.
arXiv Detail & Related papers (2022-12-01T18:55:58Z)
- How Well Do Sparse Imagenet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets.
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset.
We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z)
- Similarity and Matching of Neural Network Representations [0.0]
We employ a toolset -- dubbed Dr. Frankenstein -- to analyse the similarity of representations in deep neural networks.
We aim to match the activations on given layers of two trained neural networks by joining them with a stitching layer.
arXiv Detail & Related papers (2021-10-27T17:59:46Z)
- Comparing Deep Neural Nets with UMAP Tour [12.910602784766562]
UMAP Tour is built to visually inspect and compare internal behavior of real-world neural network models.
We find concepts learned in state-of-the-art models, as well as dissimilarities between models such as GoogLeNet and ResNet.
arXiv Detail & Related papers (2021-10-18T15:59:13Z)
- Revisiting Model Stitching to Compare Neural Representations [8.331711958610347]
We consider a "stitched model" formed by connecting the bottom-layers of $A$ to the top-layers of $B$, with a simple trainable layer between them.
We show that good networks of the same architecture, but trained in very different ways, can be stitched to each other without a drop in performance.
We also give evidence for the intuition that "more is better" by showing that representations learnt with (1) more data, (2) bigger width, or (3) more training time can be "plugged in" to weaker models to improve performance.
arXiv Detail & Related papers (2021-06-14T18:05:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.