Revealing the Utilized Rank of Subspaces of Learning in Neural Networks
- URL: http://arxiv.org/abs/2407.04797v1
- Date: Fri, 5 Jul 2024 18:14:39 GMT
- Title: Revealing the Utilized Rank of Subspaces of Learning in Neural Networks
- Authors: Isha Garg, Christian Koguchi, Eshan Verma, Daniel Ulbricht,
- Abstract summary: We study how well the learned weights of a neural network utilize the space available to them.
Most learned weights appear to be full rank, and are therefore not amenable to low rank decomposition.
We propose a simple data-driven transformation that projects the weights onto the subspace where the data and the weight interact.
- Score: 3.4133351364625275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we study how well the learned weights of a neural network utilize the space available to them. This notion is related to capacity, but additionally incorporates the interaction of the network architecture with the dataset. Most learned weights appear to be full rank, and are therefore not amenable to low rank decomposition. This deceptively implies that the weights are utilizing the entire space available to them. We propose a simple data-driven transformation that projects the weights onto the subspace where the data and the weight interact. This preserves the functional mapping of the layer and reveals its low rank structure. In our findings, we conclude that most models utilize a fraction of the available space. For instance, for ViTB-16 and ViTL-16 trained on ImageNet, the mean layer utilization is 35% and 20% respectively. Our transformation results in reducing the parameters to 50% and 25% respectively, while resulting in less than 0.2% accuracy drop after fine-tuning. We also show that self-supervised pre-training drives this utilization up to 70%, justifying its suitability for downstream tasks.
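The abstract does not spell out the transformation. The following is a minimal sketch under our own assumptions. For a linear layer y = W x, the "subspace where the data and the weight interact" is approximated by the top principal directions of the layer's input activations, kept up to an energy threshold (0.99 here, a hypothetical choice). The helper names (input_subspace, project_and_factor, utilized_rank) are illustrative, not the authors' code. Projecting W onto that subspace leaves the layer's output unchanged for inputs inside it, and the projected matrix factors into two thin matrices.

```python
import numpy as np

def _rank_for_energy(values: np.ndarray, energy: float) -> int:
    """Smallest k whose top-k values (sorted descending) hold `energy` of the total."""
    cum = np.cumsum(values) / values.sum()
    return min(int(np.searchsorted(cum, energy)) + 1, len(values))

def input_subspace(acts: np.ndarray, energy: float = 0.99) -> np.ndarray:
    """Orthonormal basis (d_in, k) of the principal subspace of a layer's inputs.
    acts has shape (n_samples, d_in). Assumption: uncentered second moment."""
    second_moment = acts.T @ acts / acts.shape[0]
    eigvals, eigvecs = np.linalg.eigh(second_moment)      # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    # make it descending
    k = _rank_for_energy(np.clip(eigvals, 0.0, None), energy)
    return eigvecs[:, :k]

def project_and_factor(W: np.ndarray, acts: np.ndarray, energy: float = 0.99):
    """Replace W (d_out, d_in) by A @ B with A = W U and B = U^T.
    For inputs x lying in span(U), A @ B @ x equals W @ x."""
    U = input_subspace(acts, energy)
    return W @ U, U.T

def utilized_rank(W: np.ndarray, acts: np.ndarray, energy: float = 0.99) -> int:
    """Effective rank of the projected weight W U U^T (same energy threshold)."""
    A, B = project_and_factor(W, acts, energy)
    s = np.linalg.svd(A @ B, compute_uv=False)
    return _rank_for_energy(s ** 2, energy)

# Toy layer: a full-rank weight whose inputs only span an 8-dimensional subspace.
rng = np.random.default_rng(0)
d_in, d_out, true_dim = 64, 32, 8
W = rng.standard_normal((d_out, d_in))
basis = rng.standard_normal((d_in, true_dim))
acts = rng.standard_normal((1000, true_dim)) @ basis.T

A, B = project_and_factor(W, acts)
print("full rank:", np.linalg.matrix_rank(W), "utilized rank:", utilized_rank(W, acts))
print("params:", W.size, "->", A.size + B.size)
```

In this toy setup W is full rank (32) while its inputs span only 8 dimensions, so the projected weight has rank at most 8, and the factored form stores 96 * k numbers (32k + 64k) instead of 2,048 when k directions are kept.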
Related papers
- Neural Metamorphosis [72.88137795439407]
This paper introduces a new learning paradigm termed Neural Metamorphosis (NeuMeta), which aims to build self-morphable neural networks.
NeuMeta directly learns the continuous weight manifold of neural networks.
It sustains full-size performance even at a 75% compression rate.
(arXiv, 2024-10-10T14:49:58Z)
- Weights Augmentation: it has never ever ever ever let her model down [1.5020330976600735]
This article proposes the concept of weight augmentation, focusing on weight exploration.
The core of the Weight Augmentation Strategy (WAS) is to train with randomly transformed weight coefficients, named Shadow Weights (SW), which are used to compute the loss function.
Our experimental results show that convolutional neural networks, such as VGG-16, ResNet-18, ResNet-34, GoogleNet, MobileNetV2, and EfficientNet-Lite, can benefit substantially at little or no cost.
(arXiv, 2024-05-30T00:57:06Z)
- Improved Generalization of Weight Space Networks via Augmentations [53.87011906358727]
Learning in deep weight spaces (DWS), where neural networks process the weights of other neural networks, is an emerging research direction, with applications to 2D and 3D neural fields (INRs, NeRFs).
Weight-space models tend to overfit; we empirically analyze the reasons for this overfitting and find that a key reason is the lack of diversity in DWS datasets.
To address this, we explore strategies for data augmentation in weight spaces and propose a MixUp method adapted for weight spaces.
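MixUp in weight space is complicated by permutation symmetry: reordering a hidden layer's neurons (together with their incoming and outgoing weights) leaves the network's function unchanged, so naively averaging two networks can blend unrelated neurons. The sketch below is a hypothetical illustration, not the method of this paper or the related "Data Augmentations in Deep Weight Spaces". It aligns the hidden neurons of one tiny one-hidden-layer MLP to another by brute-force search over permutations, then interpolates. Each neuron is assumed to be stored as a flat tuple (incoming weights, bias, outgoing weight).

```python
import itertools

def align(ref, other):
    """Reorder the hidden neurons of `other` to best match `ref` (squared L2).
    Brute force over all permutations: only feasible for tiny layers."""
    def cost(perm):
        return sum(
            sum((a - b) ** 2 for a, b in zip(ref[i], other[j]))
            for i, j in enumerate(perm)
        )
    best = min(itertools.permutations(range(len(other))), key=cost)
    return [other[j] for j in best]

def weight_mixup(net_a, net_b, lam):
    """Interpolate two one-hidden-layer MLPs after aligning net_b to net_a."""
    net_b = align(net_a, net_b)
    return [tuple(lam * a + (1 - lam) * b for a, b in zip(na, nb))
            for na, nb in zip(net_a, net_b)]

# Each neuron: (w_in_1, w_in_2, bias, w_out).
net_a = [(1.0, 0.0, 0.5, 2.0), (0.0, 1.0, -0.5, -1.0)]
net_b = [(0.1, 0.9, -0.4, -1.2), (0.9, 0.1, 0.6, 1.8)]  # close to net_a, neurons swapped
mixed = weight_mixup(net_a, net_b, lam=0.5)
print(mixed)
```

Without the alignment step, the first mixed neuron would average (1.0, 0.0, ...) with (0.1, 0.9, ...), producing a neuron that neither parent network contains.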
(arXiv, 2024-02-06T15:34:44Z)
- Learning to Compose SuperWeights for Neural Parameter Allocation Search [61.078949532440724]
We show that our approach can generate parameters for many networks using the same set of weights.
This enables us to support tasks like efficient ensembling and anytime prediction.
(arXiv, 2023-12-03T04:20:02Z)
- Data Augmentations in Deep Weight Spaces [89.45272760013928]
We introduce a novel augmentation scheme based on the Mixup method.
We evaluate these augmentation techniques on existing benchmarks as well as on new benchmarks we generate.
(arXiv, 2023-11-15T10:43:13Z)
- Diffused Redundancy in Pre-trained Representations [98.55546694886819]
We take a closer look at how features are encoded in pre-trained representations.
We find that learned representations in a given layer exhibit a degree of diffuse redundancy.
Our findings shed light on the nature of representations learned by pre-trained deep neural networks.
(arXiv, 2023-05-31T21:00:50Z)
- Gradient-based Weight Density Balancing for Robust Dynamic Sparse Training [59.48691524227352]
Training a sparse neural network from scratch requires optimizing the connections at the same time as the weights themselves.
While the connections per layer are optimized multiple times during training, the density of each layer typically remains constant.
We propose Global Gradient-based Redistribution, a technique that distributes weights across all layers, adding more weights to the layers that need them most.
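The summary only says that density moves toward the layers that need it most. One way to illustrate that idea, assuming a single global prune-and-regrow step (magnitude pruning of active weights and gradient-magnitude regrowth of inactive ones, both ranked across all layers at once rather than per layer), is sketched below. This is an assumption about the mechanism, not the paper's exact Global Gradient-based Redistribution algorithm, and prune_and_regrow is a hypothetical name.

```python
import heapq

def prune_and_regrow(weights, grads, masks, n):
    """weights/grads: dict layer -> list of floats; masks: dict layer -> list of bools
    (True = active connection). Prunes the n smallest-magnitude active weights and
    regrows the n inactive connections with the largest |grad|, ranked globally,
    so the total count is constant but per-layer density can shift."""
    active = [(abs(w), layer, i)
              for layer, ws in weights.items()
              for i, (w, m) in enumerate(zip(ws, masks[layer])) if m]
    inactive = [(abs(g), layer, i)
                for layer, gs in grads.items()
                for i, (g, m) in enumerate(zip(gs, masks[layer])) if not m]
    # Candidates are collected before any change, so a pruned weight cannot regrow now.
    for _, layer, i in heapq.nsmallest(n, active):
        masks[layer][i] = False
    for _, layer, i in heapq.nlargest(n, inactive):
        masks[layer][i] = True
    return masks

weights = {"a": [0.9, 0.01, 0.0, 0.0], "b": [0.5, 0.4, 0.0, 0.0]}
grads = {"a": [0.0, 0.0, 0.001, 0.002], "b": [0.0, 0.0, 0.8, 0.7]}
masks = {"a": [True, True, False, False], "b": [True, True, False, False]}
masks = prune_and_regrow(weights, grads, masks, n=1)
print({layer: sum(m) / len(m) for layer, m in masks.items()})  # per-layer density
```

Because the ranking is global, layer "b" (large gradients on its missing connections) gains density while layer "a" loses it; a per-layer scheme would have kept both at 50%.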
(arXiv, 2022-10-25T13:32:09Z)
- Knowledge Evolution in Neural Networks [39.52537143009937]
We propose an evolution-inspired training approach to boost performance on relatively small datasets.
We split the network into two hypotheses, a fit-hypothesis and a reset-hypothesis, and iteratively evolve the knowledge inside the fit-hypothesis by perturbing the reset-hypothesis for multiple generations.
This approach not only boosts performance, but also learns a slim network with a smaller inference cost.
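A minimal sketch of the split-and-reset loop described above follows. It assumes a random, fixed split of a flat parameter list and uniform re-initialization of the reset-hypothesis. The split granularity, the init scale, and the toy train_step (which only pulls weights toward 1.0 in place of real training) are hypothetical stand-ins, not the paper's setup.

```python
import random

def split_hypotheses(n_params, fit_ratio, seed=0):
    """Fixed random split: True marks a parameter of the fit-hypothesis."""
    rng = random.Random(seed)
    fit = set(rng.sample(range(n_params), int(fit_ratio * n_params)))
    return [i in fit for i in range(n_params)]

def next_generation(weights, fit_mask, rng, init_scale=0.1):
    """Keep the fit-hypothesis; re-initialize (perturb) the reset-hypothesis."""
    return [w if keep else rng.uniform(-init_scale, init_scale)
            for w, keep in zip(weights, fit_mask)]

def train_step(weights):
    """Toy stand-in for one generation of training: pull weights toward 1.0."""
    return [w + 0.5 * (1.0 - w) for w in weights]

rng = random.Random(42)
fit_mask = split_hypotheses(n_params=10, fit_ratio=0.5)
weights = [rng.uniform(-0.1, 0.1) for _ in range(10)]
for generation in range(5):
    weights = train_step(weights)
    if generation < 4:  # no reset after the final generation
        weights = next_generation(weights, fit_mask, rng)
slim = [w for w, keep in zip(weights, fit_mask) if keep]  # fit-hypothesis only
print(len(slim), slim)
```

Only the fit-hypothesis survives every reset, so it accumulates training across generations, which is what makes it a candidate for the slimmer network mentioned above.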
(arXiv, 2021-03-09T00:25:34Z)
- Compression-aware Continual Learning using Singular Value Decomposition [2.4283778735260686]
We propose a compression-based continual task learning method that can dynamically grow a neural network.
Inspired by recent model compression techniques, we employ compression-aware training and perform low-rank weight approximations.
Our method achieves compressed representations with minimal performance degradation without the need for costly fine-tuning.
(arXiv, 2020-09-03T23:29:50Z)
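Low-rank weight approximation, as in the last entry, is commonly done with a truncated SVD. The sketch below keeps the smallest rank whose singular values hold a chosen fraction of the squared (Frobenius) energy (0.9999, an arbitrary choice) and stores W as two thin factors. It shows generic truncated SVD only, not the paper's compression-aware training, and truncated_svd is a hypothetical name.

```python
import numpy as np

def truncated_svd(W: np.ndarray, energy: float = 0.9999):
    """Return A (m, r) and B (r, n) with A @ B approximating W, where r is the
    smallest rank whose squared singular values reach `energy` of the total."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = min(int(np.searchsorted(cum, energy)) + 1, len(s))
    A = U[:, :r] * s[:r]  # fold singular values into the left factor
    B = Vt[:r]
    return A, B

# Toy weight: exactly rank 4, plus tiny noise that makes it numerically full rank.
rng = np.random.default_rng(1)
W = rng.standard_normal((50, 4)) @ rng.standard_normal((4, 40))
W += 1e-6 * rng.standard_normal(W.shape)
A, B = truncated_svd(W)
print("rank kept:", A.shape[1], "params:", W.size, "->", A.size + B.size)
```

Storing the rank-r factors costs r * (m + n) numbers instead of m * n, so the saving grows as r shrinks relative to the layer's dimensions.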
This list is automatically generated from the titles and abstracts of the papers in this site.