Feature Space Saturation during Training
- URL: http://arxiv.org/abs/2006.08679v5
- Date: Mon, 22 Nov 2021 14:11:35 GMT
- Title: Feature Space Saturation during Training
- Authors: Mats L. Richter and Justin Shenk and Wolf Byttner and Anders Arpteg
and Mikael Huss
- Abstract summary: We show that a layer's output can be restricted to the eigenspace of its variance matrix without performance loss.
We derive layer saturation - the ratio between the eigenspace dimension and layer width.
We demonstrate how to alter layer saturation in a neural network by changing network depth, filter sizes and input resolution.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose layer saturation - a simple, online-computable method for
analyzing the information processing in neural networks. First, we show that a
layer's output can be restricted to the eigenspace of its variance matrix
without performance loss. We propose a computationally lightweight method for
approximating the variance matrix during training. From the dimension of its
lossless eigenspace we derive layer saturation - the ratio between the
eigenspace dimension and layer width. We show that saturation seems to indicate
which layers contribute to network performance. We demonstrate how to alter
layer saturation in a neural network by changing network depth, filter sizes
and input resolution. Furthermore, we show that well-chosen input resolution
increases network performance by distributing the inference process more evenly
across the network.
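As a rough illustration of the procedure described in the abstract, the sketch below estimates a layer's covariance matrix from recorded feature batches and reports saturation as the ratio between the dimension of the variance-retaining eigenspace and the layer width. The 99% variance threshold, the function name, and the streaming-moment bookkeeping are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def layer_saturation(feature_batches, delta=0.99):
    """Saturation = dim(eigenspace keeping a `delta` fraction of the
    variance of the layer's output) / layer width."""
    sum_x, sum_outer, n = 0.0, 0.0, 0
    # Accumulate first and second moments batch by batch, as one would
    # do online during training.
    for x in feature_batches:                   # x: (batch_size, layer_width)
        sum_x = sum_x + x.sum(axis=0)
        sum_outer = sum_outer + x.T @ x
        n += x.shape[0]
    mean = sum_x / n
    cov = sum_outer / n - np.outer(mean, mean)  # covariance estimate
    eigvals = np.clip(np.sort(np.linalg.eigvalsh(cov))[::-1], 0.0, None)
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, delta)) + 1  # smallest k reaching delta
    return k / cov.shape[0]

# Toy check: features confined to a 3-dimensional subspace of a 16-unit
# layer should give a saturation close to 3/16.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1024, 3)) @ rng.normal(size=(3, 16))
print(layer_saturation([feats[i:i + 128] for i in range(0, 1024, 128)]))
```

Computed per layer over the course of training, a ratio of this kind is what the paper uses to indicate which layers contribute to network performance.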
Related papers
- LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging [20.774060844559838]
Existing depth compression methods remove redundant non-linear activation functions and merge consecutive convolution layers into a single layer.
These methods suffer from a critical drawback: the kernel size of the merged layers becomes larger.
We show that this problem can be addressed by jointly pruning convolution layers and activation functions.
We propose LayerMerge, a novel depth compression method that selects which activation layers and convolution layers to remove.
arXiv Detail & Related papers (2024-06-18T17:55:15Z)
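The kernel-growth drawback mentioned in the LayerMerge entry is easy to verify numerically: composing two linear convolutions with no activation in between is equivalent to a single convolution whose kernel has k1 + k2 - 1 taps. The 1D, single-channel numpy check below is a generic illustration, not LayerMerge's method.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=64)
w1 = rng.normal(size=3)   # first conv kernel (k1 = 3)
w2 = rng.normal(size=5)   # second conv kernel (k2 = 5)

# Apply the two convolutions (cross-correlation, 'valid' padding) in sequence.
sequential = np.correlate(np.correlate(x, w1, mode="valid"), w2, mode="valid")

# Merge them into one layer: composing two cross-correlations equals a single
# cross-correlation with the full convolution of the two kernels.
w_merged = np.convolve(w1, w2)            # size k1 + k2 - 1 = 7
merged = np.correlate(x, w_merged, mode="valid")

assert w_merged.size == w1.size + w2.size - 1
assert np.allclose(sequential, merged)
```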
- Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training.
We employ a structure using residual connections around nonlinear network sections that allow the flow of information through the network once a nonlinear section is pruned.
arXiv Detail & Related papers (2024-06-06T23:19:57Z)
- Understanding Deep Neural Networks via Linear Separability of Hidden Layers [68.23950220548417]
We first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two point sets.
We demonstrate that there is a synchronicity between the linear separability degree of hidden layer outputs and the network training performance.
arXiv Detail & Related papers (2023-07-26T05:29:29Z)
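For intuition about the Minkowski-difference view behind MD-LSMs: two finite point sets are strictly linearly separable exactly when the origin lies outside the convex hull of their pairwise differences. The sketch below tests that condition with a small linear feasibility program; it is only a binary check, whereas the paper defines graded separability measures, so treat it as an assumption-laden illustration.

```python
import numpy as np
from scipy.optimize import linprog

def linearly_separable(A, B):
    """True iff the point sets A and B (shape (n, d) each) admit a
    strictly separating hyperplane."""
    # Minkowski difference: every pairwise difference a - b.
    D = (A[:, None, :] - B[None, :, :]).reshape(-1, A.shape[1])
    m = len(D)
    # Separable  <=>  the origin is NOT a convex combination of D.
    # Feasibility LP: lambda >= 0, sum(lambda) = 1, D^T lambda = 0.
    res = linprog(
        c=np.zeros(m),
        A_eq=np.vstack([D.T, np.ones((1, m))]),
        b_eq=np.append(np.zeros(D.shape[1]), 1.0),
        bounds=[(0, None)] * m,
        method="highs",
    )
    return not res.success  # infeasible LP: no convex combination hits the origin

rng = np.random.default_rng(0)
a = rng.normal(size=(20, 2))
b = rng.normal(size=(20, 2)) + 6.0   # shifted well away from a
c = rng.normal(size=(20, 2))         # overlaps a
print(linearly_separable(a, b))      # expected: True (disjoint clouds)
print(linearly_separable(a, c))      # expected: False (overlapping clouds)
```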
- WavPool: A New Block for Deep Neural Networks [2.2311710049695446]
We introduce a new, wavelet-transform-based network architecture that we call the multi-resolution perceptron.
By adding a pooling layer, we create a new network block, the WavPool.
WavPool outperforms a similar multilayer perceptron while using fewer parameters, and outperforms a comparable convolutional neural network by 10% in relative accuracy on CIFAR-10.
arXiv Detail & Related papers (2023-06-14T20:35:01Z)
- Diffused Redundancy in Pre-trained Representations [98.55546694886819]
We take a closer look at how features are encoded in pre-trained representations.
We find that learned representations in a given layer exhibit a degree of diffuse redundancy.
Our findings shed light on the nature of representations learned by pre-trained deep neural networks.
arXiv Detail & Related papers (2023-05-31T21:00:50Z)
- Feature-Learning Networks Are Consistent Across Widths At Realistic Scales [72.27228085606147]
We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets.
Early in training, wide neural networks trained on online data not only have identical loss curves but also agree in their point-wise test predictions throughout training.
We observe, however, that ensembles of narrower networks perform worse than a single wide network.
arXiv Detail & Related papers (2023-05-28T17:09:32Z)
- ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reformulate the conv layer based on scale-space theory.
We build a novel network style named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, SCAN-CNN is more efficient at inference than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- Total Variation Optimization Layers for Computer Vision [130.10996341231743]
We propose total variation (TV) minimization as a layer for computer vision.
Motivated by the success of total variation in image processing, we hypothesize that TV as a layer provides useful inductive bias for deep-nets.
We study this hypothesis on five computer vision tasks: image classification, weakly supervised object localization, edge-preserving smoothing, edge detection, and image denoising.
arXiv Detail & Related papers (2022-04-07T17:59:27Z)
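For reference, the anisotropic total variation functional that a TV minimization layer penalizes can be written in a few lines of numpy; the differentiable optimization layer itself (solving a TV-regularized problem in the forward pass) is not reproduced here.

```python
import numpy as np

def total_variation(img):
    """Anisotropic TV of a 2-D image: sum of absolute differences
    between vertically and horizontally adjacent pixels."""
    return (np.abs(np.diff(img, axis=0)).sum()
            + np.abs(np.diff(img, axis=1)).sum())

# Piecewise-constant images have low TV; noise inflates it, which is why
# penalizing TV acts as an edge-preserving smoothness prior.
rng = np.random.default_rng(0)
clean = np.kron(rng.integers(0, 2, size=(4, 4)).astype(float), np.ones((8, 8)))
noisy = clean + 0.1 * rng.normal(size=clean.shape)
print(total_variation(clean), total_variation(noisy))
```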
- Mixed-Privacy Forgetting in Deep Networks [114.3840147070712]
We show that the influence of a subset of the training samples can be removed from the weights of a network trained on large-scale image classification tasks.
Inspired by real-world applications of forgetting techniques, we introduce a novel notion of forgetting in a mixed-privacy setting.
We show that our method allows forgetting without having to trade off the model accuracy.
arXiv Detail & Related papers (2020-12-24T19:34:56Z)
- Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection [21.48875255723581]
A mixed-precision quantized neural network with progressively decreasing bitwidth is proposed to improve the trade-off between accuracy and compression.
Experiments on typical network architectures and benchmark datasets demonstrate that the proposed method could achieve better or comparable results.
arXiv Detail & Related papers (2019-12-29T14:11:33Z)
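A generic sketch of what "progressively decreasing bitwidth" means in practice: later layers are quantized with fewer bits than earlier ones. The symmetric uniform quantizer and the 8/6/4/2 schedule below are illustrative assumptions, not the paper's allocation method.

```python
import numpy as np

def quantize(w, bits):
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax          # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                        # dequantized weights

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 64)) for _ in range(4)]
bitwidths = [8, 6, 4, 2]                    # later layers get fewer bits

for i, (w, bits) in enumerate(zip(layers, bitwidths)):
    err = np.abs(w - quantize(w, bits)).mean()
    print(f"layer {i}: {bits}-bit, mean quantization error {err:.4f}")
```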
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.