Understanding Deep Neural Networks via Linear Separability of Hidden
Layers
- URL: http://arxiv.org/abs/2307.13962v1
- Date: Wed, 26 Jul 2023 05:29:29 GMT
- Title: Understanding Deep Neural Networks via Linear Separability of Hidden
Layers
- Authors: Chao Zhang, Xinyu Chen, Wensheng Li, Lixue Liu, Wei Wu, Dacheng Tao
- Abstract summary: We first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two point sets.
We demonstrate that there is a synchronicity between the linear separability degree of hidden layer outputs and the network training performance.
- Score: 68.23950220548417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we measure the linear separability of hidden layer outputs to
study the characteristics of deep neural networks. In particular, we first
propose Minkowski difference based linear separability measures (MD-LSMs) to
evaluate the linear separability degree of two point sets. Then, we
demonstrate that there is a synchronicity between the linear separability
degree of hidden layer outputs and the network training performance, i.e., if
the updated weights can enhance the linear separability degree of hidden layer
outputs, the updated network will achieve a better training performance, and
vice versa. Moreover, we study the effect of activation function and network
size (including width and depth) on the linear separability of hidden layers.
Finally, we conduct numerical experiments to validate our findings on some
popular deep networks including multilayer perceptron (MLP), convolutional
neural network (CNN), deep belief network (DBN), ResNet, VGGNet, AlexNet,
vision transformer (ViT) and GoogLeNet.
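The exact MD-LSM definitions are given in the paper; as a rough, hedged illustration of the underlying idea (not the authors' implementation), the sketch below scores the linear separability of two hidden-layer output sets via their Minkowski difference. Two point sets are linearly separable exactly when some hyperplane through the origin puts every pairwise difference x_i - y_j on its positive side, so the best achievable fraction of differences on one side gives a degree of separability. The function name md_separability_score and the random-direction search are illustrative assumptions.

```python
# Minimal sketch of a Minkowski-difference-based linear separability score.
# Illustrative approximation only -- not the MD-LSM formulas from the paper.
import numpy as np


def md_separability_score(X, Y, n_directions=2000, seed=0):
    """Approximate degree of linear separability of point sets X and Y.

    Builds the Minkowski difference D = {x - y : x in X, y in Y} and returns
    the largest fraction of D that a sampled hyperplane through the origin
    places strictly on one side. A score of 1.0 means the sampled direction
    separates X from Y completely.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # All pairwise differences: shape (len(X) * len(Y), dim).
    D = (X[:, None, :] - Y[None, :, :]).reshape(-1, X.shape[1])

    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_directions):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        frac = np.mean(D @ w > 0.0)
        best = max(best, frac, 1.0 - frac)  # -w is also a candidate direction
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(loc=+2.0, size=(50, 10))  # e.g. hidden outputs for class 1
    Y = rng.normal(loc=-2.0, size=(50, 10))  # e.g. hidden outputs for class 2
    print(md_separability_score(X, Y))       # close to 1.0 for separable sets
```

Random direction sampling is only a crude surrogate for whatever optimization the paper uses; a linear program or a linear SVM fitted on the difference set would estimate the same quantity more tightly.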
Related papers
- Representing Neural Network Layers as Linear Operations via Koopman Operator Theory [9.558002301188091]
We show that a linear view of neural networks makes understanding and controlling networks much more approachable.
We replace layers of a trained network with predictions from a DMD model, achieving a model accuracy of up to 97.3%.
In addition, we replace layers in a network trained on the MNIST dataset, achieving up to 95.8%, compared to the original 97.2% on the test set; a toy sketch of this layer-replacement idea appears after this list.
arXiv Detail & Related papers (2024-09-02T15:04:33Z) - Understanding Deep Representation Learning via Layerwise Feature
Compression and Discrimination [33.273226655730326]
We show that each layer of a deep linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate.
This is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
arXiv Detail & Related papers (2023-11-06T09:00:38Z) - The Evolution of the Interplay Between Input Distributions and Linear
Regions in Networks [20.97553518108504]
We count the number of linear convex regions in deep neural networks based on ReLU.
In particular, we prove that for any one-dimensional input, there exists a minimum threshold for the number of neurons required to express it.
We also unveil the iterative refinement process of decision boundaries in ReLU networks during training.
arXiv Detail & Related papers (2023-10-28T15:04:53Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature
Connectivity [62.11981948274508]
The study of layerwise linear feature connectivity (LLFC) transcends and advances our understanding of linear mode connectivity (LMC) by adopting a feature-learning perspective.
We provide comprehensive empirical evidence for LLFC across a wide range of settings, demonstrating that whenever two trained networks satisfy LMC, they also satisfy LLFC in nearly all the layers.
arXiv Detail & Related papers (2023-07-17T07:16:28Z) - ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
Rank of neural networks measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Learning distinct features helps, provably [98.78384185493624]
We study the diversity of the features learned by a two-layer neural network trained with the least squares loss.
We measure the diversity by the average $L_2$-distance between the hidden-layer features.
arXiv Detail & Related papers (2021-06-10T19:14:45Z) - Statistical Mechanics of Deep Linear Neural Networks: The
Back-Propagating Renormalization Group [4.56877715768796]
We study the statistical mechanics of learning in Deep Linear Neural Networks (DLNNs) in which the input-output function of an individual unit is linear.
We solve exactly the network properties following supervised learning using an equilibrium Gibbs distribution in the weight space.
Our numerical simulations reveal that despite the nonlinearity, the predictions of our theory are largely shared by ReLU networks with modest depth.
arXiv Detail & Related papers (2020-12-07T20:08:31Z)
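The DMD-based layer replacement mentioned in the Koopman operator entry above can be pictured with a short sketch. This is a hypothetical illustration under assumed names (fit_linear_surrogate, relu_layer), not that paper's code: it fits a single linear operator by least squares so that it maps a layer's input activations to that layer's output activations, then checks how well the operator substitutes for the nonlinear layer.

```python
# Hypothetical sketch of replacing a trained nonlinear layer with a fitted
# linear map, in the spirit of the DMD/Koopman entry above (not its code).
import numpy as np


def fit_linear_surrogate(H_in, H_out):
    """Least-squares operator A with H_out ~= H_in @ A.

    H_in  : activations entering the layer, shape (n_samples, d_in)
    H_out : activations leaving the layer,  shape (n_samples, d_out)
    """
    A, *_ = np.linalg.lstsq(H_in, H_out, rcond=None)
    return A


def relu_layer(H, W, b):
    """The original nonlinear layer being approximated (illustrative)."""
    return np.maximum(H @ W + b, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(20, 15)), rng.normal(size=15)

    H_in = rng.normal(size=(1000, 20))      # stand-in for recorded activations
    H_out = relu_layer(H_in, W, b)

    A = fit_linear_surrogate(H_in, H_out)   # linear surrogate of the layer

    H_test = rng.normal(size=(200, 20))
    err = np.mean((relu_layer(H_test, W, b) - H_test @ A) ** 2)
    print(f"mean squared surrogate error: {err:.4f}")
```

How well such a surrogate preserves accuracy depends on how close to linear the layer behaves on the data it actually sees, which is presumably why the reported drop in accuracy is small.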