Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity
- URL: http://arxiv.org/abs/2307.08286v2
- Date: Mon, 13 Nov 2023 08:25:48 GMT
- Title: Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity
- Authors: Zhanpeng Zhou, Yongyi Yang, Xiaojiang Yang, Junchi Yan, Wei Hu
- Abstract summary: The study of LLFC transcends and advances our understanding of LMC by adopting a feature-learning perspective.
We provide comprehensive empirical evidence for LLFC across a wide range of settings, demonstrating that whenever two trained networks satisfy LMC, they also satisfy LLFC in nearly all the layers.
- Score: 62.11981948274508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has revealed many intriguing empirical phenomena in neural
network training, despite the poorly understood and highly complex loss
landscapes and training dynamics. One of these phenomena, Linear Mode
Connectivity (LMC), has gained considerable attention due to the intriguing
observation that different solutions can be connected by a linear path in the
parameter space while maintaining near-constant training and test losses. In
this work, we introduce a stronger notion of linear connectivity, Layerwise
Linear Feature Connectivity (LLFC), which says that the feature maps of every
layer in different trained networks are also linearly connected. We provide
comprehensive empirical evidence for LLFC across a wide range of settings,
demonstrating that whenever two trained networks satisfy LMC (via either
spawning or permutation methods), they also satisfy LLFC in nearly all the
layers. Furthermore, we delve deeper into the underlying factors contributing
to LLFC, which reveal new insights into the spawning and permutation
approaches. The study of LLFC transcends and advances our understanding of LMC
by adopting a feature-learning perspective.
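To make the two notions concrete, here is a minimal PyTorch sketch, assuming two already-trained networks model_a and model_b with identical architecture and a data loader named loader (all hypothetical). LMC is probed by evaluating the loss of weight-interpolated models along the linear path, and LLFC is probed by comparing each layer's features of the interpolated model against the same interpolation of the two endpoint models' features. Treating ReLU outputs as the "feature maps" and using cosine similarity as the comparison are simplifications of this sketch, not the paper's exact protocol.

```python
# Minimal sketch, assuming two already-trained networks `model_a` and `model_b`
# with identical architecture and a data loader `loader` (all hypothetical).
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


def interpolate_params(model_a: nn.Module, model_b: nn.Module, alpha: float) -> nn.Module:
    """Build a model whose weights are (1 - alpha) * theta_A + alpha * theta_B."""
    model = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    model.load_state_dict({k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a})
    return model


@torch.no_grad()
def mean_loss(model: nn.Module, loader) -> float:
    """Average cross-entropy over the loader; LMC predicts a nearly flat value
    as alpha sweeps from 0 to 1."""
    model.eval()
    total, count = 0.0, 0
    for x, y in loader:
        total += F.cross_entropy(model(x), y, reduction="sum").item()
        count += y.numel()
    return total / count


def layer_features(model: nn.Module, x: torch.Tensor) -> list:
    """Collect flattened outputs of every ReLU module via forward hooks
    (using post-activation outputs as 'feature maps' is a simplification)."""
    feats, hooks = [], []
    for layer in model.modules():
        if isinstance(layer, nn.ReLU):
            hooks.append(layer.register_forward_hook(
                lambda _mod, _inp, out: feats.append(out.detach().flatten(1))))
    model.eval()
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return feats


def llfc_cosine(model_a: nn.Module, model_b: nn.Module, x: torch.Tensor, alpha: float = 0.5) -> list:
    """Per-layer cosine similarity between the interpolated model's features and
    the same interpolation of the two endpoint models' features."""
    model_m = interpolate_params(model_a, model_b, alpha)
    feats_a, feats_b, feats_m = (layer_features(m, x) for m in (model_a, model_b, model_m))
    sims = []
    for fa, fb, fm in zip(feats_a, feats_b, feats_m):
        mix = (1 - alpha) * fa + alpha * fb
        sims.append(F.cosine_similarity(fm.flatten(), mix.flatten(), dim=0).item())
    return sims


# LMC check: losses along the path should stay close to the endpoint losses.
# losses = [mean_loss(interpolate_params(model_a, model_b, a), loader) for a in (0.0, 0.25, 0.5, 0.75, 1.0)]
# LLFC check: per-layer similarities close to 1 are consistent with LLFC.
# sims = llfc_cosine(model_a, model_b, next(iter(loader))[0])
```

Cosine similarity ignores an overall rescaling of a layer's features, which is the loose sense of "linearly connected" used in this sketch; the paper's own evaluation protocol may differ.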
Related papers
- On the Local Complexity of Linear Regions in Deep ReLU Networks [15.335716956682203]
We show theoretically that ReLU networks that learn low-dimensional feature representations have a lower local complexity.
In particular, we show that the local complexity serves as an upper bound on the total variation of the function over the input data distribution.
arXiv Detail & Related papers (2024-12-24T08:42:39Z)
- Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning [53.685764040547625]
Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergent capabilities.
This work provides a fine-grained mathematical analysis to show how transformers leverage the multi-concept semantics of words to enable powerful ICL and excellent out-of-distribution ICL abilities.
arXiv Detail & Related papers (2024-11-04T15:54:32Z)
- Landscaping Linear Mode Connectivity [76.39694196535996]
Linear mode connectivity (LMC) has garnered interest from both theoretical and practical fronts.
We take a step towards understanding it by providing a model of how the loss landscape needs to behave topographically for LMC to hold.
arXiv Detail & Related papers (2024-06-24T03:53:30Z)
- Understanding Deep Representation Learning via Layerwise Feature Compression and Discrimination [33.273226655730326]
We show that each layer of a deep linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate.
This is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
arXiv Detail & Related papers (2023-11-06T09:00:38Z)
- Understanding Deep Neural Networks via Linear Separability of Hidden Layers [68.23950220548417]
We first propose Minkowski-difference-based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two point sets.
We demonstrate that there is a synchronicity between the linear separability degree of hidden-layer outputs and the network training performance (a simple linear-probe proxy for this idea is sketched after this list).
arXiv Detail & Related papers (2023-07-26T05:29:29Z)
- Training invariances and the low-rank phenomenon: beyond linear networks [44.02161831977037]
We show that when one trains a deep linear network with logistic or exponential loss on linearly separable data, the weights converge to rank-$1$ matrices.
This is the first time a low-rank phenomenon is proven rigorously for nonlinear ReLU-activated feedforward networks.
Our proof relies on a specific decomposition of the network into a multilinear function and another ReLU network whose weights remain constant under a certain directional convergence of the parameters.
arXiv Detail & Related papers (2022-01-28T07:31:19Z)
- Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges to reflect the magnitude of connections, the learning process can be performed in a differentiable manner (a minimal sketch of this idea follows the list).
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
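Relatedly, the linear-separability entry above suggests tracking how separable the classes are at each hidden layer. The sketch below is a rough proxy for that idea, assuming a hypothetical trained classifier net and a labeled batch (xb, yb): it fits a plain logistic-regression probe on each ReLU layer's outputs and reports the probe's accuracy. It is not the MD-LSM measure proposed in that paper.

```python
# Illustrative proxy only: a linear probe per hidden layer, not the MD-LSM
# measure proposed in that paper. `net`, `xb`, and `yb` are hypothetical.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression


@torch.no_grad()
def hidden_features(model: nn.Module, x: torch.Tensor) -> list:
    """Grab every ReLU module's output (flattened per sample) via forward hooks."""
    feats, hooks = [], []
    for layer in model.modules():
        if isinstance(layer, nn.ReLU):
            hooks.append(layer.register_forward_hook(
                lambda _mod, _inp, out: feats.append(out.detach().flatten(1).cpu().numpy())))
    model.eval()
    model(x)
    for h in hooks:
        h.remove()
    return feats


def probe_separability(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> list:
    """Training accuracy of a logistic-regression probe on each hidden layer's
    outputs; higher values suggest the representation is closer to linearly separable."""
    labels = y.cpu().numpy()
    scores = []
    for layer_feats in hidden_features(model, x):
        probe = LogisticRegression(max_iter=1000).fit(layer_feats, labels)
        scores.append(probe.score(layer_feats, labels))
    return scores


# Usage with a hypothetical trained classifier and labeled batch:
# per_layer_accuracy = probe_separability(net, xb, yb)
```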
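Finally, the last entry describes learning a network's connectivity by attaching learnable weights to graph edges. The block below is a minimal sketch of that general idea under our own simplifications (the small DAG cell, the Conv-BN-ReLU node operation, and the sigmoid gating are assumptions, not the paper's design): each node aggregates a weighted sum of all earlier nodes' outputs, so the connection strengths are trained by gradient descent along with the rest of the network.

```python
# Minimal sketch of edge-weighted connectivity learning; the cell structure and
# node operations are assumptions of this sketch, not the paper's architecture.
import torch
import torch.nn as nn


class LearnableDAGBlock(nn.Module):
    def __init__(self, channels: int, num_nodes: int = 4):
        super().__init__()
        self.num_nodes = num_nodes
        self.ops = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.BatchNorm2d(channels), nn.ReLU())
            for _ in range(num_nodes))
        # One learnable logit per directed edge from an earlier node (or the input).
        self.edge_logits = nn.Parameter(torch.zeros(num_nodes, num_nodes + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [x]  # node 0 is the block input
        for i in range(self.num_nodes):
            # Gate each incoming edge, then aggregate the weighted sum of predecessors.
            gates = torch.sigmoid(self.edge_logits[i, : len(outputs)])
            agg = sum(g * h for g, h in zip(gates, outputs))
            outputs.append(self.ops[i](agg))
        return outputs[-1]


# Usage: y = LearnableDAGBlock(channels=16)(torch.randn(2, 16, 8, 8))
```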