Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature
Connectivity
- URL: http://arxiv.org/abs/2307.08286v2
- Date: Mon, 13 Nov 2023 08:25:48 GMT
- Title: Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature
Connectivity
- Authors: Zhanpeng Zhou, Yongyi Yang, Xiaojiang Yang, Junchi Yan, Wei Hu
- Abstract summary: The study of LLFC transcends and advances our understanding of LMC by adopting a feature-learning perspective.
We provide comprehensive empirical evidence for LLFC across a wide range of settings, demonstrating that whenever two trained networks satisfy LMC, they also satisfy LLFC in nearly all the layers.
- Score: 62.11981948274508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has revealed many intriguing empirical phenomena in neural
network training, despite the poorly understood and highly complex loss
landscapes and training dynamics. One of these phenomena, Linear Mode
Connectivity (LMC), has gained considerable attention due to the intriguing
observation that different solutions can be connected by a linear path in the
parameter space while maintaining near-constant training and test losses. In
this work, we introduce a stronger notion of linear connectivity, Layerwise
Linear Feature Connectivity (LLFC), which says that the feature maps of every
layer in different trained networks are also linearly connected. We provide
comprehensive empirical evidence for LLFC across a wide range of settings,
demonstrating that whenever two trained networks satisfy LMC (via either
spawning or permutation methods), they also satisfy LLFC in nearly all the
layers. Furthermore, we delve deeper into the underlying factors contributing
to LLFC, which reveal new insights into the spawning and permutation
approaches. The study of LLFC transcends and advances our understanding of LMC
by adopting a feature-learning perspective.
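To make the two notions concrete, the following is a minimal PyTorch sketch of how one might test them empirically; it is an illustrative sketch under stated assumptions, not the paper's released code. Here `model_a` and `model_b` are assumed to be two independently trained copies of the same architecture, `x` and `y` an evaluation batch with labels, and `loss_fn` a loss function; the positive scaling constant allowed in the LLFC definition and the re-estimation of BatchNorm statistics commonly used in the LMC literature are omitted for simplicity.
```python
# Illustrative sketch only: `model_a`, `model_b`, `x`, `y`, and `loss_fn` are
# hypothetical placeholders, not objects from the paper's code.
import copy
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Parameter-wise interpolation (1 - alpha) * theta_A + alpha * theta_B.
    Non-float entries (e.g. BatchNorm's num_batches_tracked) are taken from sd_a."""
    return {
        k: (1 - alpha) * v + alpha * sd_b[k] if torch.is_floating_point(v) else v
        for k, v in sd_a.items()
    }

@torch.no_grad()
def loss_along_path(model_a, model_b, loss_fn, x, y, n_points=11):
    """LMC check: the loss along the linear path in parameter space should stay
    near-constant between the two endpoint networks."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    losses = []
    for alpha in torch.linspace(0.0, 1.0, n_points):
        probe.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha.item()))
        probe.eval()
        losses.append(loss_fn(probe(x), y).item())
    return losses

@torch.no_grad()
def llfc_gaps(model_a, model_b, x, alpha=0.5):
    """LLFC check: compare, layer by layer, the features of the weight-interpolated
    network with the interpolation of the two endpoint networks' features
    (the scaling constant allowed by the LLFC definition is ignored here)."""
    model_ab = copy.deepcopy(model_a)
    model_ab.load_state_dict(
        interpolate_state_dicts(model_a.state_dict(), model_b.state_dict(), alpha)
    )
    feats, hooks = {"a": {}, "b": {}, "ab": {}}, []

    def make_hook(store, layer_name):
        def hook(module, inputs, output):
            store[layer_name] = output.detach()
        return hook

    models = {"a": model_a, "b": model_b, "ab": model_ab}
    for tag, model in models.items():
        for layer_name, module in model.named_modules():
            if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
                hooks.append(module.register_forward_hook(make_hook(feats[tag], layer_name)))
    for model in models.values():
        model.eval()
        model(x)
    for h in hooks:
        h.remove()

    gaps = {}  # relative deviation per layer; LLFC predicts these stay small
    for layer_name in feats["ab"]:
        mix = (1 - alpha) * feats["a"][layer_name] + alpha * feats["b"][layer_name]
        gaps[layer_name] = ((feats["ab"][layer_name] - mix).norm() / mix.norm()).item()
    return gaps
```
For model pairs produced by the spawning or permutation methods discussed in the paper, LMC corresponds to a near-flat loss curve along the path, and LLFC to small relative gaps in nearly all layers.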
Related papers
- Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning [53.685764040547625]
Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergent capabilities.
This work provides a fine-grained mathematical analysis to show how transformers leverage the multi-concept semantics of words to enable powerful ICL and excellent out-of-distribution ICL abilities.
arXiv Detail & Related papers (2024-11-04T15:54:32Z) - Landscaping Linear Mode Connectivity [76.39694196535996]
Linear mode connectivity (LMC) has garnered interest from both theoretical and practical fronts.
We take a step towards understanding it by providing a model of how the loss landscape needs to behave topographically for LMC to hold.
arXiv Detail & Related papers (2024-06-24T03:53:30Z) - Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity [4.516746821973374]
We show that for two typical global minima, there exists a path connecting them without a barrier.
For a finite number of typical minima, there exists a center on the minima manifold that connects all of them simultaneously.
These results are provably valid for linear networks and two-layer ReLU networks under a teacher-student setup.
arXiv Detail & Related papers (2024-04-09T15:35:02Z) - Understanding Deep Representation Learning via Layerwise Feature
Compression and Discrimination [33.273226655730326]
We show that each layer of a deep linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate.
This is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
arXiv Detail & Related papers (2023-11-06T09:00:38Z) - Layer-wise Feedback Propagation [53.00944147633484]
- Layer-wise Feedback Propagation [53.00944147633484]
We present Layer-wise Feedback Propagation (LFP), a novel training approach for neural-network-like predictors.
LFP assigns rewards to individual connections based on their respective contributions to solving a given task.
We demonstrate its effectiveness in achieving comparable performance to gradient descent on various models and datasets.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - Understanding Deep Neural Networks via Linear Separability of Hidden
Layers [68.23950220548417]
We first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two point sets.
We demonstrate that there is a synchronicity between the linear separability degree of hidden layer outputs and the network training performance.
arXiv Detail & Related papers (2023-07-26T05:29:29Z) - Training invariances and the low-rank phenomenon: beyond linear networks [44.02161831977037]
We show that when one trains a deep linear network with logistic or exponential loss on linearly separable data, the weights converge to rank-$1$ matrices.
This is the first time a low-rank phenomenon is proven rigorously for nonlinear ReLU-activated feedforward networks.
Our proof relies on a specific decomposition of the network into a multilinear function and another ReLU network whose weights are constant under a certain parameter directional convergence.
arXiv Detail & Related papers (2022-01-28T07:31:19Z) - Solving Sparse Linear Inverse Problems in Communication Systems: A Deep
Learning Approach With Adaptive Depth [51.40441097625201]
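As a rough illustration of the low-rank claim above, here is a minimal sketch of how one might check whether trained weight matrices are close to rank one; `model` is a hypothetical placeholder for a trained torch.nn.Module, and this is a simple diagnostic, not the paper's proof technique.
```python
# Hedged sketch: fraction of each Linear layer's Frobenius energy captured by its
# top singular direction; values near 1 indicate an approximately rank-1 matrix.
import torch

@torch.no_grad()
def rank_one_scores(model):
    scores = {}
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            w = module.weight
            top_singular_value = torch.linalg.svdvals(w)[0]
            scores[name] = (top_singular_value ** 2 / w.pow(2).sum()).item()
    return scores
```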
- Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.