Related papers: Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity

URL: http://arxiv.org/abs/2404.06391v1
Date: Tue, 9 Apr 2024 15:35:02 GMT
Title: Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity
Authors: Zhanran Lin, Puheng Li, Lei Wu
Abstract summary: Mode connectivity means that for two typical global minima there exists a path connecting them without a barrier; the paper shows that this path can be a two-piece linear path. For a finite number of typical minima, there exists a center on the minima manifold that connects all of them simultaneously via linear paths. The results are provably valid for linear networks and two-layer ReLU networks under a teacher-student setup.
Score: 4.516746821973374
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One of the most intriguing findings in the structure of neural network landscape is the phenomenon of mode connectivity: For two typical global minima, there exists a path connecting them without barrier. This concept of mode connectivity has played a crucial role in understanding important phenomena in deep learning. In this paper, we conduct a fine-grained analysis of this connectivity phenomenon. First, we demonstrate that in the overparameterized case, the connecting path can be as simple as a two-piece linear path, and the path length can be nearly equal to the Euclidean distance. This finding suggests that the landscape should be nearly convex in a certain sense. Second, we uncover a surprising star-shaped connectivity: For a finite number of typical minima, there exists a center on the minima manifold that connects all of them simultaneously via linear paths. These results are provably valid for linear networks and two-layer ReLU networks under a teacher-student setup, and are empirically supported by models trained on MNIST and CIFAR-10.
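To make "barrier" and "path length" concrete, here is a minimal NumPy sketch under assumptions introduced here, not taken from the paper: a two-layer linear network f(x) = V U x, a squared loss against a random teacher map T, and a hand-built pair of global minima A and B that use disjoint blocks of hidden units. The helper names (`loss`, `barrier`, `path_length`, `random_minimum`) and the waypoints `M1`, `M2` are hypothetical. This is not the paper's two-piece or star-shaped construction; it only shows that straight interpolation between A and B crosses a barrier, while a three-piece path that moves one layer at a time stays on the minima manifold.

```python
import numpy as np

# Hypothetical teacher-student toy (not the paper's experiments):
# a two-layer linear student f(x) = V @ U @ x fits a teacher map T.
rng = np.random.default_rng(0)
d, k, h = 4, 3, 5                      # input dim, output dim, hidden units per block
X = rng.standard_normal((d, 200))      # inputs stored as columns
T = rng.standard_normal((k, d))        # teacher weights
Y = T @ X                              # noiseless teacher labels


def loss(params):
    """Mean squared error of the student (U, V) against the teacher labels."""
    U, V = params
    return float(np.mean((V @ U @ X - Y) ** 2))


def lerp(p, q, t):
    """Linear interpolation between two parameter tuples."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))


def path_points(waypoints, n=101):
    """Sample n points on every linear segment of a piecewise-linear path."""
    for p, q in zip(waypoints[:-1], waypoints[1:]):
        for t in np.linspace(0.0, 1.0, n):
            yield lerp(p, q, t)


def barrier(waypoints, n=101):
    """Highest sampled loss on the path minus the larger endpoint loss."""
    path_max = max(loss(p) for p in path_points(waypoints, n))
    return path_max - max(loss(waypoints[0]), loss(waypoints[-1]))


def path_length(waypoints):
    """Euclidean length of the path in flattened parameter space."""
    flat = [np.concatenate([a.ravel() for a in p]) for p in waypoints]
    return sum(np.linalg.norm(b - a) for a, b in zip(flat[:-1], flat[1:]))


def random_minimum():
    """A global minimum: U almost surely has full column rank, so V @ U == T."""
    U = rng.standard_normal((h, d))
    return U, T @ np.linalg.pinv(U)


(U1, V1), (U2, V2) = random_minimum(), random_minimum()
Zu, Zv = np.zeros((h, d)), np.zeros((k, h))

# Minimum A uses only the first block of hidden units, B only the second.
A = (np.vstack([U1, Zu]), np.hstack([V1, Zv]))
B = (np.vstack([Zu, U2]), np.hstack([Zv, V2]))
# Waypoints that change one layer at a time keep V @ U == T on every segment.
M1 = (np.vstack([U1, U2]), np.hstack([V1, Zv]))
M2 = (np.vstack([U1, U2]), np.hstack([Zv, V2]))

print("direct barrier:", barrier([A, B]))
print("3-piece barrier:", barrier([A, M1, M2, B]))
print("lengths:", path_length([A, B]), path_length([A, M1, M2, B]))
```

On the straight segment the product is V U = ((1 - t)^2 + t^2) T, so the loss peaks at t = 0.5 at one quarter of the mean squared label; on each of the three segments V U stays equal to T. By the triangle inequality the three-piece path is at least as long as the straight segment, which is why a connecting path whose length nearly equals the Euclidean distance, as claimed in the abstract, is a stronger statement than this naive construction.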

Related papers

Input Space Mode Connectivity in Deep Neural Networks [5.8470747480006695]
We extend the concept of loss landscape mode connectivity to the input space of deep neural networks. We present theoretical and empirical evidence of its presence in the input space of deep networks. We exploit mode connectivity to obtain new insights about adversarial examples and demonstrate its potential for adversarial detection.
Date: 2024-09-09T17:03:43Z
Landscaping Linear Mode Connectivity [76.39694196535996]
Linear mode connectivity (LMC) has garnered interest on both theoretical and practical fronts. We take a step towards understanding it by providing a model of how the loss landscape needs to behave topographically for LMC.
Date: 2024-06-24T03:53:30Z
Half-Space Feature Learning in Neural Networks [2.3249139042158853]
There currently exist two extreme viewpoints on neural network feature learning. We argue, based on a novel viewpoint, that neither interpretation is likely to be correct. We use this alternative interpretation to motivate a model called the Deep Linearly Gated Network (DLGN).
Date: 2024-04-05T12:03:19Z
Proving Linear Mode Connectivity of Neural Networks via Optimal Transport [27.794244660649085]
We provide a framework that theoretically explains the empirically observed linear mode connectivity of trained networks. We show how the support of the neurons' weight distribution, which dictates Wasserstein convergence rates, is correlated with mode connectivity.
Date: 2023-10-29T18:35:05Z
Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity [62.11981948274508]
Studying layerwise linear feature connectivity (LLFC) goes beyond and deepens our understanding of LMC by adopting a feature-learning perspective. We provide comprehensive empirical evidence for LLFC across a wide range of settings, demonstrating that whenever two trained networks satisfy LMC, they also satisfy LLFC in nearly all the layers.
Date: 2023-07-17T07:16:28Z
Feature-Learning Networks Are Consistent Across Widths At Realistic Scales [72.27228085606147]
We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets. Early in training, wide neural networks trained on online data have not only identical loss curves but also agree in their point-wise test predictions throughout training. We observe, however, that ensembles of narrower networks perform worse than a single wide network.
Date: 2023-05-28T17:09:32Z
From Compass and Ruler to Convolution and Nonlinearity: On the Surprising Difficulty of Understanding a Simple CNN Solving a Simple Geometric Estimation Task [6.230751621285322]
We propose to address a simple, well-posed learning problem using a simple convolutional neural network. Surprisingly, understanding what trained networks have learned is difficult and, to some extent, counter-intuitive.
Date: 2023-03-12T11:30:49Z
Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training. We show that by simply filtering out "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
Date: 2022-05-11T17:43:54Z
Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis. By assigning learnable parameters to the edges, which reflect the magnitude of connections, the learning process can be performed in a differentiable manner. This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
Date: 2020-08-19T04:53:31Z
Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. We also demonstrate the transition between the kernel and rich regimes empirically for more complex matrix factorization models and multilayer non-linear networks.
Date: 2020-02-20T15:43:02Z

This list is automatically generated from the titles and abstracts of the papers on this site.