Do Deep Neural Network Solutions Form a Star Domain?
- URL: http://arxiv.org/abs/2403.07968v2
- Date: Sun, 9 Jun 2024 11:51:03 GMT
- Title: Do Deep Neural Network Solutions Form a Star Domain?
- Authors: Ankit Sonthalia, Alexander Rubinstein, Ehsan Abbasnejad, Seong Joon Oh
- Abstract summary: We propose the Starlight algorithm that finds a star model of a given learning task.
We also demonstrate improved uncertainty estimates from Bayesian Model Averaging over the obtained star domain.
- Score: 68.66750305473163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has recently been conjectured that neural network solution sets reachable via stochastic gradient descent (SGD) are convex, considering permutation invariances (Entezari et al., 2022). This means that a linear path can connect two independent solutions with low loss, provided that the weights of one of the models are appropriately permuted. However, current methods to test this theory often require very wide networks to succeed. In this work, we conjecture that, more generally, the SGD solution set is a "star domain" that contains a "star model" that is linearly connected to all the other solutions via paths with low loss values, modulo permutations. We propose the Starlight algorithm that finds a star model of a given learning task. We validate our claim by showing that this star model is linearly connected with other independently found solutions. As an additional benefit of our study, we demonstrate better uncertainty estimates from Bayesian Model Averaging over the obtained star domain. Further, we demonstrate star models as potential substitutes for model ensembles. Our code is available at https://github.com/aktsonthalia/starlight.
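As a concrete illustration of the central claim, the sketch below interpolates between a star model's weights and an independently trained solution and measures the loss barrier along the linear path; barriers near zero along every such path are what the star-domain conjecture predicts. This is a minimal sketch under assumed PyTorch conventions: the helper names are hypothetical, and the permutation-alignment step from the paper is omitted.

```python
# Minimal sketch (assumptions: PyTorch, matching architectures, and
# permutation alignment already applied); not the Starlight algorithm itself.
import torch

def interpolate_state_dicts(sd_star, sd_other, alpha):
    """Weights at point alpha on the segment between the two solutions."""
    return {k: (1 - alpha) * sd_star[k] + alpha * sd_other[k] for k in sd_star}

@torch.no_grad()
def loss_barrier(model, sd_star, sd_other, loader, criterion, steps=11):
    """Max loss on the linear path minus the mean endpoint loss; a value
    near zero suggests the two solutions are linearly connected."""
    losses = []
    for i in range(steps):
        model.load_state_dict(interpolate_state_dicts(sd_star, sd_other, i / (steps - 1)))
        total, n = 0.0, 0
        for x, y in loader:
            total += criterion(model(x), y).item() * x.shape[0]
            n += x.shape[0]
        losses.append(total / n)
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```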
Related papers
- LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce positive linear satisfiability constraints into neural networks.
We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
arXiv Detail & Related papers (2024-07-18T22:05:21Z)
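For context on the entry above: LinSATNet extends the classic Sinkhorn algorithm, whose standard log-space form is sketched below. This is only the well-known primitive, not the paper's layer for jointly encoding multiple sets of marginal distributions.

```python
# Classic Sinkhorn normalization in log-space: alternately normalize rows
# and columns so the matrix approaches a doubly stochastic one. The square
# shape and iteration count are illustrative assumptions.
import torch

def sinkhorn(logits, n_iters=20):
    log_p = logits
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=-1, keepdim=True)  # rows sum to 1
        log_p = log_p - torch.logsumexp(log_p, dim=-2, keepdim=True)  # cols sum to 1
    return log_p.exp()  # differentiable, so usable as a network layer
```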
- Typical and atypical solutions in non-convex neural networks with discrete and continuous weights [2.7127628066830414]
We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations.
Both models exhibit subdominant minimizers which are extremely flat and wide.
For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers.
arXiv Detail & Related papers (2023-04-26T23:34:40Z)
- Graph Neural Network Bandits [89.31889875864599]
We consider the bandit optimization problem with the reward function defined over graph-structured data.
Key challenges in this setting are scaling to large domains and to graphs with many nodes.
We show that graph neural networks (GNNs) can be used to estimate the reward function.
arXiv Detail & Related papers (2022-07-13T18:12:36Z)
- Diffusion models as plug-and-play priors [98.16404662526101]
We consider the problem of inferring high-dimensional data $\mathbf{x}$ in a model that consists of a prior $p(\mathbf{x})$ and an auxiliary constraint $c(\mathbf{x},\mathbf{y})$.
The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise.
arXiv Detail & Related papers (2022-06-17T21:11:36Z)
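As a rough illustration of the recipe in the entry above, the toy below alternates a fixed denoiser with a gradient step on a differentiable constraint $c(\mathbf{x},\mathbf{y})$. Both `denoiser` and `constraint` are hypothetical stand-ins, and the update rule is schematic rather than the paper's exact inference procedure.

```python
# Toy sketch of constraint-guided sampling with a fixed denoising network
# as a plug-and-play prior; all callables and step sizes are stand-ins.
import torch

def guided_sampling(denoiser, constraint, shape, n_steps=50, noise=0.1, guide=1.0):
    x = torch.randn(shape)  # start from pure noise
    for t in reversed(range(n_steps)):
        x = x.detach().requires_grad_(True)
        grad_c, = torch.autograd.grad(constraint(x).sum(), x)  # d c(x, y) / d x
        with torch.no_grad():
            x = denoiser(x, t) + guide * grad_c  # denoise, then nudge toward the constraint
            if t > 0:
                x = x + noise * torch.randn_like(x)  # re-inject some noise
    return x
```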
- Walk for Learning: A Random Walk Approach for Federated Learning from Heterogeneous Data [17.978941229970886]
We focus on Federated Learning (FL) as a canonical application.
One of the main challenges of FL is the communication bottleneck between the nodes and the parameter server.
We present an adaptive random walk learning algorithm.
arXiv Detail & Related papers (2022-06-01T19:53:24Z)
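To make the idea in the entry above concrete, here is a minimal sketch in which the model performs a random walk over the node graph, taking one local gradient step per visit. The graph structure, local least-squares objective, and uniform-neighbor transitions are illustrative assumptions; the paper's algorithm additionally adapts the walk, which is not modeled here.

```python
# Toy random-walk learning over a graph of data-holding nodes; the
# parameter vector hops between neighbors instead of syncing with a server.
import random
import numpy as np

def random_walk_sgd(graph, node_data, w, lr=0.01, n_hops=1000):
    """graph: {node: [neighbors]}; node_data: {node: (X, y)} least-squares data."""
    node = random.choice(list(graph))
    for _ in range(n_hops):
        X, y = node_data[node]
        grad = X.T @ (X @ w - y) / len(y)  # local least-squares gradient
        w = w - lr * grad                  # one local SGD step at this node
        node = random.choice(graph[node])  # hop to a uniformly random neighbor
    return w
```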
- Distributionally Robust Semi-Supervised Learning Over Graphs [68.29280230284712]
Semi-supervised learning (SSL) over graph-structured data emerges in many network science applications.
To efficiently manage learning over graphs, variants of graph neural networks (GNNs) have been developed recently.
Despite their success in practice, most existing methods are unable to handle graphs with uncertain nodal attributes.
Challenges also arise due to distributional uncertainties associated with data acquired by noisy measurements.
A distributionally robust learning framework is developed, where the objective is to train models that exhibit quantifiable robustness against perturbations.
arXiv Detail & Related papers (2021-10-20T14:23:54Z)
- Feature Cross Search via Submodular Optimization [58.15569071608769]
We study feature cross search as a fundamental primitive in feature engineering.
We show that there exists a simple greedy $(1-1/e)$-approximation algorithm for this problem.
arXiv Detail & Related papers (2021-07-05T16:58:31Z)
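For the feature-cross entry above, the sketch below shows the generic greedy rule that achieves the $(1-1/e)$ guarantee for maximizing a monotone submodular function under a cardinality constraint; the `gain` callable is a hypothetical stand-in, not the paper's feature-cross objective.

```python
# Generic greedy (1 - 1/e)-approximation for monotone submodular
# maximization: always add the element with the largest marginal gain.
def greedy_max(candidates, gain, k):
    chosen = set()
    for _ in range(k):
        best = max((c for c in candidates if c not in chosen),
                   key=lambda c: gain(chosen | {c}) - gain(chosen))
        chosen.add(best)
    return chosen
```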
- How to Explain Neural Networks: A perspective of data space division [2.4499092754102874]
The interpretability of deep learning algorithms remains an open problem.
We discuss the shortcomings of existing explanation methods in terms of two attributes of explanations, namely completeness and explicitness.
From the perspective of data space division, this paper proposes the principle of complete local interpretable model-agnostic explanations (CLIMEP).
arXiv Detail & Related papers (2021-05-17T13:43:37Z)
- Training Generative Adversarial Networks via stochastic Nash games [2.995087247817663]
Generative adversarial networks (GANs) are a class of generative models with two antagonistic neural networks: a generator and a discriminator.
We show convergence to an exact solution as the amount of available data increases.
We also show convergence of an averaged variant of the SRFB algorithm to a neighborhood of the solution when only a few samples are available.
arXiv Detail & Related papers (2020-10-17T09:07:40Z)
- A game-theoretic approach for Generative Adversarial Networks [2.995087247817663]
Generative adversarial networks (GANs) are a class of generative models, known for producing accurate samples.
The main bottleneck in their implementation is that the neural networks are very hard to train.
We propose a relaxed forward-backward algorithm for GANs.
We prove that, when the pseudogradient mapping of the game is monotone, we obtain convergence to an exact solution or to a neighbourhood of it.
arXiv Detail & Related papers (2020-03-30T17:14:41Z)
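The two GAN entries above both revolve around relaxed forward-backward updates on the game's pseudogradient. As a hedged toy of that flavor, the sketch below runs a golden-ratio-style relaxation on the monotone bilinear game $\min_x \max_y xy$; the step sizes, relaxation rule, and toy game are illustrative assumptions, not the authors' exact SRFB scheme.

```python
# Toy relaxed forward-backward iteration on the bilinear game f(x, y) = x*y,
# whose pseudogradient F(x, y) = (y, -x) is monotone; plain simultaneous
# gradient steps spiral outward here, while the relaxed update converges.
import numpy as np

def relaxed_forward_backward(z0, gamma=0.5, phi=1.618, n_iters=500):
    z = np.asarray(z0, dtype=float)   # current iterate (x, y)
    z_bar = z.copy()                  # relaxed (averaged) iterate
    for _ in range(n_iters):
        F = np.array([z[1], -z[0]])             # pseudogradient evaluation
        z_bar = ((phi - 1) * z + z_bar) / phi   # relaxation / averaging step
        z = z_bar - gamma * F                   # forward step from the average
    return z

print(relaxed_forward_backward([1.0, 1.0]))  # approaches the equilibrium (0, 0)
```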
This list is automatically generated from the titles and abstracts of the papers on this site.