Mode Combinability: Exploring Convex Combinations of Permutation Aligned Models
- URL: http://arxiv.org/abs/2308.11511v1
- Date: Tue, 22 Aug 2023 15:39:29 GMT
- Title: Mode Combinability: Exploring Convex Combinations of Permutation Aligned Models
- Authors: Adrián Csiszárik, Melinda F. Kiss, Péter Kőrösi-Szabó, Márton Muntag, Gergely Papp, Dániel Varga
- Abstract summary: We investigate convex combinations of two permutation-aligned neural network parameter vectors $\Theta_A$ and $\Theta_B$ of size $d$.
We show that broad regions of the hypercube form surfaces of low loss values, indicating that the notion of linear mode connectivity extends to a more general phenomenon.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore element-wise convex combinations of two permutation-aligned neural
network parameter vectors $\Theta_A$ and $\Theta_B$ of size $d$. We conduct
extensive experiments by examining various distributions of such model
combinations parametrized by elements of the hypercube $[0,1]^{d}$ and its
vicinity. Our findings reveal that broad regions of the hypercube form surfaces
of low loss values, indicating that the notion of linear mode connectivity
extends to a more general phenomenon which we call mode combinability. We also
make several novel observations regarding linear mode connectivity and model
re-basin. We demonstrate a transitivity property: two models re-based to a
common third model are also linear mode connected, and a robustness property:
even with significant perturbations of the neuron matchings, the resulting
combinations continue to form a working model. Moreover, we analyze the
functional and weight similarity of model combinations and show that such
combinations are non-vacuous in the sense that there are significant functional
differences between the resulting models.
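For concreteness, a minimal NumPy sketch of the element-wise combination the abstract describes (hypothetical code, not from the paper; flattened, permutation-aligned parameter vectors are assumed):

```python
# Hypothetical sketch of the paper's central object: a model combination
# Theta(lam) = lam * Theta_A + (1 - lam) * Theta_B, element-wise, where
# lam lives in the hypercube [0, 1]^d. A uniform lam = 0.5 recovers the
# midpoint probed by linear mode connectivity.
import numpy as np

def combine(theta_a: np.ndarray, theta_b: np.ndarray, lam: np.ndarray) -> np.ndarray:
    """Element-wise convex combination of two aligned parameter vectors."""
    assert theta_a.shape == theta_b.shape == lam.shape
    return lam * theta_a + (1.0 - lam) * theta_b

rng = np.random.default_rng(0)
d = 10_000                                   # illustrative parameter count
theta_a = rng.normal(size=d)                 # stand-ins for trained, aligned weights
theta_b = rng.normal(size=d)

midpoint = combine(theta_a, theta_b, np.full(d, 0.5))   # classic LMC midpoint
lam = rng.uniform(size=d)                                # random hypercube point
random_combo = combine(theta_a, theta_b, lam)
```

Evaluating the loss of combinations like `random_combo` over many draws of `lam` is the kind of experiment the abstract refers to when it speaks of broad low-loss regions of the hypercube.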
Related papers
- ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models [65.82630283336051]
We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by the existing training schemes of diffusion generative models.
We present a simple fix to this problem by constructing processes that fully exploit the structures, hence the name ComboStoc.
arXiv Detail & Related papers (2024-05-22T15:23:10Z)
- Shape Arithmetic Expressions: Advancing Scientific Discovery Beyond Closed-Form Equations [56.78271181959529]
Generalized Additive Models (GAMs) can capture non-linear relationships between variables and targets, but they cannot capture intricate feature interactions.
We propose Shape Expressions Arithmetic (SHAREs) that fuses GAMs' flexible shape functions with the complex feature interactions found in mathematical expressions.
We also design a set of rules for constructing SHAREs that guarantee transparency of the found expressions beyond the standard constraints.
arXiv Detail & Related papers (2024-04-15T13:44:01Z)
- Git Re-Basin: Merging Models modulo Permutation Symmetries [3.5450828190071655]
We show how simple algorithms can be used to fit large networks in practice (a minimal sketch of the permutation-matching idea appears after this list).
We present the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained models.
We also discuss shortcomings in the linear mode connectivity hypothesis.
arXiv Detail & Related papers (2022-09-11T10:44:27Z)
- Linear Connectivity Reveals Generalization Strategies [54.947772002394736]
Some pairs of finetuned models have large barriers of increasing loss on the linear paths between them.
We find distinct clusters of models which are linearly connected on the test loss surface, but are disconnected from models outside the cluster.
Our work demonstrates how the geometry of the loss surface can guide models towards different functions.
arXiv Detail & Related papers (2022-05-24T23:43:02Z)
- Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High Dimensions [25.635225717360466]
We show that for a large class of kernels, including the neural kernel of fully connected networks, kernel methods can only perform as well as linear models in a certain high-dimensional regime.
More complex models for the data other than independent features are needed for high-dimensional analysis.
arXiv Detail & Related papers (2022-01-20T09:35:46Z)
- Bilinear Classes: A Structural Framework for Provable Generalization in RL [119.42509700822484]
Bilinear Classes is a new structural framework which permits generalization in reinforcement learning.
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable.
Our main result provides an RL algorithm with polynomial sample complexity for Bilinear Classes.
arXiv Detail & Related papers (2021-03-19T16:34:20Z)
- Reconstruction of Pairwise Interactions using Energy-Based Models [3.553493344868414]
We show that hybrid models, which combine a pairwise model and a neural network, can lead to significant improvements in the reconstruction of pairwise interactions.
This is in line with the general idea that simple interpretable models and complex black-box models are not necessarily a dichotomy.
arXiv Detail & Related papers (2020-12-11T20:15:10Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose a finite mixture regression (FMR) model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
- Flexible Bayesian Nonlinear Model Configuration [10.865434331546126]
Linear, or simple parametric, models are often not sufficient to describe complex relationships between input variables and a response.
We introduce a flexible approach for the construction and selection of highly flexible nonlinear parametric regression models.
A genetically modified mode jumping Markov chain Monte Carlo algorithm is adopted to perform Bayesian inference.
arXiv Detail & Related papers (2020-03-05T21:20:55Z)
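As referenced in the Git Re-Basin entry above, here is a hedged sketch of the weight-matching idea behind re-basin (illustrative only, not the authors' implementation; `match_units` and `permute_layer` are hypothetical names, a single dense layer is assumed, and biases are omitted). SciPy's `linear_sum_assignment` solves the assignment problem on a unit-similarity matrix:

```python
# Sketch: permute the hidden units of model B so they line up with model A's
# units, by solving a linear assignment problem on unit similarities.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_units(w_in_a: np.ndarray, w_in_b: np.ndarray) -> np.ndarray:
    """Permutation of B's hidden units that best matches A's.

    w_in_a, w_in_b: (hidden, in) incoming weight matrices of the same layer.
    """
    similarity = w_in_a @ w_in_b.T                  # (hidden, hidden) similarities
    _, perm = linear_sum_assignment(-similarity)    # negate cost to maximize
    return perm                                     # perm[i]: B-unit matched to A-unit i

def permute_layer(w_in_b: np.ndarray, w_out_b: np.ndarray, perm: np.ndarray):
    """Re-index B's layer without changing the function it computes:
    permute the rows of the incoming weights and the columns of the outgoing ones."""
    return w_in_b[perm], w_out_b[:, perm]
```

Once every layer of B has been re-indexed this way, element-wise combinations like the ones in the first sketch can be formed between A and the re-based B.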
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information above and is not responsible for any consequences of its use.