Linear Connectivity Reveals Generalization Strategies
- URL: http://arxiv.org/abs/2205.12411v1
- Date: Tue, 24 May 2022 23:43:02 GMT
- Title: Linear Connectivity Reveals Generalization Strategies
- Authors: Jeevesh Juneja and Rachit Bansal and Kyunghyun Cho and Jo\~ao Sedoc
and Naomi Saphra
- Abstract summary: Some pairs of finetuned models have large barriers of increasing loss on the linear paths between them.
We find distinct clusters of models which are linearly connected on the test loss surface, but are disconnected from models outside the cluster.
Our work demonstrates how the geometry of the loss surface can guide models towards different functions.
- Score: 54.947772002394736
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: It is widely accepted in the mode connectivity literature that when two
neural networks are trained similarly on the same data, they are connected by a
path through parameter space over which test set accuracy is maintained. Under
some circumstances, including transfer learning from pretrained models, these
paths are presumed to be linear. In contrast to existing results, we find that
among text classifiers (trained on MNLI, QQP, and CoLA), some pairs of
finetuned models have large barriers of increasing loss on the linear paths
between them. On each task, we find distinct clusters of models which are
linearly connected on the test loss surface, but are disconnected from models
outside the cluster -- models that occupy separate basins on the surface. By
measuring performance on specially-crafted diagnostic datasets, we find that
these clusters correspond to different generalization strategies: one cluster
behaves like a bag of words model under domain shift, while another cluster
uses syntactic heuristics. Our work demonstrates how the geometry of the loss
surface can guide models towards different heuristic functions.
Related papers
- Bilinear Convolution Decomposition for Causal RL Interpretability [0.0]
Efforts to interpret reinforcement learning (RL) models often rely on high-level techniques such as attribution or probing.
This work proposes replacing nonlinearities in convolutional neural networks (ConvNets) with bilinear variants, to produce a class of models for which these limitations can be addressed.
We show bilinear model variants perform comparably in model-free reinforcement learning settings, and give a side by side comparison on ProcGen environments.
arXiv Detail & Related papers (2024-12-01T19:32:04Z) - Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network)
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Symmetry Discovery for Different Data Types [52.2614860099811]
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance.
We propose LieSD, a method for discovering symmetries via trained neural networks which approximate the input-output mappings of the tasks.
We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
arXiv Detail & Related papers (2024-10-13T13:39:39Z) - Layer-wise Linear Mode Connectivity [52.6945036534469]
Averaging neural network parameters is an intuitive method for the knowledge of two independent models.
It is most prominently used in federated learning.
We analyse the performance of the models that result from averaging single, or groups.
arXiv Detail & Related papers (2023-07-13T09:39:10Z) - Phantom Embeddings: Using Embedding Space for Model Regularization in
Deep Neural Networks [12.293294756969477]
The strength of machine learning models stems from their ability to learn complex function approximations from data.
The complex models tend to memorize the training data, which results in poor regularization performance on test data.
We present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation.
arXiv Detail & Related papers (2023-04-14T17:15:54Z) - Git Re-Basin: Merging Models modulo Permutation Symmetries [3.5450828190071655]
We show how simple algorithms can be used to fit large networks in practice.
We demonstrate the first (to our knowledge) demonstration of zero mode connectivity between independently trained models.
We also discuss shortcomings in the linear mode connectivity hypothesis.
arXiv Detail & Related papers (2022-09-11T10:44:27Z) - Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z) - T-LoHo: A Bayesian Regularization Model for Structured Sparsity and
Smoothness on Graphs [0.0]
In graph-structured data, structured sparsity and smoothness tend to cluster together.
We propose a new prior for high dimensional parameters with graphical relations.
We use it to detect structured sparsity and smoothness simultaneously.
arXiv Detail & Related papers (2021-07-06T10:10:03Z) - GELATO: Geometrically Enriched Latent Model for Offline Reinforcement
Learning [54.291331971813364]
offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out of distribution samples as well as the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z) - A Bootstrap-based Method for Testing Network Similarity [0.0]
This paper studies the matched network inference problem.
The goal is to determine if two networks, defined on a common set of nodes, exhibit a specific form of similarity.
Two notions of similarity are considered: (i) equality, i.e., testing whether the networks arise from the same random graph model, and (ii) scaling, i.e., testing whether their probability are proportional for some unknown scaling constant.
arXiv Detail & Related papers (2019-11-15T20:50:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.