Linear Mode Connectivity under Data Shifts for Deep Ensembles of Image Classifiers
- URL: http://arxiv.org/abs/2511.04514v1
- Date: Thu, 06 Nov 2025 16:30:56 GMT
- Title: Linear Mode Connectivity under Data Shifts for Deep Ensembles of Image Classifiers
- Authors: C. Hepburn, T. Zielke, A. P. Raulf
- Abstract summary: Linear mode connectivity (LMC) links several aspects of deep learning. We experimentally study LMC under data shifts and identify conditions that mitigate their impact. Although models sampled via LMC tend to make similar errors more frequently than those converging to different basins, the benefit of LMC lies in balancing training efficiency against the gains achieved from larger, more diverse ensembles.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The phenomenon of linear mode connectivity (LMC) links several aspects of deep learning, including training stability under noisy stochastic gradients, the smoothness and generalization of local minima (basins), the similarity and functional diversity of sampled models, and architectural effects on data processing. In this work, we experimentally study LMC under data shifts and identify conditions that mitigate their impact. We interpret data shifts as an additional source of stochastic gradient noise, which can be reduced through small learning rates and large batch sizes. These parameters influence whether models converge to the same local minimum or to regions of the loss landscape with varying smoothness and generalization. Although models sampled via LMC tend to make similar errors more frequently than those converging to different basins, the benefit of LMC lies in balancing training efficiency against the gains achieved from larger, more diverse ensembles. Code and supplementary materials will be made publicly available at https://github.com/DLR-KI/LMC in due course.
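As a minimal sketch of the evaluation underlying LMC (not the authors' released code, which is forthcoming at the GitHub link above): two independently trained networks are connected by the segment θ_α = (1 − α)·θ_A + α·θ_B, and LMC holds when the loss along that segment stays close to the endpoint losses, i.e. no barrier appears between the two minima. The PyTorch snippet below illustrates this; the model, data loader, and criterion names are placeholders.
```python
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Linearly interpolate two state dicts: (1 - alpha) * A + alpha * B.

    Non-float buffers (e.g. BatchNorm's num_batches_tracked) are copied from A.
    """
    return {
        k: (1 - alpha) * v + alpha * sd_b[k] if torch.is_floating_point(v) else v
        for k, v in sd_a.items()
    }

@torch.no_grad()
def loss_along_path(model, sd_a, sd_b, loader, criterion, n_points=11):
    """Evaluate the loss at evenly spaced points on the segment between two solutions.

    A pronounced loss peak (barrier) between the endpoints indicates the models
    sit in different basins; a flat profile indicates linear mode connectivity.
    """
    losses = []
    for alpha in torch.linspace(0.0, 1.0, n_points):
        model.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha.item()))
        model.eval()
        total, n = 0.0, 0
        for x, y in loader:
            out = model(x)
            total += criterion(out, y).item() * x.size(0)
            n += x.size(0)
        losses.append(total / n)
    return losses
```
In practice, sd_a and sd_b would be the state dicts of two runs started from the same initialization (e.g. differing only in data order or data shift), and the returned loss profile is compared against the endpoint losses to decide whether the runs are linearly mode connected.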
Related papers
- Inverse Depth Scaling From Most Layers Being Similar [20.276718813247786]
We quantify how depth affects loss via analysis of large language models (LLMs). We find that loss scales inversely with depth in LLMs, probably because functionally similar layers reduce error through ensemble averaging.
arXiv Detail & Related papers (2026-02-05T18:22:41Z) - Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z) - Align Your Tangent: Training Better Consistency Models via Manifold-Aligned Tangents [55.43139356528315]
Consistency Models (CMs) are trained to be consistent on flow ordinary differential equation trajectories. CMs typically require prolonged training with large batch sizes to obtain competitive sample quality. We propose a new loss function, called the manifold feature distance (MFD), which provides manifold-aligned tangents that point toward the data manifold.
arXiv Detail & Related papers (2025-10-01T08:35:18Z) - Integrating Random Effects in Variational Autoencoders for Dimensionality Reduction of Correlated Data [9.990687944474738]
LMMVAE is a novel model that separates the classic VAE latent model into fixed and random parts. It is shown to significantly improve squared reconstruction error and negative likelihood loss on unseen data.
arXiv Detail & Related papers (2024-12-22T07:20:17Z) - An Enhanced Classification Method Based on Adaptive Multi-Scale Fusion for Long-tailed Multispectral Point Clouds [67.96583737413296]
We propose an enhanced classification method based on adaptive multi-scale fusion for MPCs with long-tailed distributions. In the training set generation stage, a grid-balanced sampling strategy is designed to reliably generate training samples from sparse labeled datasets. In the feature learning stage, a multi-scale feature fusion module is proposed to fuse shallow features of land-covers at different scales.
arXiv Detail & Related papers (2024-12-16T03:21:20Z) - Dissecting Representation Misalignment in Contrastive Learning via Influence Function [15.28417468377201]
We introduce the Extended Influence Function for Contrastive Loss (ECIF), an influence function crafted for contrastive loss. ECIF considers both positive and negative samples and provides a closed-form approximation of contrastive learning models. Building upon ECIF, we develop a series of algorithms for data evaluation, misalignment detection, and misprediction trace-back tasks.
arXiv Detail & Related papers (2024-11-18T15:45:41Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [56.00251589760559]
Large language models (LLMs) can act as gradient priors in a zero-shot setting. We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding. Experiments indicate that LM-GC surpasses existing state-of-the-art lossless compression methods.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - Entropy Law: The Story Behind Data Compression and LLM Performance [115.70395740286422]
We find that model performance is negatively correlated with the compression ratio of the training data, which usually yields a lower training loss.
Based on the findings of the entropy law, we propose a quite efficient and universal data selection method.
We also present an interesting application of entropy law that can detect potential performance risks at the beginning of model training.
arXiv Detail & Related papers (2024-07-09T08:14:29Z) - L-DAWA: Layer-wise Divergence Aware Weight Aggregation in Federated Self-Supervised Visual Representation Learning [14.888569402903562]
Integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees.
We propose a new aggregation strategy termed Layer-wise Divergence Aware Weight Aggregation (L-DAWA) to mitigate the influence of client bias and divergence during FL aggregation.
arXiv Detail & Related papers (2023-07-14T15:07:30Z) - An Adaptive Plug-and-Play Network for Few-Shot Learning [12.023266104119289]
Few-shot learning requires a model to classify new samples after learning from only a few samples.
Deep networks and complex metrics tend to induce overfitting, making it difficult to further improve the performance.
We propose a plug-and-play model-adaptive resizer (MAR) and an adaptive similarity metric (ASM) without any additional losses.
arXiv Detail & Related papers (2023-02-18T13:25:04Z) - Towards Scale Balanced 6-DoF Grasp Detection in Cluttered Scenes [19.25678039613183]
We propose a novel approach that specifically addresses the difficulty of dealing with small-scale samples.
A Multi-scale Cylinder Grouping (MsCG) module is presented to enhance local geometry representation.
Noisy-clean Mix (NcM) data augmentation is introduced to facilitate training.
arXiv Detail & Related papers (2022-12-10T11:31:12Z) - Adaptive Hierarchical Similarity Metric Learning with Noisy Labels [138.41576366096137]
We propose an Adaptive Hierarchical Similarity Metric Learning method.
It considers two types of noise-insensitive information, i.e., class-wise divergence and sample-wise consistency.
Our method achieves state-of-the-art performance compared with current deep metric learning approaches.
arXiv Detail & Related papers (2021-10-29T02:12:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.