Related papers: A separability-based approach to quantifying generalization: which layer is best?

A separability-based approach to quantifying generalization: which layer is best?

URL: http://arxiv.org/abs/2405.01524v2
Date: Fri, 3 May 2024 16:03:57 GMT
Title: A separability-based approach to quantifying generalization: which layer is best?
Authors: Luciano Dyballa, Evan Gerritz, Steven W. Zucker,
Abstract summary: Generalization to unseen data remains poorly understood for deep learning classification and foundation models. We provide a new method for evaluating the capacity of networks to represent a sampled domain. We find that (i) high classification accuracy does not imply high generalizability; and (ii) deeper layers in a model do not always generalize the best.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generalization to unseen data remains poorly understood for deep learning classification and foundation models. How can one assess the ability of networks to adapt to new or extended versions of their input space in the spirit of few-shot learning, out-of-distribution generalization, and domain adaptation? Which layers of a network are likely to generalize best? We provide a new method for evaluating the capacity of networks to represent a sampled domain, regardless of whether the network has been trained on all classes in the domain. Our approach is the following: after fine-tuning state-of-the-art pre-trained models for visual classification on a particular domain, we assess their performance on data from related but distinct variations in that domain. Generalization power is quantified as a function of the latent embeddings of unseen data from intermediate layers for both unsupervised and supervised settings. Working throughout all stages of the network, we find that (i) high classification accuracy does not imply high generalizability; and (ii) deeper layers in a model do not always generalize the best, which has implications for pruning. Since the trends observed across datasets are largely consistent, we conclude that our approach reveals (a function of) the intrinsic capacity of the different layers of a model to generalize.

Related papers

Generalization performance of narrow one-hidden layer networks in the teacher-student setting [40.69556943879117]
We develop a general theory for narrow networks, i.e. networks with a large number of hidden units, yet much smaller than the input dimension.<n>Our theory accurately predicts the generalization error of neural networks trained on regression or classification tasks.
arXiv Detail & Related papers (2025-07-01T10:18:20Z)
Intermediate Layer Classifiers for OOD generalization [17.13749013546228]
In this work, we question the use of last-layer representations for out-of-distribution (OOD) generalisation. We discover that intermediate layer representations frequently offer substantially better generalisation than those from the penultimate layer. Our analysis suggests that intermediate layers are less sensitive to distribution shifts compared to the penultimate layer.
arXiv Detail & Related papers (2025-04-07T19:50:50Z)
Self-Supervised Learning for Covariance Estimation [3.04585143845864]
We propose to globally learn a neural network that will then be applied locally at inference time. The architecture is based on the popular attention mechanism. It can be pre-trained as a foundation model and then be repurposed for various downstream tasks, e.g., adaptive target detection in radar or hyperspectral imagery.
arXiv Detail & Related papers (2024-03-13T16:16:20Z)
Zero-shot generalization across architectures for visual classification [0.0]
Generalization to unseen data is a key desideratum for deep networks, but its relation to classification accuracy is unclear. We show that popular networks, from deep convolutional networks (CNNs) to transformers, vary in their power to extrapolate to unseen classes both across layers and across architectures.
arXiv Detail & Related papers (2024-02-21T19:45:05Z)
Activate and Reject: Towards Safe Domain Generalization under Category Shift [71.95548187205736]
We study a practical problem of Domain Generalization under Category Shift (DGCS) It aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains. Compared to prior DG works, we face two new challenges: 1) how to learn the concept of unknown'' during training with only source known-class samples, and 2) how to adapt the source-trained model to unseen environments.
arXiv Detail & Related papers (2023-10-07T07:53:12Z)
Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly as an ETF and fixed during training. Our experimental results show that our method is able to achieve similar performances on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
Towards Disentangling Information Paths with Coded ResNeXt [11.884259630414515]
We take a novel approach to enhance the transparency of the function of the whole network. We propose a neural network architecture for classification, in which the information that is relevant to each class flows through specific paths.
arXiv Detail & Related papers (2022-02-10T21:45:49Z)
Hierarchical Variational Memory for Few-shot Learning Across Domains [120.87679627651153]
We introduce a hierarchical prototype model, where each level of the prototype fetches corresponding information from the hierarchical memory. The model is endowed with the ability to flexibly rely on features at different semantic levels if the domain shift circumstances so demand. We conduct thorough ablation studies to demonstrate the effectiveness of each component in our model.
arXiv Detail & Related papers (2021-12-15T15:01:29Z)
Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distribution. First, we present a Class Activation Map (CAMC) module to improve the learning and prediction of network classifiers. Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
arXiv Detail & Related papers (2021-08-29T05:45:03Z)
More Is More -- Narrowing the Generalization Gap by Adding Classification Heads [8.883733362171032]
We introduce an architecture enhancement for existing neural network models based on input transformations, termed 'TransNet' Our model can be employed during training time only and then pruned for prediction, resulting in an equivalent architecture to the base model.
arXiv Detail & Related papers (2021-02-09T16:30:33Z)
Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup. We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective [98.70226503904402]
Object frequency in the real world often follows a power law, leading to a mismatch between datasets with long-tailed class distributions. We propose to augment the classic class-balanced learning by explicitly estimating the differences between the class-conditioned distributions with a meta-learning approach.
arXiv Detail & Related papers (2020-03-24T11:28:42Z)
AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks [12.14537824884951]
We propose a novel regularization method that progressively penalizes the magnitude of activations during training. Our method's effect on generalization is analyzed with label randomization tests and cumulative ablations.
arXiv Detail & Related papers (2020-03-07T18:38:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.