Intermediate Layer Classifiers for OOD generalization
- URL: http://arxiv.org/abs/2504.05461v1
- Date: Mon, 07 Apr 2025 19:50:50 GMT
- Title: Intermediate Layer Classifiers for OOD generalization
- Authors: Arnas Uselis, Seong Joon Oh
- Abstract summary: In this work, we question the use of last-layer representations for out-of-distribution (OOD) generalisation. We discover that intermediate layer representations frequently offer substantially better generalisation than those from the penultimate layer. Our analysis suggests that intermediate layers are less sensitive to distribution shifts compared to the penultimate layer.
- Score: 17.13749013546228
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep classifiers are known to be sensitive to data distribution shifts, primarily due to their reliance on spurious correlations in training data. It has been suggested that these classifiers can still find useful features in the network's last layer that hold up under such shifts. In this work, we question the use of last-layer representations for out-of-distribution (OOD) generalisation and explore the utility of intermediate layers. To this end, we introduce Intermediate Layer Classifiers (ILCs). We discover that intermediate layer representations frequently offer substantially better generalisation than those from the penultimate layer. In many cases, zero-shot OOD generalisation using earlier-layer representations approaches the few-shot performance of retraining on penultimate layer representations. This is confirmed across multiple datasets, architectures, and types of distribution shifts. Our analysis suggests that intermediate layers are less sensitive to distribution shifts compared to the penultimate layer. These findings highlight the importance of understanding how information is distributed across network layers and its role in OOD generalisation, while also pointing to the limits of penultimate layer representation utility. Code is available at https://github.com/oshapio/intermediate-layer-generalization
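As a rough, hedged illustration of the idea (not the authors' implementation; see the linked repository for that), the sketch below freezes a pretrained backbone, pools the activations of one intermediate layer, and trains a linear probe on them. The backbone (`resnet50`), read-out point (`layer2`), and probe setup are illustrative assumptions.

```python
# Minimal ILC-style sketch: freeze a pretrained backbone, pool one
# intermediate layer's activations, and train a linear probe on them.
# resnet50, "layer2", and the probe are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights
from torchvision.models.feature_extraction import create_feature_extractor

backbone = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
extractor = create_feature_extractor(backbone, return_nodes={"layer2": "feat"})
for p in extractor.parameters():
    p.requires_grad_(False)

def intermediate_features(x: torch.Tensor) -> torch.Tensor:
    """Globally average-pooled activations of the chosen intermediate layer."""
    with torch.no_grad():
        feat = extractor(x)["feat"]      # (B, 512, H, W) after layer2 of ResNet-50
    return feat.mean(dim=(2, 3))         # global average pool -> (B, 512)

num_classes = 10                         # placeholder task size
probe = nn.Linear(512, num_classes)      # the intermediate layer classifier
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

# One training step on a dummy batch standing in for in-distribution data;
# "zero-shot" OOD evaluation would apply the same probe to shifted inputs.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, num_classes, (8,))
loss = nn.functional.cross_entropy(probe(intermediate_features(x)), y)
loss.backward()
optimizer.step()
```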
Related papers
- Leveraging Intermediate Representations for Better Out-of-Distribution Detection [3.903824667492754]
In real-world applications, machine learning models must reliably detect Out-of-Distribution (OoD) samples to prevent unsafe decisions. We analyze the discriminative power of intermediate layers and show that they can be used effectively for OoD detection. We demonstrate that intermediate layer activations improve OoD detection performance by running a comprehensive evaluation across multiple datasets.
arXiv Detail & Related papers (2025-02-18T13:38:19Z) - Layer by Layer: Uncovering Hidden Representations in Language Models [28.304269706993942]
We show that intermediate layers can encode even richer representations, often improving performance on a wide range of downstream tasks. Our framework highlights how each model layer balances information compression and signal preservation. These findings challenge the standard focus on final-layer embeddings and open new directions for model analysis and optimization.
arXiv Detail & Related papers (2025-02-04T05:03:42Z) - A separability-based approach to quantifying generalization: which layer is best? [0.0]
Generalization to unseen data remains poorly understood for deep learning classification and foundation models.
We provide a new method for evaluating the capacity of networks to represent a sampled domain.
We find that (i) high classification accuracy does not imply high generalizability; and (ii) deeper layers in a model do not always generalize the best (a layer-sweep sketch in this spirit follows the list below).
arXiv Detail & Related papers (2024-05-02T17:54:35Z) - Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models [55.45444773200529]
Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination.
Recent work has focused on decoding techniques to improve factuality during inference.
arXiv Detail & Related papers (2024-04-14T19:45:35Z) - Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z) - Hidden Classification Layers: Enhancing linear separability between classes in neural networks layers [0.0]
We investigate the impact of a training approach on deep network performance.
We propose a neural network architecture that induces an error function involving the outputs of all the network layers.
arXiv Detail & Related papers (2023-06-09T10:52:49Z) - WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure and jointly trained with gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
arXiv Detail & Related papers (2023-01-03T20:57:22Z) - MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed MD-CSDNetwork, for combining features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z) - An evidential classifier based on Dempster-Shafer theory and deep learning [6.230751621285322]
We propose a new classification system based on Dempster-Shafer (DS) theory and a convolutional neural network (CNN) architecture for set-valued classification.
Experiments on image recognition, signal processing, and semantic-relationship classification tasks demonstrate that the proposed combination of deep CNN, DS layer, and expected utility layer makes it possible to improve classification accuracy.
arXiv Detail & Related papers (2021-03-25T01:29:05Z) - Cross-layer Feature Pyramid Network for Salient Object Detection [102.20031050972429]
We propose a novel Cross-layer Feature Pyramid Network to improve the progressive fusion in salient object detection.
The features distributed to each layer carry both semantics and salient details from all other layers simultaneously, reducing the loss of important information.
arXiv Detail & Related papers (2020-02-25T14:06:27Z) - Convolutional Networks with Dense Connectivity [59.30634544498946]
We introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion.
For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers.
We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks.
arXiv Detail & Related papers (2020-01-08T06:54:53Z)
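Several of the entries above, most directly the separability study, ask which depth yields the most transferable read-out. The hypothetical sketch below makes that comparison concrete: it sweeps a few intermediate layers of one frozen backbone, fits a linear probe per layer on the same data, and reports held-out accuracy for each. The layer names, probe, and random tensors are assumptions standing in for real in-distribution and shifted splits.

```python
# Hypothetical layer sweep: probe several intermediate layers of a frozen
# backbone and compare held-out accuracy per layer. Layer names, the probe,
# and the random tensors are illustrative placeholders.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights
from torchvision.models.feature_extraction import create_feature_extractor

LAYERS = ["layer1", "layer2", "layer3", "layer4"]  # candidate read-out points
backbone = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
extractor = create_feature_extractor(backbone, return_nodes={l: l for l in LAYERS})

@torch.no_grad()
def pooled(x: torch.Tensor) -> dict[str, torch.Tensor]:
    """Globally average-pooled activations at every candidate layer."""
    return {name: f.mean(dim=(2, 3)) for name, f in extractor(x).items()}

# Random tensors standing in for in-distribution train and shifted test splits.
x_train, y_train = torch.randn(64, 3, 224, 224), torch.randint(0, 5, (64,))
x_test, y_test = torch.randn(32, 3, 224, 224), torch.randint(0, 5, (32,))
feats_train, feats_test = pooled(x_train), pooled(x_test)

for name in LAYERS:
    probe = nn.Linear(feats_train[name].shape[1], 5)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    for _ in range(100):                  # short probe training per layer
        opt.zero_grad()
        loss = nn.functional.cross_entropy(probe(feats_train[name]), y_train)
        loss.backward()
        opt.step()
    with torch.no_grad():
        acc = (probe(feats_test[name]).argmax(1) == y_test).float().mean().item()
    print(f"{name}: held-out accuracy {acc:.2f}")
```

Running this comparison under a genuine distribution shift, rather than on random placeholders, is the kind of per-layer measurement the ILC paper and the separability study formalise.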