On Measuring Localization of Shortcuts in Deep Networks
- URL: http://arxiv.org/abs/2510.26560v2
- Date: Wed, 05 Nov 2025 11:27:32 GMT
- Title: On Measuring Localization of Shortcuts in Deep Networks
- Authors: Nikita Tsoy, Nikola Konstantinov
- Abstract summary: Shortcuts, spurious rules that perform well during training but fail to generalize, present a major challenge to the reliability of deep networks. We study shortcuts on CIFAR-10, Waterbirds, and CelebA datasets across VGG, ResNet, DeiT, and ConvNeXt architectures. We find that shortcut learning is not localized in specific layers but distributed throughout the network.
- Score: 10.928881579403907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Shortcuts, spurious rules that perform well during training but fail to generalize, present a major challenge to the reliability of deep networks (Geirhos et al., 2020). However, the impact of shortcuts on feature representations remains understudied, obstructing the design of principled shortcut-mitigation methods. To overcome this limitation, we investigate the layer-wise localization of shortcuts in deep models. Our novel experiment design quantifies the layer-wise contribution to accuracy degradation caused by a shortcut-inducing skew by counterfactual training on clean and skewed datasets. We employ our design to study shortcuts on CIFAR-10, Waterbirds, and CelebA datasets across VGG, ResNet, DeiT, and ConvNeXt architectures. We find that shortcut learning is not localized in specific layers but distributed throughout the network. Different network parts play different roles in this process: shallow layers predominantly encode spurious features, while deeper layers predominantly forget core features that are predictive on clean data. We also analyze the differences in localization and describe its principal axes of variation. Finally, our analysis of layer-wise shortcut-mitigation strategies suggests the hardness of designing general methods, supporting dataset- and architecture-specific approaches instead.
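The counterfactual design described above can be illustrated schematically: build hybrid models that mix layers trained on clean data with layers trained on skewed data, and attribute the resulting accuracy drop to the swapped layer. The following is a minimal numpy sketch of that layer-swap idea; the toy affine-ReLU "layers", the `clean_ws`/`skew_ws` stand-ins (noisy copies rather than actually trained weights), and the one-layer-at-a-time attribution rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(w):
    """A toy 'layer': an affine map followed by ReLU."""
    return lambda x: np.maximum(x @ w, 0.0)

def accuracy(layers, head, X, y):
    """Run X through the layer stack and a linear head, return accuracy."""
    h = X
    for layer in layers:
        h = layer(h)
    pred = (h @ head > 0).astype(int)
    return float((pred == y).mean())

# Stand-ins for networks trained on clean vs. skewed data: here the
# "skewed" weights are just noisy copies of the "clean" ones.
d, depth = 8, 4
clean_ws = [rng.normal(size=(d, d)) for _ in range(depth)]
skew_ws = [w + rng.normal(scale=0.5, size=(d, d)) for w in clean_ws]
head = rng.normal(size=d)

X = rng.normal(size=(256, d))
y = (X @ np.ones(d) > 0).astype(int)  # toy labels on "clean" data

clean_layers = [make_layer(w) for w in clean_ws]
base = accuracy(clean_layers, head, X, y)

# Counterfactual swap: replace one layer at a time with its skew-trained
# counterpart and attribute the accuracy drop to that layer.
drops = []
for k in range(depth):
    hybrid = list(clean_layers)
    hybrid[k] = make_layer(skew_ws[k])
    drops.append(base - accuracy(hybrid, head, X, y))
    print(f"layer {k}: accuracy drop {drops[-1]:+.3f}")
```

A per-layer profile of these drops is what lets one ask whether degradation concentrates in shallow or deep layers.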
Related papers
- Preventing Shortcut Learning in Medical Image Analysis through Intermediate Layer Knowledge Distillation from Specialist Teachers [0.0]
Deep learning models are prone to learning shortcuts to problems using spuriously correlated yet irrelevant features of their training data.
In high-risk applications such as medical image analysis, this phenomenon may prevent models from using clinically meaningful features when making predictions.
We propose a novel knowledge distillation framework that leverages a teacher network fine-tuned on a small subset of task-relevant data to mitigate shortcut learning.
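The intermediate-layer distillation above amounts to penalizing the student for drifting away from the specialist teacher's internal features. A minimal sketch of such a feature-matching loss follows; the function name, the MSE choice, and the assumption that matched layers produce same-shaped features are illustrative, not the paper's exact formulation.

```python
import numpy as np

def intermediate_distill_loss(student_feats, teacher_feats):
    """Mean-squared error between matched intermediate feature maps.

    Hypothetical helper: assumes both lists hold same-shaped arrays,
    one per matched layer.
    """
    return float(np.mean([np.mean((s - t) ** 2)
                          for s, t in zip(student_feats, teacher_feats)]))

rng = np.random.default_rng(0)
teacher = [rng.normal(size=(4, 16)) for _ in range(3)]
student = [t + rng.normal(scale=0.1, size=t.shape) for t in teacher]
loss = intermediate_distill_loss(student, teacher)
print(loss)  # small, since student features are near the teacher's
```

In training, this term would be added to the ordinary task loss with some weighting coefficient.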
arXiv Detail & Related papers (2025-11-21T17:18:35Z) - Step by Step Network [56.413861208019576]
Scaling up network depth is a fundamental pursuit in neural architecture design.
In this paper, we identify two key barriers that obstruct residual models from scaling deeper: shortcut degradation and limited width.
We propose a generalized residual architecture dubbed Step by Step Network (StepsNet) to bridge the gap between theoretical potential and practical performance.
arXiv Detail & Related papers (2025-11-18T10:35:49Z) - Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning [57.3885832382455]
We show that introducing static network sparsity alone can unlock further scaling potential beyond dense counterparts with state-of-the-art architectures.
Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve higher parameter efficiency for network expressivity.
arXiv Detail & Related papers (2025-06-20T17:54:24Z) - Auto-Compressing Networks [51.221103189527014]
We introduce Auto-Compressing Networks (ACNs), an architectural variant where long feedforward connections from each layer replace traditional short residual connections.
We show that ACNs exhibit enhanced noise robustness compared to residual networks, superior performance in low-data settings, and mitigate catastrophic forgetting.
These findings establish ACNs as a practical approach to developing efficient neural architectures.
arXiv Detail & Related papers (2025-06-11T13:26:09Z) - Component-based Sketching for Deep ReLU Nets [55.404661149594375]
We develop a sketching scheme based on deep net components for various tasks.
We transform deep net training into a linear empirical risk minimization problem.
We show that the proposed component-based sketching provides almost optimal rates in approximating saturated functions.
arXiv Detail & Related papers (2024-09-21T15:30:43Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution, aiming to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to their magnitudes.
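The key contrast with hard pruning is that low-magnitude weights are shrunk rather than zeroed, so they can recover in later iterations. Here is a rough numpy sketch of one such soft-shrinkage step; the `prune_frac` and `shrink` parameters and the quantile-based threshold are illustrative assumptions, not the ISS-P paper's exact schedule.

```python
import numpy as np

def soft_shrink_step(w, prune_frac=0.2, shrink=0.9):
    """One soft-shrinkage step: multiplicatively damp the smallest-
    magnitude weights instead of hard-zeroing them, so 'pruned'
    weights can still recover in later iterations."""
    threshold = np.quantile(np.abs(w), prune_frac)
    out = w.copy()
    out[np.abs(w) <= threshold] *= shrink
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=(32, 32))
w_shrunk = soft_shrink_step(w)
# Only the smallest-magnitude weights change; the rest are untouched.
print(np.sum(w_shrunk != w), w.size)
```

Iterating this step gradually drives a chosen fraction of weights toward zero while keeping the pruning decision revisable.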
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - SAR Despeckling Using Overcomplete Convolutional Networks [53.99620005035804]
Despeckling is an important problem in remote sensing, as speckle degrades SAR images.
Recent studies show that convolutional neural networks (CNNs) outperform classical despeckling methods.
This study employs an overcomplete CNN architecture to focus on learning low-level features by restricting the receptive field.
We show that the proposed network improves despeckling performance compared to recent despeckling methods on synthetic and real SAR images.
arXiv Detail & Related papers (2022-05-31T15:55:37Z) - SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and
Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, named SIRe, to reduce the vanishing gradient problem in the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing the input image structure preservation through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
arXiv Detail & Related papers (2021-10-06T13:54:49Z) - PDFNet: Pointwise Dense Flow Network for Urban-Scene Segmentation [0.0]
We propose a novel lightweight architecture named point-wise dense flow network (PDFNet).
In PDFNet, we employ dense, residual, and multiple shortcut connections to allow a smooth gradient flow to all parts of the network.
Our method significantly outperforms baselines in capturing small classes and in few-data regimes.
arXiv Detail & Related papers (2021-09-21T10:39:46Z) - Impact of Aliasing on Generalization in Deep Convolutional Networks [29.41652467340308]
We investigate the impact of aliasing on generalization in Deep Convolutional Networks.
We show how to mitigate aliasing by inserting non-trainable low-pass filters at key locations.
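The mitigation described above, inserting non-trainable low-pass filters before downsampling, is the classic anti-aliasing recipe: blur, then subsample. A minimal 1-D numpy sketch follows; the fixed binomial kernel [1, 2, 1]/4 and the specific test signal are illustrative choices, not the paper's exact filters.

```python
import numpy as np

def blurpool1d(x, stride=2):
    """Anti-aliased downsampling: apply a fixed (non-trainable) binomial
    low-pass kernel [1, 2, 1] / 4, then subsample by `stride`."""
    kernel = np.array([1.0, 2.0, 1.0]) / 4.0
    smoothed = np.convolve(x, kernel, mode="same")
    return smoothed[::stride]

def naive_downsample(x, stride=2):
    """Plain strided subsampling -- aliases high frequencies."""
    return x[::stride]

# A signal at the Nyquist rate of the downsampled grid: naive striding
# keeps its full amplitude as a spurious alias; blurpool attenuates it.
x = np.cos(np.pi * np.arange(64))  # alternating +1 / -1
aliased = naive_downsample(x)
smoothed = blurpool1d(x)
print(np.abs(aliased).max(), np.abs(smoothed).max())
```

The same idea extends to 2-D feature maps by replacing strided pooling or strided convolutions with a blur-then-stride stage.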
arXiv Detail & Related papers (2021-08-07T17:12:03Z) - A Too-Good-to-be-True Prior to Reduce Shortcut Reliance [0.19573380763700707]
Deep convolutional neural networks (DCNNs) often fail to generalize to out-of-distribution (o.o.d.) samples.
One cause for this shortcoming is that modern architectures tend to rely on "shortcuts".
We implement this inductive bias in a two-stage approach that uses predictions from a low-capacity network to inform the training of a high-capacity network.
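One way to realize this two-stage idea is to down-weight training examples that the low-capacity network already classifies confidently, on the grounds that anything a weak model solves is "too good to be true" and likely shortcut-driven. The sketch below is a hypothetical weighting rule in that spirit, not the paper's exact scheme; the function name, `floor` parameter, and confidence-based formula are all assumptions.

```python
import numpy as np

def tgtbt_weights(low_cap_probs, labels, floor=0.1):
    """Per-example weights for training the high-capacity network:
    down-weight examples the low-capacity network already gets right
    with high confidence (likely shortcut-driven), with a floor so no
    example is ignored entirely. Hypothetical rule for illustration."""
    conf_correct = low_cap_probs[np.arange(len(labels)), labels]
    return np.maximum(1.0 - conf_correct, floor)

probs = np.array([[0.95, 0.05],   # weak net very confident -> low weight
                  [0.55, 0.45],   # near chance -> high weight
                  [0.10, 0.90]])  # confident again -> low weight
labels = np.array([0, 0, 1])
weights = tgtbt_weights(probs, labels)
print(weights)
```

These weights would then scale each example's loss when training the high-capacity network in the second stage.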
arXiv Detail & Related papers (2021-02-12T09:17:24Z) - Why Layer-Wise Learning is Hard to Scale-up and a Possible Solution via
Accelerated Downsampling [19.025707054206457]
Layer-wise learning can achieve state-of-the-art performance in image classification on various datasets.
Previous studies of layer-wise learning are limited to networks with simple hierarchical structures.
This paper reveals that the fundamental obstacle to scaling up layer-wise learning is the relatively poor separability of the feature space in shallow layers.
arXiv Detail & Related papers (2020-10-15T21:51:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.