Naturally Computed Scale Invariance in the Residual Stream of ResNet18
- URL: http://arxiv.org/abs/2504.16290v2
- Date: Tue, 29 Apr 2025 18:34:02 GMT
- Title: Naturally Computed Scale Invariance in the Residual Stream of ResNet18
- Authors: André Longon
- Abstract summary: This work investigates ResNet18 with a particular focus on its residual stream, an architectural component which InceptionV1 lacks. We observe that many convolutional channels in intermediate blocks exhibit scale-invariant properties, computed by the element-wise residual summation of scale-equivariant representations. Through subsequent ablation experiments, we attempt to causally link these neural properties with scale-robust object recognition behavior.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An important capacity in visual object recognition is invariance to image-altering variables which leave the identity of objects unchanged, such as lighting, rotation, and scale. How do neural networks achieve this? Prior mechanistic interpretability research has illuminated some invariance-building circuitry in InceptionV1, but the results are limited and networks with different architectures have remained largely unexplored. This work investigates ResNet18 with a particular focus on its residual stream, an architectural component which InceptionV1 lacks. We observe that many convolutional channels in intermediate blocks exhibit scale-invariant properties, computed by the element-wise residual summation of two scale-equivariant representations: the block input, which carries a smaller-scale copy of a feature, and the block's pre-sum output, which carries a larger-scale copy. Through subsequent ablation experiments, we attempt to causally link these neural properties with scale-robust object recognition behavior. Our tentative findings suggest how the residual stream computes scale invariance and its possible role in behavior. Code is available at: https://github.com/cest-andre/residual-stream-interp
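The mechanism described above can be probed with forward hooks: compare the identity path (block input), the pre-sum convolutional branch, and the post-sum block output of one intermediate block while the same image is shown at several scales. The sketch below is a minimal illustration under stated assumptions, not the paper's released protocol (the repository linked above is the authoritative reference); it assumes torchvision's pretrained ResNet18, and the block choice (layer3[1]), channel index, image file, and crop-based scale proxy are all illustrative.

```python
# Minimal probe sketch (illustrative, not the paper's released code):
# record the identity path, the pre-sum branch (bn2 output), and the
# post-sum output of one intermediate BasicBlock at several image scales.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

weights = models.ResNet18_Weights.IMAGENET1K_V1
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()

block = model.layer3[1]   # intermediate block with an unmodified identity path
channel = 17              # arbitrary channel chosen for illustration

acts = {}

def block_hook(module, inputs, output):
    acts["identity"] = inputs[0].detach()   # residual-stream input to the block
    acts["post_sum"] = output.detach()      # block output after summation and ReLU

def branch_hook(module, inputs, output):
    acts["pre_sum"] = output.detach()       # convolutional branch output before summation

block.register_forward_hook(block_hook)
block.bn2.register_forward_hook(branch_hook)

def channel_response(img, scale):
    """Center-crop then let `preprocess` resize, as a crude proxy for object scale."""
    w, h = img.size
    crop = T.CenterCrop(int(min(w, h) / scale))(img)
    x = preprocess(crop).unsqueeze(0)
    with torch.no_grad():
        model(x)
    return {name: t[0, channel].mean().item() for name, t in acts.items()}

img = Image.open("example.jpg").convert("RGB")   # any ImageNet-like photo (placeholder path)
for scale in (1.0, 1.5, 2.0):
    print(scale, channel_response(img, scale))
```

If the abstract's account holds for the chosen channel, the identity and pre-sum responses should peak at different scales while the post-sum response stays comparatively flat across scales; the spatial mean used here is only a coarse summary of each activation map.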
Related papers
- WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration [68.25711405944239]
Deep image registration has demonstrated exceptional accuracy and fast inference.
Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner.
We introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales.
arXiv Detail & Related papers (2024-07-18T11:51:01Z) - Interpreting the Residual Stream of ResNet18 [0.0]
This work investigates ResNet18 with a particular focus on its residual stream, an architectural mechanism which InceptionV1 lacks.
We show that many residual stream channels compute scale invariant representations through a mixture of the input's smaller-scale feature with the block's larger-scale feature.
arXiv Detail & Related papers (2024-07-07T12:13:03Z) - Truly Scale-Equivariant Deep Nets with Fourier Layers [14.072558848402362]
In computer vision, models must be able to adapt to changes in image resolution to effectively carry out tasks such as image segmentation.
Recent works have made progress in developing scale-equivariant convolutional neural networks, through weight-sharing and kernel resizing.
We propose a novel architecture based on Fourier layers to achieve truly scale-equivariant deep nets.
arXiv Detail & Related papers (2023-11-06T07:32:27Z) - Riesz networks: scale invariant neural networks in a single forward pass [0.7673339435080445]
We introduce the Riesz network, a novel scale invariant neural network.
As an application example, we consider detecting and segmenting cracks in tomographic images of concrete.
We then validate its performance in segmenting simulated and real tomographic images featuring a wide range of crack widths.
arXiv Detail & Related papers (2023-05-08T12:39:49Z) - Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing [82.67716657524251]
We present a counterfactual framework that allows us to study the robustness of neural networks with respect to naturalistic variations.
Our method allows for a fair comparison of the robustness of recently released, state-of-the-art Convolutional Neural Networks and Vision Transformers.
arXiv Detail & Related papers (2022-11-29T18:59:23Z) - Rethinking Spatial Invariance of Convolutional Networks for Object Counting [119.83017534355842]
We replace the original convolution filters with locally connected Gaussian kernels to estimate spatial positions in the density map.
Inspired by previous work, we propose a low-rank approximation, combined with translation invariance, to efficiently implement the massive Gaussian convolution.
Our methods significantly outperform other state-of-the-art methods and achieve promising estimates of the spatial positions of objects.
arXiv Detail & Related papers (2022-06-10T17:51:25Z) - Object-aware Monocular Depth Prediction with Instance Convolutions [72.98771405534937]
We propose a novel convolutional operator which is explicitly tailored to avoid feature aggregation.
Our method is based on estimating per-part depth values by means of superpixels.
Our evaluation on the NYUv2 and iBims datasets clearly demonstrates the superiority of Instance Convolutions.
arXiv Detail & Related papers (2021-12-02T18:59:48Z) - Quantised Transforming Auto-Encoders: Achieving Equivariance to Arbitrary Transformations in Deep Networks [23.673155102696338]
Convolutional Neural Networks (CNNs) are equivariant to image translation.
We propose an auto-encoder architecture whose embedding obeys an arbitrary set of equivariance relations simultaneously.
We demonstrate results of successful re-rendering of transformed versions of input images on several datasets.
arXiv Detail & Related papers (2021-11-25T02:26:38Z) - What Does CNN Shift Invariance Look Like? A Visualization Study [87.79405274610681]
Feature extraction with convolutional neural networks (CNNs) is a popular method to represent images for machine learning tasks.
We focus on measuring and visualizing the shift invariance of extracted features from popular off-the-shelf CNN models.
We conclude that features extracted from popular networks are not globally invariant, and that biases and artifacts exist within this variance.
arXiv Detail & Related papers (2020-11-09T01:16:30Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)