Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance
- URL: http://arxiv.org/abs/2506.17040v1
- Date: Fri, 20 Jun 2025 14:49:35 GMT
- Title: Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance
- Authors: Lorenzo Tausani, Paolo Muratore, Morgan B. Talbot, Giacomo Amerio, Gabriel Kreiman, Davide Zoccolan
- Abstract summary: Stretch-and-Squeeze (SnS) is an unbiased, model-agnostic, and gradient-free framework to characterize a unit's invariance landscape. SnS seeks perturbations that maximally alter the representation of a reference stimulus in a given processing stage while preserving unit activation. Applied to convolutional neural networks (CNNs), SnS revealed image variations that were further from a reference image in pixel space than those produced by affine transformations.
- Score: 9.346027495459039
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Uncovering which feature combinations high-level visual units encode is critical to understanding how images are transformed into representations that support recognition. While existing feature visualization approaches typically infer a unit's most exciting images, this is insufficient to reveal the manifold of transformations under which responses remain invariant, which is key to generalization in vision. Here we introduce Stretch-and-Squeeze (SnS), an unbiased, model-agnostic, and gradient-free framework to systematically characterize a unit's invariance landscape and its vulnerability to adversarial perturbations in both biological and artificial visual systems. SnS frames these transformations as bi-objective optimization problems. To probe invariance, SnS seeks image perturbations that maximally alter the representation of a reference stimulus in a given processing stage while preserving unit activation. To probe adversarial sensitivity, SnS seeks perturbations that minimally alter the stimulus while suppressing unit activation. Applied to convolutional neural networks (CNNs), SnS revealed image variations that were further from a reference image in pixel space than those produced by affine transformations, while more strongly preserving the target unit's response. The discovered invariant images differed dramatically depending on the choice of image representation used for optimization: pixel-level changes primarily affected luminance and contrast, while stretching mid- and late-layer CNN representations altered texture and pose, respectively. Notably, the invariant images from robust networks were more recognizable by human subjects than those from standard networks, supporting the higher fidelity of robust CNNs as models of the visual system.
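The bi-objective search is easy to picture in code. Below is a minimal sketch of the invariance ("stretch") probe, assuming a naive random-mutation search in place of whatever gradient-free optimizer the paper actually employs; the AlexNet backbone, the target layer/unit, and the hard activation constraint standing in for the second objective are illustrative choices, not details taken from the paper.

```python
import torch
import torchvision.models as models

# Sketch of an SnS-style "stretch" probe (illustrative assumptions only):
# find a perturbed image far from the reference in a chosen representation
# space while the target unit's activation stays (nearly) preserved.
model = models.alexnet(weights=None).eval()  # pretrained weights in practice

def unit_activation(img, layer=11, unit=42):
    """Mean post-ReLU activation of one target channel."""
    x = img
    for i, m in enumerate(model.features):
        x = m(x)
        if i == layer:
            return x[0, unit].mean()

def representation(img):
    """Space in which the probe 'stretches' away from the reference.
    Pixel space here; the paper also optimizes in mid/late CNN layers."""
    return img.flatten()

@torch.no_grad()
def stretch(reference, steps=200, pop=16, sigma=0.05, keep_frac=0.9):
    ref_act = unit_activation(reference)
    ref_rep = representation(reference)
    best, best_dist = reference.clone(), 0.0
    for _ in range(steps):
        for _ in range(pop):
            cand = (best + sigma * torch.randn_like(best)).clamp(0, 1)
            if unit_activation(cand) < keep_frac * ref_act:
                continue  # violates "preserve unit activation"
            dist = (representation(cand) - ref_rep).norm()
            if dist > best_dist:
                best, best_dist = cand, dist.item()
    return best, best_dist

reference = torch.rand(1, 3, 224, 224)  # stand-in reference stimulus
invariant_img, moved = stretch(reference)
```

In the paper's actual formulation the two goals form a bi-objective (Pareto) problem, so the hard activation threshold above is a deliberate simplification; the adversarial ("squeeze") probe swaps the roles, minimizing the stimulus change while suppressing the activation.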
Related papers
- Unwarping Screen Content Images via Structure-texture Enhancement Network and Transformation Self-estimation [2.404130767806698]
We propose a structure-texture enhancement network (STEN) with transformation self-estimation for screen content images (SCIs). STEN integrates a B-spline implicit neural representation module and a transformation error estimation and self-correction algorithm. Experiments on public SCI datasets demonstrate that our approach significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-04-21T13:59:44Z)
- Exploring Kernel Transformations for Implicit Neural Representations [57.2225355625268]
Implicit neural representations (INRs) leverage neural networks to represent signals by mapping coordinates to their corresponding attributes. This work pioneers the exploration of kernel transformations applied to a model's input/output while keeping the model itself unchanged. A byproduct of our findings is a simple yet effective method that combines scale and shift to significantly boost INRs with negligible overhead (a toy sketch of the idea follows).
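As a rough picture of what "transform the input/output, leave the model alone" can mean, here is a coordinate-MLP INR wrapped with learnable scale-and-shift transformations; this parameterization is our guess and may well differ from the paper's kernel transformations.

```python
import torch
import torch.nn as nn

class ScaleShiftINR(nn.Module):
    """A plain coordinate MLP (the INR, left unchanged) wrapped with
    learnable scale-and-shift transformations on input and output."""
    def __init__(self, hidden=256):
        super().__init__()
        self.inr = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )
        self.in_scale = nn.Parameter(torch.ones(2))
        self.in_shift = nn.Parameter(torch.zeros(2))
        self.out_scale = nn.Parameter(torch.ones(3))
        self.out_shift = nn.Parameter(torch.zeros(3))

    def forward(self, coords):  # coords: (N, 2) in [0, 1]
        h = self.inr(coords * self.in_scale + self.in_shift)
        return h * self.out_scale + self.out_shift

model = ScaleShiftINR()
rgb = model(torch.rand(1024, 2))  # predicted colors, shape (N, 3)
```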
arXiv Detail & Related papers (2025-04-07T04:43:50Z)
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances the global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations (a toy rendering of such a penalty follows).
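The phrase "penalizes invariant collapse" can be pictured with a toy penalty that fires when transformed and untransformed embeddings become indistinguishable; the cosine formulation, the margin, and the pseudo-negative pairing below are our guesses, not the paper's conditional pseudo-negative objective.

```python
import torch
import torch.nn.functional as F

def anti_collapse_loss(z, z_t, margin=0.2):
    """z: (B, D) embeddings of original point clouds; z_t: embeddings of
    transformed versions. Treating z as a pseudo-negative for z_t, the
    term penalizes pairs that are nearly identical, i.e. an encoder that
    has collapsed to full transformation invariance."""
    sim = F.cosine_similarity(z, z_t, dim=1)    # (B,)
    return F.relu(sim - (1.0 - margin)).mean()  # penalize sim > 1 - margin

z = F.normalize(torch.randn(32, 128), dim=1)
z_t = F.normalize(z + 0.01 * torch.randn_like(z), dim=1)  # nearly collapsed
print(anti_collapse_loss(z, z_t))  # high value -> representations too invariant
```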
arXiv Detail & Related papers (2024-09-24T07:57:21Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on a Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z)
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds upon the fact that convolutions, fully-connected layers, and self-attentions have almost equivalent mathematical expressions for processing image patch sequences (see the sketch after this entry).
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection, and instance segmentation.
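The equivalence claim is easy to verify in its simplest instance: a fully-connected layer applied token-wise and a 1x1 convolution compute the same linear map over a patch sequence. The check below covers only this one case; the paper's observation is broader.

```python
import torch
import torch.nn as nn

lin = nn.Linear(64, 128)
conv = nn.Conv2d(64, 128, kernel_size=1)
conv.weight.data = lin.weight.data[:, :, None, None].clone()
conv.bias.data = lin.bias.data.clone()

tokens = torch.randn(2, 196, 64)                        # (B, N, C) patch sequence
as_map = tokens.transpose(1, 2).reshape(2, 64, 14, 14)  # same tokens as a grid
out_lin = lin(tokens)                                   # (2, 196, 128)
out_conv = conv(as_map).flatten(2).transpose(1, 2)      # back to (2, 196, 128)
print(torch.allclose(out_lin, out_conv, atol=1e-5))     # True
```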
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
- Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study the properties of ViTs via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to the flexible and dynamic receptive fields made possible by the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
- Truly shift-equivariant convolutional neural networks with adaptive polyphase upsampling [28.153820129486025]
In image classification, adaptive polyphase downsampling (APS-D) was recently proposed to make CNNs perfectly shift invariant.
We propose adaptive polyphase upsampling (APS-U), a non-linear extension of conventional upsampling, which allows CNNs to exhibit perfect shift equivariance (the polyphase-selection idea behind both methods is sketched below).
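For intuition, here is a toy version of the adaptive polyphase selection that APS-D applies on the downsampling side (APS-U extends the idea to upsampling); the norm choice and tie handling are simplified relative to the papers.

```python
import torch

def aps_downsample(x, p=2):
    """Toy adaptive polyphase downsampling (stride 2): instead of always
    keeping even-indexed samples, keep the polyphase component with the
    largest l_p norm. The winning grid moves with the input shift, which
    is what restores shift invariance."""
    comps = [x[..., i::2, j::2] for i in (0, 1) for j in (0, 1)]
    norms = torch.stack([c.abs().pow(p).sum(dim=(1, 2, 3)) for c in comps], dim=1)
    idx = norms.argmax(dim=1)              # per-sample choice among 4 grids
    stacked = torch.stack(comps, dim=1)    # (B, 4, C, H/2, W/2)
    return stacked[torch.arange(x.shape[0]), idx]

x = torch.randn(2, 3, 8, 8)
y = aps_downsample(x)
y_shifted = aps_downsample(torch.roll(x, shifts=1, dims=-1))
# A circular shift selects a different grid but the same multiset of values:
print(torch.allclose(y.flatten(1).sort(1).values,
                     y_shifted.flatten(1).sort(1).values))  # True
```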
arXiv Detail & Related papers (2021-05-09T22:33:53Z)
- Diverse Image Inpainting with Bidirectional and Autoregressive Transformers [55.21000775547243]
We propose BAT-Fill, an image inpainting framework with a novel bidirectional autoregressive transformer (BAT).
BAT-Fill inherits the merits of transformers and CNNs in a two-stage manner, which allows it to generate high-resolution contents without being constrained by the quadratic complexity of attention in transformers.
arXiv Detail & Related papers (2021-04-26T03:52:27Z)
- Feature Lenses: Plug-and-play Neural Modules for Transformation-Invariant Visual Representations [33.02732996829386]
Convolutional Neural Networks (CNNs) are known to be brittle under various image transformations.
We propose "Feature Lenses", a set of ad-hoc modules that can be easily plugged into a trained model.
Each individual lens reconstructs the original features given the features of a transformed image under a particular transformation (a minimal sketch of one lens follows).
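To make the plug-in idea concrete, here is a minimal sketch of one lens for one transformation; the two-convolution architecture, the MSE objective, and the frozen-backbone setup are our assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureLens(nn.Module):
    """One lens per transformation: maps the features of a transformed
    image back to (an estimate of) the original image's features."""
    def __init__(self, channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, feats_of_transformed):
        return self.net(feats_of_transformed)

# Stand-in for a frozen, pretrained feature extractor.
backbone = nn.Conv2d(3, 256, kernel_size=7, stride=16).eval()
for param in backbone.parameters():
    param.requires_grad_(False)

lens = FeatureLens()
imgs = torch.rand(8, 3, 224, 224)
rotated = torch.rot90(imgs, k=1, dims=(2, 3))  # the transformation to undo
# Train the lens to invert the transformation in feature space:
loss = F.mse_loss(lens(backbone(rotated)), backbone(imgs))
loss.backward()
```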
arXiv Detail & Related papers (2020-04-12T06:36:15Z)