Related papers: Detecting Regional Spurious Correlations in Vision Transformers via Token Discarding

Detecting Regional Spurious Correlations in Vision Transformers via Token Discarding

URL: http://arxiv.org/abs/2509.04009v1
Date: Thu, 04 Sep 2025 08:40:40 GMT
Title: Detecting Regional Spurious Correlations in Vision Transformers via Token Discarding
Authors: Solha Kang, Esla Timothy Anzaku, Wesley De Neve, Arnout Van Messem, Joris Vankerschaver, Francois Rameau, Utku Ozbulak,
Abstract summary: We present a novel method to detect spurious correlations in vision transformers.<n>We also present a case study investigating spurious signals in invasive breast mass classification.
Score: 0.7315240103690552
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Due to their powerful feature association capabilities, neural network-based computer vision models have the ability to detect and exploit unintended patterns within the data, potentially leading to correct predictions based on incorrect or unintended but statistically relevant signals. These clues may vary from simple color aberrations to small texts within the image. In situations where these unintended signals align with the predictive task, models can mistakenly link these features with the task and rely on them for making predictions. This phenomenon is referred to as spurious correlations, where patterns appear to be associated with the task but are actually coincidental. As a result, detection and mitigation of spurious correlations have become crucial tasks for building trustworthy, reliable, and generalizable machine learning models. In this work, we present a novel method to detect spurious correlations in vision transformers, a type of neural network architecture that gained significant popularity in recent years. Using both supervised and self-supervised trained models, we present large-scale experiments on the ImageNet dataset demonstrating the ability of the proposed method to identify spurious correlations. We also find that, even if the same architecture is used, the training methodology has a significant impact on the model's reliance on spurious correlations. Furthermore, we show that certain classes in the ImageNet dataset contain spurious signals that are easily detected by the models and discuss the underlying reasons for those spurious signals. In light of our findings, we provide an exhaustive list of the aforementioned images and call for caution in their use in future research efforts. Lastly, we present a case study investigating spurious signals in invasive breast mass classification, grounding our work in real-world scenarios.

Related papers

Severing Spurious Correlations with Data Pruning [2.93774265594295]
Deep neural networks have been shown to learn and rely on spurious correlations present in the data that they are trained on.<n>Such correlations can cause these networks to malfunction when deployed in the real world, where these correlations may no longer hold.<n>We develop a novel data pruning technique that identifies and prunes small subsets of the training data that contain these samples.
arXiv Detail & Related papers (2025-03-24T00:57:32Z)
Detecting Spurious Correlations via Robust Visual Concepts in Real and AI-Generated Image Classification [12.992095539058022]
We introduce a general-purpose method that efficiently detects potential spurious correlations. The proposed method provides intuitive explanations while eliminating the need for pixel-level annotations. Our method is also suitable for detecting spurious correlations that may propagate to downstream applications originating from generative models.
arXiv Detail & Related papers (2023-11-03T01:12:35Z)
On the Computational Entanglement of Distant Features in Adversarial Machine Learning [8.87656044562629]
We introduce the concept of "computational entanglement" Computational entanglement enables the network to achieve zero loss by fitting random noise, even on previously unseen test samples. We present a novel application of computational entanglement in transforming a worst-case adversarial examples-inputs that are highly non-robust.
arXiv Detail & Related papers (2023-09-27T14:09:15Z)
Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data. We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations. Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
Causal Analysis for Robust Interpretability of Neural Networks [0.2519906683279152]
We develop a robust interventional-based method to capture cause-effect mechanisms in pre-trained neural networks. We apply our method to vision models trained on classification tasks.
arXiv Detail & Related papers (2023-05-15T18:37:24Z)
Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples. We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks. Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair. A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
Learning Causal Models Online [103.87959747047158]
Predictive models can rely on spurious correlations in the data for making predictions. One solution for achieving strong generalization is to incorporate causal structures in the models. We propose an online algorithm that continually detects and removes spurious features.
arXiv Detail & Related papers (2020-06-12T20:49:20Z)
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks. We use pairs of minimally-different examples with different labels, a.k.a counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task. Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.