Resilience of Vision Transformers for Domain Generalisation in the Presence of Out-of-Distribution Noisy Images
- URL: http://arxiv.org/abs/2504.04225v1
- Date: Sat, 05 Apr 2025 16:25:34 GMT
- Title: Resilience of Vision Transformers for Domain Generalisation in the Presence of Out-of-Distribution Noisy Images
- Authors: Hamza Riaz, Alan F. Smeaton
- Abstract summary: We evaluate vision transformers pre-trained with masked image modelling (MIM) against synthetic out-of-distribution (OOD) benchmarks. Experiments demonstrate BEIT's robustness: it maintains 94% accuracy on PACS and 87% on Office-Home despite significant occlusions. These insights bridge the gap between lab-trained models and real-world deployment, offering a blueprint for building AI systems that generalise reliably under uncertainty.
- Score: 2.2124795371148616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern AI models excel in controlled settings but often fail in real-world scenarios where data distributions shift unpredictably - a challenge known as domain generalisation (DG). This paper tackles this limitation by rigorously evaluating vision transformers, specifically the BEIT architecture, a model pre-trained with masked image modelling (MIM), against synthetic out-of-distribution (OOD) benchmarks designed to mimic real-world noise and occlusions. We introduce a novel framework to generate OOD test cases by strategically masking object regions in images using grid patterns (25%, 50%, 75% occlusion) and leveraging cutting-edge zero-shot segmentation via Segment Anything and Grounding DINO to ensure precise object localisation. Experiments across three benchmarks (PACS, Office-Home, DomainNet) demonstrate BEIT's robustness: it maintains 94% accuracy on PACS and 87% on Office-Home despite significant occlusions, outperforming CNNs and other vision transformers by margins of up to 37%. Analysis of self-attention distances reveals that BEIT's dependence on global features correlates with its resilience. Furthermore, our synthetic benchmarks expose critical failure modes: performance degrades sharply when occlusions disrupt object shapes, e.g. a 68% drop for external grid masking vs. 22% for internal masking. This work provides two key advances: (1) a scalable method to generate OOD benchmarks using controllable noise, and (2) empirical evidence that MIM and the self-attention mechanism in vision transformers enhance DG by learning invariant features. These insights bridge the gap between lab-trained models and real-world deployment, offering a blueprint for building AI systems that generalise reliably under uncertainty.
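The occlusion scheme described in the abstract can be made concrete with a minimal sketch. It assumes a binary object mask has already been obtained (the paper uses Grounding DINO plus Segment Anything for localisation; here the mask is taken as given), and blanks grid cells either on the object (internal masking) or on the background (external masking) up to a target occlusion ratio. All function and parameter names below are illustrative, not the authors' code.

```python
import numpy as np

def grid_mask(image, object_mask, occlusion=0.25, region="internal",
              cell=16, fill=0, seed=0):
    """Blank a fraction of grid cells overlapping the chosen region.

    image:       H x W x C array
    object_mask: H x W bool array (True on the object), e.g. produced by a
                 zero-shot segmenter such as Segment Anything
    occlusion:   fraction of candidate cells to blank (0.25 / 0.5 / 0.75)
    region:      "internal" masks cells on the object, "external" masks
                 background cells
    """
    rng = np.random.default_rng(seed)
    out = image.copy()
    h, w = object_mask.shape
    # Collect grid cells whose majority pixels fall in the requested region
    candidates = []
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            on_object = object_mask[y:y + cell, x:x + cell].mean() > 0.5
            if (region == "internal") == on_object:
                candidates.append((y, x))
    # Blank a random subset matching the target occlusion ratio
    k = int(round(occlusion * len(candidates)))
    for i in rng.permutation(len(candidates))[:k]:
        y, x = candidates[i]
        out[y:y + cell, x:x + cell] = fill
    return out
```

In use, each benchmark image would be swept over occlusion ratios {0.25, 0.5, 0.75} and both region settings, and the frozen classifier re-evaluated on the occluded copies.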
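The self-attention-distance analysis mentioned in the abstract refers to a standard ViT diagnostic: for each query patch, average the spatial distance to all key patches weighted by the attention probabilities; larger values indicate more global attention. A hedged sketch of that computation over one attention map (shapes and names are assumptions):

```python
import numpy as np

def mean_attention_distance(attn, grid_h, grid_w, patch_px=16):
    """attn: (num_heads, N, N) attention probabilities over N = grid_h*grid_w
    patch tokens (the CLS token, if any, is assumed already removed).
    Returns the mean attended distance in pixels, per head."""
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1) * patch_px  # (N, 2)
    # Pairwise Euclidean distances between patch centres, shape (N, N)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Attention-weighted distance per query, then mean over queries
    return (attn * d[None]).sum(-1).mean(-1)  # (num_heads,)
```

Under the paper's hypothesis, a MIM-trained model such as BEIT should show larger attended distances than occlusion-fragile baselines.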
Related papers
- Prior2Former -- Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation [74.55677741919035]
We propose Prior2Former (P2F) as the first approach for segmentation vision transformers rooted in evidential learning. P2F extends the mask vision transformer architecture by incorporating a Beta prior for computing model uncertainty in pixel-wise binary mask assignments. It achieves the highest ranking in the OoDIS anomaly instance benchmark among methods not using OOD data in any way.
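The Beta-prior idea can be illustrated with textbook Beta algebra, independent of P2F's actual architecture: if a pixel's binary mask assignment carries positive evidence parameters (alpha, beta), the expected foreground probability is alpha/(alpha+beta), and low total evidence signals high uncertainty. A minimal sketch (not the authors' code):

```python
import numpy as np

def beta_mask_uncertainty(alpha, beta):
    """alpha, beta: H x W arrays of positive Beta evidence for the
    foreground/background of a binary mask assignment."""
    total = alpha + beta
    prob = alpha / total                           # expected mask probability
    var = alpha * beta / (total**2 * (total + 1))  # Beta variance
    vacuity = 2.0 / total                          # low evidence -> high vacuity
    return prob, var, vacuity
```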
arXiv Detail & Related papers (2025-04-07T08:53:14Z)
- ATOM: A Framework of Detecting Query-Based Model Extraction Attacks for Graph Neural Networks [18.488168353080464]
Graph Neural Networks (GNNs) have gained traction in Graph-based Machine Learning as a Service (GML) platforms, yet they remain vulnerable to graph-based model extraction attacks (MEAs).
We propose ATOM, a novel real-time MEA detection framework tailored for GNNs.
ATOM integrates sequential modeling and reinforcement learning to dynamically detect evolving attack patterns, while leveraging $k$-core embedding to capture structural properties, enhancing detection precision.
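The $k$-core component can be illustrated with networkx, which exposes core numbers directly; everything else about ATOM (sequential modelling, reinforcement learning) is out of scope for this sketch, and the feature summary below is a generic proxy, not ATOM's embedding.

```python
import networkx as nx

def kcore_features(edges):
    """Summarise the structural footprint of a query subgraph via its
    k-core decomposition (a stand-in for the structural signal ATOM uses)."""
    g = nx.Graph(edges)
    core = nx.core_number(g)  # node -> largest k such that node is in the k-core
    values = list(core.values())
    return {"max_core": max(values), "mean_core": sum(values) / len(values)}

# Example: a triangle with one pendant node has max core number 2
print(kcore_features([(0, 1), (1, 2), (0, 2), (2, 3)]))
```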
arXiv Detail & Related papers (2025-03-20T20:25:32Z)
- DRIVE: Dual-Robustness via Information Variability and Entropic Consistency in Source-Free Unsupervised Domain Adaptation [10.127634263641877]
Adapting machine learning models to new domains without labeled data is a critical challenge in applications like medical imaging, autonomous driving, and remote sensing. This task, known as Source-Free Unsupervised Domain Adaptation (SFUDA), involves adapting a pre-trained model to a target domain using only unlabeled target data. Existing SFUDA methods often rely on single-model architectures, struggling with uncertainty and variability in the target domain. We propose DRIVE, a novel SFUDA framework leveraging a dual-model architecture. The two models, with identical weights, work in parallel to capture diverse target domain characteristics.
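The two ingredients named in the title, information variability and entropic consistency, can be written generically as low per-branch prediction entropy plus agreement between the two branches. This is a hedged sketch of such an objective, not DRIVE's actual loss:

```python
import torch
import torch.nn.functional as F

def entropic_consistency(logits_a, logits_b, lam=1.0):
    """Mean prediction entropy of each branch plus a symmetric KL term
    pulling the two branches' predictions together."""
    pa, pb = F.softmax(logits_a, -1), F.softmax(logits_b, -1)
    entropy = -(pa * pa.clamp_min(1e-8).log()).sum(-1).mean() \
              - (pb * pb.clamp_min(1e-8).log()).sum(-1).mean()
    agree = F.kl_div(pa.clamp_min(1e-8).log(), pb, reduction="batchmean") \
          + F.kl_div(pb.clamp_min(1e-8).log(), pa, reduction="batchmean")
    return entropy + lam * agree
```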
arXiv Detail & Related papers (2024-11-24T20:35:04Z)
- Accelerating Domain-Aware Electron Microscopy Analysis Using Deep Learning Models with Synthetic Data and Image-Wide Confidence Scoring [0.0]
We create a physics-based synthetic image and data generator, resulting in a machine learning model that achieves comparable precision (0.86), recall (0.63), F1 score (0.71), and engineering property predictions ($R^2$ = 0.82).
Our study demonstrates that synthetic data can eliminate the reliance on human labelling in ML and provides a means for domain awareness in cases where many feature detections per image are needed.
arXiv Detail & Related papers (2024-08-02T20:15:15Z)
- Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments [13.163784646113214]
Continual Test-Time Adaptation (CTTA) has recently emerged as a promising technique to gradually adapt a source-trained model to continually changing target domains.
We present AMROD, featuring three core components. Firstly, the object-level contrastive learning module extracts object-level features for contrastive learning to refine the feature representation in the target domain.
Secondly, the adaptive monitoring module dynamically skips unnecessary adaptation and updates the category-specific threshold based on predicted confidence scores to enable efficiency and improve the quality of pseudo-labels.
arXiv Detail & Related papers (2024-06-24T08:30:03Z)
- Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study [61.65123150513683]
Multimodal foundation models, such as CLIP, produce state-of-the-art zero-shot results.
It is reported that these models close the robustness gap by matching the performance of supervised models trained on ImageNet.
We show that CLIP leads to a significant robustness drop compared to supervised ImageNet models on our benchmark.
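Zero-shot robustness evaluation of the kind benchmarked here follows the standard CLIP recipe: embed class-name prompts once, then score each (possibly corrupted) image against them. A sketch using OpenAI's clip package; the class list happens to be the PACS categories, and the file name is a placeholder:

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

classes = ["dog", "elephant", "giraffe", "guitar", "horse", "house", "person"]
text = clip.tokenize([f"a photo of a {c}" for c in classes]).to(device)

with torch.no_grad():
    text_feat = model.encode_text(text)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    img = preprocess(Image.open("occluded_sample.png")).unsqueeze(0).to(device)
    img_feat = model.encode_image(img)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ text_feat.T).softmax(dim=-1)

print(classes[probs.argmax().item()])
```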
arXiv Detail & Related papers (2024-03-15T17:33:49Z)
- GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z)
- Robust Representation Learning with Self-Distillation for Domain Generalization [2.0817769887373245]
We propose a novel domain generalization technique called Robust Representation Learning with Self-Distillation.
We observe an average accuracy improvement in the range of 1.2% to 2.3% over the state-of-the-art on three datasets.
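Self-distillation for domain generalisation is commonly implemented as a KL term making the network's prediction on a perturbed view match its own soft, detached prediction on the clean view; whether this paper uses exactly that form is not stated in the snippet, so treat this as a generic sketch:

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(logits_clean, logits_aug, tau=4.0):
    """KL(student_aug || teacher_clean) with the clean branch detached,
    scaled by tau**2 as in standard knowledge distillation."""
    teacher = F.softmax(logits_clean.detach() / tau, dim=-1)
    student = F.log_softmax(logits_aug / tau, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean") * tau**2
```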
arXiv Detail & Related papers (2023-02-14T07:39:37Z)
- From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks [82.21746840893658]
This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
We show that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary.
arXiv Detail & Related papers (2022-04-14T15:14:08Z)
- On the Robustness of Quality Measures for GANs [136.18799984346248]
This work evaluates the robustness of quality measures of generative models such as the Inception Score (IS) and the Fréchet Inception Distance (FID).
We show that such metrics can also be manipulated by additive pixel perturbations.
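FID's sensitivity is easy to see from its closed form over Gaussian feature statistics: additive pixel perturbations shift the Inception feature means and covariances, and hence the score. A sketch of the formula itself, with feature extraction omitted:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, cov1, mu2, cov2):
    """Frechet distance between two Gaussians fitted to Inception features:
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2})."""
    covmean = sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))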
arXiv Detail & Related papers (2022-01-31T06:43:09Z)
- From Sound Representation to Model Robustness [82.21746840893658]
We investigate the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
Averaged over various experiments on three environmental sound datasets, we found the ResNet-18 model outperforms other deep learning architectures.
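The sound representations compared in these two papers are standard transforms. For instance, an STFT spectrogram and a one-level discrete wavelet transform can be computed as follows (scipy/pywt APIs; the papers' exact preprocessing parameters are assumptions here):

```python
import numpy as np
import pywt
from scipy import signal

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)  # 1 s of noise at 16 kHz as a stand-in

# STFT magnitude spectrogram, log-scaled as typically fed to a 2D CNN
f, t, sxx = signal.spectrogram(audio, fs=16000, nperseg=512, noverlap=256)
log_spec = 10 * np.log10(sxx + 1e-10)

# One level of the discrete wavelet transform (DWT)
approx, detail = pywt.dwt(audio, "db4")
print(log_spec.shape, approx.shape, detail.shape)
```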
arXiv Detail & Related papers (2020-07-27T17:30:49Z)