On the unreasonable vulnerability of transformers for image restoration
-- and an easy fix
- URL: http://arxiv.org/abs/2307.13856v1
- Date: Tue, 25 Jul 2023 23:09:05 GMT
- Title: On the unreasonable vulnerability of transformers for image restoration
-- and an easy fix
- Authors: Shashank Agnihotri, Kanchana Vaishnavi Gandikota, Julia Grabinski,
Paramanand Chandramouli, Margret Keuper
- Abstract summary: We investigate whether the improved adversarial robustness of ViTs extends to image restoration.
We consider the recently proposed Restormer model, as well as NAFNet and the "Baseline network".
Our experiments are performed on real-world images from the GoPro dataset for image deblurring.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Following their success in visual recognition tasks, Vision Transformers (ViTs) are being increasingly employed for image restoration. As a few recent works claim that ViTs for image classification also have better robustness properties, we investigate whether the improved adversarial robustness of ViTs extends to image restoration. We consider the recently proposed Restormer model, as well as NAFNet and the "Baseline network", which are both simplified versions of the Restormer. For our robustness evaluation, we use Projected Gradient Descent (PGD) and CosPGD, a recently proposed adversarial attack tailored to pixel-wise prediction tasks. Our experiments are performed on real-world images from the GoPro dataset for image deblurring. Our analysis indicates that, contrary to what is advocated in works on ViTs for image classification, these models are highly susceptible to adversarial attacks. We attempt to improve their robustness through adversarial training. While this yields a significant increase in robustness for Restormer, results on the other networks are less promising. Interestingly, the design choices in NAFNet and the Baseline networks, which were based on i.i.d. performance rather than robust generalization, seem to be at odds with model robustness. Thus, we investigate this further and find a fix.
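As a rough illustration of the evaluation described in the abstract, the sketch below implements an L-infinity PGD attack against an image-restoration network in PyTorch. The names and hyperparameters (`model`, the MSE loss, `eps`, `alpha`, `steps`) are assumptions for illustration, not the authors' exact configuration; CosPGD differs from PGD mainly in how the per-pixel losses are weighted before the gradient step.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, blurry, sharp, eps=8 / 255, alpha=2 / 255, steps=20):
    """Minimal L-infinity PGD attack on an image-restoration model.

    `model` maps a blurry input to a deblurred estimate. The attack
    perturbs the input so that the restored output moves away from the
    ground-truth sharp image. Hyperparameters are illustrative only.
    """
    # Random start inside the eps-ball (standard PGD initialization).
    delta = torch.empty_like(blurry).uniform_(-eps, eps)
    delta.requires_grad_(True)

    for _ in range(steps):
        restored = model((blurry + delta).clamp(0, 1))
        # The attack *maximizes* the pixel-wise restoration loss.
        # CosPGD would reweight the per-pixel losses before reduction.
        loss = F.mse_loss(restored, sharp)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            # Ascend along the gradient sign, then project back
            # into the eps-ball around the clean input.
            delta += alpha * grad.sign()
            delta.clamp_(-eps, eps)

    return (blurry + delta).clamp(0, 1).detach()
```

Adversarial training, as attempted in the paper, then amounts to generating such perturbed inputs on the fly during training and minimizing the restoration loss on them.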
Related papers
- Towards Robust Image Stitching: An Adaptive Resistance Learning against Compatible Attacks (2024-02-25)
  Image stitching seamlessly integrates images captured from varying perspectives into a single wide field-of-view image. Given a pair of captured images, subtle perturbations and distortions that go unnoticed by the human visual system can disrupt the correspondence matching. This paper presents the first attempt to improve the robustness of image stitching against adversarial attacks.
- Image Reconstruction using Enhanced Vision Transformer (2023-07-11)
  We propose a novel image reconstruction framework which can be used for tasks such as image denoising, deblurring, or inpainting. The proposed model is based on the Vision Transformer (ViT), which takes 2D images as input and outputs embeddings. We incorporate four additional optimization techniques in the framework to improve the model's reconstruction capability.
- Image Deblurring by Exploring In-depth Properties of Transformer (2023-03-24)
  We leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing performance on quantitative metrics. By comparing the transformer features of the recovered image and the target, the pretrained transformer provides high-resolution, blur-sensitive semantic information. One approach regards the features as vectors and computes the Euclidean discrepancy between representations extracted from the recovered and target images (a minimal sketch of this feature-space comparison appears after this list).
- Revisiting Adversarial Training for ImageNet: Architectures, Training and Generalization across Threat Models (2023-03-03)
  We revisit adversarial training on ImageNet, comparing ViTs and ConvNeXts. Our modified ConvNeXt, ConvNeXt + ConvStem, yields the most robust generalization across different ranges of model parameters. Our ViT + ConvStem yields the best generalization to unseen threat models.
- Adversarially-Aware Robust Object Detector (2022-07-13)
  We propose a Robust Detector (RobustDet) based on adversarially-aware convolution to disentangle gradients for model learning on clean and adversarial images. Our model effectively disentangles gradients and significantly enhances detection robustness while maintaining detection ability on clean images.
- The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy (2022-03-12)
  This paper systematically studies the ubiquitous existence of redundancy at all three levels: patch embedding, attention map, and weight space. We propose corresponding regularizers that encourage representation diversity and coverage at each of these levels, enabling the capture of more discriminative information.
- Can't Fool Me: Adversarially Robust Transformer for Video Understanding (2021-10-26)
  Developing adversarially robust models for video understanding tasks is still largely unexplored. We first show that simple extensions of image-based adversarially robust models slightly improve worst-case performance. Using the large-scale YouTube-8M video dataset, we show that the final model achieves performance close to that of a non-adversarial model.
- Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation (2021-10-15)
  We investigate the robustness of vision transformers (ViTs) through the lens of their special patch-based architectural structure. We find that ViTs are surprisingly insensitive to patch-based transformations, even when the transformation largely destroys the original semantics. We show that patch-based negative augmentation consistently improves the robustness of ViTs across a wide set of ImageNet-based robustness benchmarks.
- Understanding Robustness of Transformers for Image Classification (2021-03-26)
  The Vision Transformer (ViT) has surpassed ResNets for image classification. Details of the Transformer architecture lead one to wonder whether these networks are as robust. We find that ViT models are at least as robust as their ResNet counterparts on a broad range of perturbations.
- Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks (2020-09-05)
  Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms. We propose Dual Manifold Adversarial Training (DMAT), in which adversarial perturbations in both latent and image spaces are used to robustify the model. Our DMAT improves performance on normal images and achieves robustness comparable to standard adversarial training against Lp attacks.
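The feature-space comparison referenced in the "Image Deblurring by Exploring In-depth Properties of Transformer" entry above can be sketched as follows. The `extractor` callable is an assumption: any frozen, pretrained ViT backbone that maps a batch of images to token features would do; this is not that paper's exact formulation.

```python
import torch

def vit_feature_discrepancy(extractor, recovered, target):
    """Euclidean discrepancy between pretrained-ViT features.

    `extractor` is any frozen callable mapping a batch of images
    (B, 3, H, W) to token features (B, N, D), e.g. a pretrained ViT
    with its classification head removed. Illustrative sketch only.
    """
    with torch.no_grad():
        target_feats = extractor(target)    # fixed features of the sharp target
    recovered_feats = extractor(recovered)  # gradients flow to the restorer
    # Treat each token embedding as a vector and average the squared
    # Euclidean distances over tokens and the batch.
    return (recovered_feats - target_feats).pow(2).sum(dim=-1).mean()
```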