Decision-based Black-box Attack Against Vision Transformers via
Patch-wise Adversarial Removal
- URL: http://arxiv.org/abs/2112.03492v1
- Date: Tue, 7 Dec 2021 04:46:13 GMT
- Title: Decision-based Black-box Attack Against Vision Transformers via
Patch-wise Adversarial Removal
- Authors: Yucheng Shi, Yahong Han
- Abstract summary: We propose a new decision-based black-box attack against ViTs termed Patch-wise Adversarial Removal (PAR)
PAR records the noise magnitude and noise sensitivity of each patch and selects the patch with the highest query value for noise compression.
Experiments on ImageNet-21k, ILSVRC-2012, and Tiny-Imagenet datasets demonstrate that PAR achieves a much lower magnitude of perturbation on average with the same number of queries.
- Score: 42.032749850729246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision transformers (ViTs) have demonstrated impressive performance and
stronger adversarial robustness compared to Deep Convolutional Neural Networks
(CNNs). On the one hand, ViTs' focus on global interaction between individual
patches reduces the local noise sensitivity of images. On the other hand, the
existing decision-based attacks for CNNs ignore the difference in noise
sensitivity between different regions of the image, which affects the
efficiency of noise compression. Therefore, validating the black-box
adversarial robustness of ViTs when the target model can only be queried still
remains a challenging problem. In this paper, we propose a new decision-based
black-box attack against ViTs termed Patch-wise Adversarial Removal (PAR). PAR
divides images into patches through a coarse-to-fine search process and
compresses the noise on each patch separately. PAR records the noise magnitude
and noise sensitivity of each patch and selects the patch with the highest
query value for noise compression. In addition, PAR can be used as a noise
initialization method for other decision-based attacks to improve the noise
compression efficiency on both ViTs and CNNs without introducing additional
calculations. Extensive experiments on ImageNet-21k, ILSVRC-2012, and
Tiny-Imagenet datasets demonstrate that PAR achieves a much lower magnitude of
perturbation on average with the same number of queries.
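The abstract's loop of recording per-patch noise magnitude and sensitivity, picking the patch with the highest query value, and compressing its noise can be sketched as follows. This is a minimal illustrative sketch based only on the abstract, not the authors' implementation: the `query_fn` interface, the grid size, the shrink factor, and the sensitivity-doubling penalty are all assumptions.

```python
import numpy as np

def par_attack(query_fn, x_adv, x_orig, patch_grid=4, num_queries=1000, shrink=0.9):
    """Hypothetical sketch of Patch-wise Adversarial Removal (PAR).

    query_fn(x) -> True if x is still adversarial (one hard-label query).
    x_adv: an initial adversarial example; x_orig: the clean image.
    The schedule and bookkeeping are illustrative assumptions.
    """
    h, w = x_orig.shape[:2]
    ph, pw = h // patch_grid, w // patch_grid
    patches = [(i * ph, j * pw) for i in range(patch_grid) for j in range(patch_grid)]
    # Per-patch noise sensitivity, learned from failed compression attempts.
    sensitivity = {p: 1.0 for p in patches}

    for _ in range(num_queries):
        # Noise magnitude of a patch = L2 norm of its residual noise.
        def magnitude(p):
            y, x = p
            return np.linalg.norm((x_adv - x_orig)[y:y + ph, x:x + pw])

        # Query value favours patches with much noise and low sensitivity.
        best = max(patches, key=lambda p: magnitude(p) / sensitivity[p])
        if magnitude(best) == 0:
            break
        y, x = best
        cand = x_adv.copy()
        # Try to compress the noise on the selected patch only.
        cand[y:y + ph, x:x + pw] = (
            x_orig[y:y + ph, x:x + pw]
            + shrink * (x_adv - x_orig)[y:y + ph, x:x + pw]
        )
        if query_fn(cand):           # still adversarial: accept the compression
            x_adv = cand
        else:                        # patch is noise-sensitive: penalize it
            sensitivity[best] *= 2.0
    return x_adv
```

Because each iteration either shrinks the perturbation on one patch or learns that the patch is sensitive, the loop spends its query budget where compression is most likely to succeed, which is the intuition behind PAR's query efficiency.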
Related papers
- NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval [16.460121977322224]
Composed Image Retrieval (CIR) seeks to find a target image using a multi-modal query, which combines an image with modification text to pinpoint the target.
However, pairs are often partially or completely mismatched due to issues like inaccurate modification texts, low-quality target images, and annotation errors.
We propose the Noise-aware Contrastive Learning for CIR (NCL-CIR) comprising two key components: the Weight Compensation Block (WCB) and the Noise-pair Filter Block (NFB).
arXiv Detail & Related papers (2025-04-06T03:27:23Z)
- On the Generalizability of Iterative Patch Selection for Memory-Efficient High-Resolution Image Classification [0.0]
Classifying large images with small or tiny regions of interest is challenging due to computational and memory constraints.
We explore these issues using a novel testbed on a memory-efficient cross-attention transformer with Iterative Patch Selection (IPS) as the patch selection module.
arXiv Detail & Related papers (2024-12-15T16:25:30Z)
- Query-Efficient Hard-Label Black-Box Attack against Vision Transformers [9.086983253339069]
Vision transformers (ViTs) face similar security risks from adversarial attacks as deep convolutional neural networks (CNNs).
This article explores the vulnerability of ViTs against adversarial attacks under a black-box scenario.
We propose a novel query-efficient hard-label adversarial attack method called AdvViT.
arXiv Detail & Related papers (2024-06-29T10:09:12Z)
- A cross Transformer for image denoising [83.68175077524111]
We propose a cross Transformer denoising CNN (CTNet) with a serial block (SB), a parallel block (PB), and a residual block (RB).
CTNet is superior to some popular denoising methods in terms of real and synthetic image denoising.
arXiv Detail & Related papers (2023-10-16T13:53:19Z)
- LeNo: Adversarial Robust Salient Object Detection Networks with Learnable Noise [7.794351961083746]
This paper proposes a light-weight Learnable Noise (LeNo) to defend SOD models against adversarial attacks.
LeNo preserves accuracy of SOD models on both adversarial and clean images, as well as inference speed.
Inspired by the center prior of the human visual attention mechanism, we initialize the shallow noise with a cross-shaped Gaussian distribution for better defense against adversarial attacks.
arXiv Detail & Related papers (2022-10-27T12:52:55Z)
- DnSwin: Toward Real-World Denoising via Continuous Wavelet Sliding-Transformer [40.21145302686399]
We propose a continuous Wavelet Sliding-Transformer that builds frequency correspondence under real-world scenes.
Specifically, we first extract the bottom features from noisy input images by using a CNN encoder.
We reconstruct the deep features into denoised images using a CNN decoder.
arXiv Detail & Related papers (2022-07-28T02:33:57Z)
- Optimizing Image Compression via Joint Learning with Denoising [49.83680496296047]
High levels of noise usually exist in today's captured images due to the relatively small sensors equipped in the smartphone cameras.
We propose a novel two-branch, weight-sharing architecture with plug-in feature denoisers to allow a simple and effective realization of the goal with little computational cost.
arXiv Detail & Related papers (2022-07-22T04:23:01Z)
- Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis [148.16279746287452]
We propose a swin-conv block to incorporate the local modeling ability of the residual convolutional layer and the non-local modeling ability of the swin transformer block.
For the training data synthesis, we design a practical noise degradation model which takes into consideration different kinds of noise.
Experiments on AGWN removal and real image denoising demonstrate that the new network architecture design achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-03-24T18:11:31Z)
- Exploring Inter-frequency Guidance of Image for Lightweight Gaussian Denoising [1.52292571922932]
We propose a novel network architecture denoted as IGNet, in order to refine the frequency bands from low to high in a progressive manner.
With this design, more inter-frequency priors and information are utilized, so the model size can be reduced while still preserving competitive results.
arXiv Detail & Related papers (2021-12-22T10:35:53Z)
- Image Denoising using Attention-Residual Convolutional Neural Networks [0.0]
We propose a new learning-based non-blind denoising technique named Attention Residual Convolutional Neural Network (ARCNN) and its extension to blind denoising named Flexible Attention Residual Convolutional Neural Network (FARCNN).
ARCNN achieved overall average PSNR improvements of around 0.44 dB and 0.96 dB for Gaussian and Poisson denoising, respectively. FARCNN presented very consistent results, even with slightly worse performance compared to ARCNN.
arXiv Detail & Related papers (2021-01-19T16:37:57Z)
- Wavelet Integrated CNNs for Noise-Robust Image Classification [51.18193090255933]
We enhance CNNs by replacing max-pooling, strided-convolution, and average-pooling with the Discrete Wavelet Transform (DWT).
WaveCNets, the wavelet integrated versions of VGG, ResNets, and DenseNet, achieve higher accuracy and better noise-robustness than their vanilla versions.
arXiv Detail & Related papers (2020-05-07T09:10:41Z)
- Variational Denoising Network: Toward Blind Noise Modeling and Removal [59.36166491196973]
Blind image denoising is an important yet very challenging problem in computer vision.
We propose a new variational inference method, which integrates both noise estimation and image denoising.
arXiv Detail & Related papers (2019-08-29T15:54:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.