Decision-based Black-box Attack Against Vision Transformers via
Patch-wise Adversarial Removal
- URL: http://arxiv.org/abs/2112.03492v1
- Date: Tue, 7 Dec 2021 04:46:13 GMT
- Title: Decision-based Black-box Attack Against Vision Transformers via
Patch-wise Adversarial Removal
- Authors: Yucheng Shi, Yahong Han
- Abstract summary: We propose a new decision-based black-box attack against ViTs termed Patch-wise Adversarial Removal (PAR)
PAR records the noise magnitude and noise sensitivity of each patch and selects the patch with the highest query value for noise compression.
Experiments on ImageNet-21k, ILSVRC-2012, and Tiny-Imagenet datasets demonstrate that PAR achieves a much lower magnitude of perturbation on average with the same number of queries.
- Score: 42.032749850729246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision transformers (ViTs) have demonstrated impressive performance and
stronger adversarial robustness compared to Deep Convolutional Neural Networks
(CNNs). On the one hand, ViTs' focus on global interaction between individual
patches reduces the local noise sensitivity of images. On the other hand, the
existing decision-based attacks for CNNs ignore the difference in noise
sensitivity between different regions of the image, which affects the
efficiency of noise compression. Therefore, validating the black-box
adversarial robustness of ViTs when the target model can only be queried still
remains a challenging problem. In this paper, we propose a new decision-based
black-box attack against ViTs termed Patch-wise Adversarial Removal (PAR). PAR
divides images into patches through a coarse-to-fine search process and
compresses the noise on each patch separately. PAR records the noise magnitude
and noise sensitivity of each patch and selects the patch with the highest
query value for noise compression. In addition, PAR can be used as a noise
initialization method for other decision-based attacks to improve the noise
compression efficiency on both ViTs and CNNs without introducing additional
calculations. Extensive experiments on ImageNet-21k, ILSVRC-2012, and
Tiny-Imagenet datasets demonstrate that PAR achieves a much lower magnitude of
perturbation on average with the same number of queries.
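The abstract's loop of recording per-patch noise magnitude and sensitivity, picking the patch with the highest query value, and compressing its noise can be sketched as follows. This is a minimal illustrative sketch based only on the abstract, not the authors' implementation: the `query_fn` interface, the grid size, the shrink factor, and the sensitivity-doubling penalty are all assumptions.

```python
import numpy as np

def par_attack(query_fn, x_adv, x_orig, patch_grid=4, num_queries=1000, shrink=0.9):
    """Hypothetical sketch of Patch-wise Adversarial Removal (PAR).

    query_fn(x) -> True if x is still adversarial (one hard-label query).
    x_adv: an initial adversarial example; x_orig: the clean image.
    The schedule and bookkeeping are illustrative assumptions.
    """
    h, w = x_orig.shape[:2]
    ph, pw = h // patch_grid, w // patch_grid
    patches = [(i * ph, j * pw) for i in range(patch_grid) for j in range(patch_grid)]
    # Per-patch noise sensitivity, learned from failed compression attempts.
    sensitivity = {p: 1.0 for p in patches}

    for _ in range(num_queries):
        # Noise magnitude of a patch = L2 norm of its residual noise.
        def magnitude(p):
            y, x = p
            return np.linalg.norm((x_adv - x_orig)[y:y + ph, x:x + pw])

        # Query value favours patches with much noise and low sensitivity.
        best = max(patches, key=lambda p: magnitude(p) / sensitivity[p])
        if magnitude(best) == 0:
            break
        y, x = best
        cand = x_adv.copy()
        # Try to compress the noise on the selected patch only.
        cand[y:y + ph, x:x + pw] = (
            x_orig[y:y + ph, x:x + pw]
            + shrink * (x_adv - x_orig)[y:y + ph, x:x + pw]
        )
        if query_fn(cand):           # still adversarial: accept the compression
            x_adv = cand
        else:                        # patch is noise-sensitive: penalize it
            sensitivity[best] *= 2.0
    return x_adv
```

Because each iteration either shrinks the perturbation on one patch or learns that the patch is sensitive, the loop spends its query budget where compression is most likely to succeed, which is the intuition behind PAR's query efficiency.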
Related papers
- NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval [16.460121977322224]
Composed Image Retrieval (CIR) seeks to find a target image using a multi-modal query, which combines an image with modification text to pinpoint the target.
However, pairs are often partially or completely mismatched due to issues like inaccurate modification texts, low-quality target images, and annotation errors.
We propose the Noise-aware Contrastive Learning for CIR (NCL-CIR) comprising two key components: the Weight Compensation Block (WCB) and the Noise-pair Filter Block (NFB).
arXiv Detail & Related papers (2025-04-06T03:27:23Z)
- On the Generalizability of Iterative Patch Selection for Memory-Efficient High-Resolution Image Classification [0.0]
Classifying large images with small or tiny regions of interest is challenging due to computational and memory constraints.
We explore these issues using a novel testbed on a memory-efficient cross-attention transformer with Iterative Patch Selection (IPS) as the patch selection module.
arXiv Detail & Related papers (2024-12-15T16:25:30Z)
- Query-Efficient Hard-Label Black-Box Attack against Vision Transformers [9.086983253339069]
Vision transformers (ViTs) face similar security risks from adversarial attacks as deep convolutional neural networks (CNNs).
This article explores the vulnerability of ViTs against adversarial attacks under a black-box scenario.
We propose a novel query-efficient hard-label adversarial attack method called AdvViT.
arXiv Detail & Related papers (2024-06-29T10:09:12Z)
- A cross Transformer for image denoising [83.68175077524111]
We propose a cross Transformer denoising CNN (CTNet) with a serial block (SB), a parallel block (PB), and a residual block (RB).
CTNet is superior to some popular denoising methods in terms of real and synthetic image denoising.
arXiv Detail & Related papers (2023-10-16T13:53:19Z)
- LeNo: Adversarial Robust Salient Object Detection Networks with Learnable Noise [7.794351961083746]
This paper proposes a light-weight Learnable Noise (LeNo) to defend SOD models against adversarial attacks.
LeNo preserves accuracy of SOD models on both adversarial and clean images, as well as inference speed.
Inspired by the center prior of the human visual attention mechanism, we initialize the shallow noise with a cross-shaped Gaussian distribution for better defense against adversarial attacks.
arXiv Detail & Related papers (2022-10-27T12:52:55Z)
- DnSwin: Toward Real-World Denoising via Continuous Wavelet Sliding-Transformer [40.21145302686399]
We propose a continuous Wavelet Sliding-Transformer that builds frequency correspondence under real-world scenes.
Specifically, we first extract the bottom features from noisy input images by using a CNN encoder.
We reconstruct the deep features into denoised images using a CNN decoder.
arXiv Detail & Related papers (2022-07-28T02:33:57Z)
- Optimizing Image Compression via Joint Learning with Denoising [49.83680496296047]
High levels of noise usually exist in today's captured images due to the relatively small sensors equipped in the smartphone cameras.
We propose a novel two-branch, weight-sharing architecture with plug-in feature denoisers to allow a simple and effective realization of the goal with little computational cost.
arXiv Detail & Related papers (2022-07-22T04:23:01Z)
- Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis [148.16279746287452]
We propose a swin-conv block to incorporate the local modeling ability of the residual convolutional layer and the non-local modeling ability of the swin transformer block.
For the training data synthesis, we design a practical noise degradation model which takes into consideration different kinds of noise.
Experiments on AGWN removal and real image denoising demonstrate that the new network architecture design achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-03-24T18:11:31Z)
- Exploring Inter-frequency Guidance of Image for Lightweight Gaussian Denoising [1.52292571922932]
We propose a novel network architecture denoted as IGNet, in order to refine the frequency bands from low to high in a progressive manner.
With this design, more inter-frequency priors and information are utilized, so the model size can be reduced while still preserving competitive results.
arXiv Detail & Related papers (2021-12-22T10:35:53Z)
- Image Denoising using Attention-Residual Convolutional Neural Networks [0.0]
We propose a new learning-based non-blind denoising technique named Attention Residual Convolutional Neural Network (ARCNN) and its extension to blind denoising named Flexible Attention Residual Convolutional Neural Network (FARCNN).
ARCNN achieved overall average PSNR improvements of around 0.44 dB and 0.96 dB for Gaussian and Poisson denoising, respectively. FARCNN presented very consistent results, even with slightly worse performance compared to ARCNN.
arXiv Detail & Related papers (2021-01-19T16:37:57Z)
- Wavelet Integrated CNNs for Noise-Robust Image Classification [51.18193090255933]
We enhance CNNs by replacing max-pooling, strided-convolution, and average-pooling with the Discrete Wavelet Transform (DWT).
WaveCNets, the wavelet integrated versions of VGG, ResNets, and DenseNet, achieve higher accuracy and better noise-robustness than their vanilla versions.
arXiv Detail & Related papers (2020-05-07T09:10:41Z)
- Variational Denoising Network: Toward Blind Noise Modeling and Removal [59.36166491196973]
Blind image denoising is an important yet very challenging problem in computer vision.
We propose a new variational inference method, which integrates both noise estimation and image denoising.
arXiv Detail & Related papers (2019-08-29T15:54:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.