LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model
- URL: http://arxiv.org/abs/2403.11656v2
- Date: Wed, 27 Mar 2024 09:34:44 GMT
- Title: LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model
- Authors: Yuxin Cao, Jinghao Li, Xi Xiao, Derui Wang, Minhui Xue, Hao Ge, Wei Liu, Guangwu Hu
- Abstract summary: LocalStyleFool is an improved black-box video adversarial attack that superimposes regional style-transfer-based perturbations on videos.
We demonstrate that LocalStyleFool can improve both intra-frame and inter-frame naturalness through a human-assessed survey.
- Score: 19.37714374680383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous work has shown that well-crafted adversarial perturbations can threaten the security of video recognition systems. Attackers can compromise such models with a low query budget when the perturbations are semantic-invariant, as in StyleFool. Despite this query efficiency, the naturalness of fine-detail areas still leaves room for improvement, since StyleFool applies style transfer to every pixel in each frame. To close the gap, we propose LocalStyleFool, an improved black-box video adversarial attack that superimposes regional style-transfer-based perturbations on videos. Benefiting from the popularity and scalable usability of the Segment Anything Model (SAM), we first extract different regions according to semantic information and then track them through the video stream to maintain temporal consistency. We then add style-transfer-based perturbations to several regions selected by a criterion that combines transfer-based gradient information with regional area. A fine-grained perturbation adjustment follows to make the stylized videos adversarial. We demonstrate through a human-assessed survey that LocalStyleFool improves both intra-frame and inter-frame naturalness while maintaining a competitive fooling rate and query efficiency. Successful experiments on a high-resolution dataset also show that SAM's careful segmentation helps improve the scalability of adversarial attacks on high-resolution data.
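To make the selection step concrete, here is a minimal sketch of how regions might be ranked by the combined gradient/area criterion, assuming SAM has already produced per-region boolean masks and a surrogate model has supplied a transfer-based gradient map for a frame; the function `select_style_regions` and the trade-off weight `alpha` are hypothetical, not taken from the paper.

```python
import numpy as np

def select_style_regions(masks, grad_map, k=3, alpha=0.5):
    """Rank SAM-proposed regions by combined gradient saliency and area.

    masks:    list of HxW boolean arrays, one per SAM region proposal
    grad_map: HxW array of |dLoss/dPixel| from a surrogate model
    """
    h, w = grad_map.shape
    scores = []
    for m in masks:
        if not m.any():                        # skip degenerate proposals
            scores.append(0.0)
            continue
        area = m.sum() / (h * w)               # normalized region size
        saliency = grad_map[m].mean() / (grad_map.mean() + 1e-8)
        scores.append(alpha * saliency + (1 - alpha) * area)
    order = np.argsort(scores)[::-1]           # highest score first
    return [masks[i] for i in order[:k]]
```

The selected masks would then be tracked frame-to-frame so that the regional style-transfer perturbation stays temporally consistent.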
Related papers
- UniVST: A Unified Framework for Training-free Localized Video Style Transfer [66.69471376934034]
This paper presents UniVST, a unified framework for localized video style transfer.
It operates without the need for training, offering a distinct advantage over existing methods that transfer style across entire videos.
arXiv Detail & Related papers (2024-10-26T05:28:02Z)
- Boosting Adversarial Transferability with Learnable Patch-wise Masks [16.46210182214551]
Adversarial examples have attracted widespread attention in security-critical applications because of their transferability across different models.
In this paper, we argue that model-specific discriminative regions are a key factor causing overfitting to the source model and thus reducing transferability to the target model.
To accurately localize these regions, we present a learnable approach to automatically optimize the mask.
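A minimal sketch of the patch-wise masking idea follows, assuming a pretrained surrogate `model` and correct labels `y`, with inputs whose sides are divisible by the patch size. Note that the mask here is sampled randomly per step purely for illustration, whereas the paper learns which patches to drop; the plain iterative FGSM loop is also an assumption, not the authors' exact attack.

```python
import torch
import torch.nn.functional as F

def masked_fgsm(model, x, y, eps=8/255, steps=10, patch=16, keep=0.7):
    """Iterative FGSM where a patch-wise binary mask drops part of the
    gradient each step (mask is random here; the paper learns it)."""
    adv = x.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), y)
        grad, = torch.autograd.grad(loss, adv)
        # Patch-wise mask: 1 = keep this patch's gradient, 0 = drop it.
        n, c, h, w = x.shape
        m = (torch.rand(n, 1, h // patch, w // patch,
                        device=x.device) < keep).float()
        m = F.interpolate(m, size=(h, w), mode="nearest")
        adv = adv.detach() + (eps / steps) * (grad * m).sign()
        adv = x + (adv - x).clamp(-eps, eps)   # project into the eps-ball
        adv = adv.clamp(0, 1)
    return adv
```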
arXiv Detail & Related papers (2023-06-28T05:32:22Z)
- Intra-Source Style Augmentation for Improved Domain Generalization [21.591831983223997]
We propose an intra-source style augmentation (ISSA) method to improve domain generalization in semantic segmentation.
ISSA is model-agnostic and readily applicable to both CNNs and Transformers.
It is also complementary to other domain generalization techniques, e.g., it improves the recent state-of-the-art solution RobustNet by 3% mIoU on Cityscapes to Dark Zürich.
arXiv Detail & Related papers (2022-10-18T21:33:25Z)
- Enhancing the Self-Universality for Transferable Targeted Attacks [88.6081640779354]
We propose a new attack method based on the observation that highly universal adversarial perturbations tend to be more transferable in targeted attacks.
Instead of optimizing perturbations across different images, optimizing over different regions of a single image to achieve self-universality removes the need for extra data.
With the feature similarity loss, our method makes the features of adversarial perturbations more dominant than those of benign images.
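Below is a minimal sketch of how a feature-similarity term between a global view and a resized local crop could be combined with a targeted objective, assuming a feature extractor `backbone`, a classifier head `classifier`, and 224-pixel inputs; the crop size and weight `lam` are illustrative, not the paper's values.

```python
import torch
import torch.nn.functional as F

def self_universal_loss(backbone, classifier, adv, y_t, crop=112, lam=1.0):
    """Targeted CE loss minus a global/local feature similarity bonus;
    minimizing it pushes the perturbation toward region-agnostic patterns."""
    n, c, h, w = adv.shape
    top = torch.randint(0, h - crop + 1, (1,)).item()
    left = torch.randint(0, w - crop + 1, (1,)).item()
    local = adv[:, :, top:top + crop, left:left + crop]
    local = F.interpolate(local, size=(h, w),
                          mode="bilinear", align_corners=False)
    f_global = backbone(adv).flatten(1)
    f_local = backbone(local).flatten(1)
    # Encourage the local view to share features with the global view.
    sim = F.cosine_similarity(f_global, f_local).mean()
    ce = F.cross_entropy(classifier(adv), y_t)   # targeted objective
    return ce - lam * sim
```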
arXiv Detail & Related papers (2022-09-08T11:21:26Z)
- Adversarial Style Augmentation for Domain Generalized Urban-Scene Segmentation [120.96012935286913]
We propose a novel adversarial style augmentation approach, which can generate hard stylized images during training.
Experiments on two synthetic-to-real semantic segmentation benchmarks demonstrate that AdvStyle can significantly improve the model performance on unseen real domains.
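As a rough illustration of adversarial style augmentation, the sketch below perturbs per-channel image mean and standard deviation in the loss-increasing direction, assuming a segmentation `model` that returns per-pixel logits; the single-step sign update and step size `lr` are assumptions, not AdvStyle's exact procedure.

```python
import torch
import torch.nn.functional as F

def adv_style(model, x, y, lr=0.1):
    """Synthesize a 'hard' stylized image by moving per-channel style
    statistics (mean/std) in the direction that increases the task loss."""
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + 1e-6
    normed = (x - mu) / sigma                     # content with style removed
    mu_adv = mu.clone().requires_grad_(True)
    sigma_adv = sigma.clone().requires_grad_(True)
    loss = F.cross_entropy(model(normed * sigma_adv + mu_adv), y)
    g_mu, g_sigma = torch.autograd.grad(loss, [mu_adv, sigma_adv])
    with torch.no_grad():
        mu_adv = mu + lr * g_mu.sign()
        sigma_adv = (sigma + lr * g_sigma.sign()).clamp_min(1e-6)
    # Re-stylize the content with the adversarial statistics; clamping
    # assumes images in [0, 1].
    return (normed * sigma_adv + mu_adv).clamp(0, 1)
```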
arXiv Detail & Related papers (2022-07-11T14:01:25Z)
- StyleFool: Fooling Video Classification Systems via Style Transfer [28.19682215735232]
StyleFool is a black-box video adversarial attack that uses style transfer to fool video classification systems.
StyleFool outperforms the state-of-the-art adversarial attacks in terms of the number of queries and the robustness against existing defenses.
arXiv Detail & Related papers (2022-03-30T02:18:16Z)
- Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video frame interpolation framework that enables content-aware aggregation weights and captures long-range dependencies through self-attention.
To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video processing.
In addition, we develop a multi-scale frame scheme to fully realize the potential of Transformers.
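A minimal sketch of windowed (local) self-attention follows, assuming features shaped (batch, time, height, width, channels) with spatial sides divisible by the window size; projections and multi-head splitting are omitted, so this only illustrates the cost-saving window partition, not the paper's full design.

```python
import torch

def local_attention(feats, window=4):
    """Self-attention restricted to non-overlapping spatial windows,
    avoiding the quadratic cost of attending over the whole frame."""
    b, t, h, w, c = feats.shape
    # Partition each frame into window x window tiles.
    x = feats.reshape(b, t, h // window, window, w // window, window, c)
    x = x.permute(0, 1, 2, 4, 3, 5, 6).reshape(-1, window * window, c)
    q, k, v = x, x, x                        # shared projection omitted
    attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)
    out = attn @ v
    # Undo the window partition back to (b, t, h, w, c).
    out = out.reshape(b, t, h // window, w // window, window, window, c)
    out = out.permute(0, 1, 2, 4, 3, 5, 6).reshape(b, t, h, w, c)
    return out
```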
arXiv Detail & Related papers (2021-11-27T05:35:10Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation and performs favorably against state-of-the-art approaches.
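The sketch below shows one plausible co-attention-style fusion in which a shared gate weighs projected low- and high-level feature maps; the channel counts and the softmax gating form are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttentionFusion(nn.Module):
    """Fuse a fine low-level map and a coarse high-level map through a
    jointly computed (co-attended) per-pixel gate."""
    def __init__(self, c_low, c_high, c_out):
        super().__init__()
        self.proj_low = nn.Conv2d(c_low, c_out, 1)
        self.proj_high = nn.Conv2d(c_high, c_out, 1)
        self.gate = nn.Conv2d(2 * c_out, 2, 1)   # one weight per stream

    def forward(self, low, high):
        # Upsample the coarse high-level map to the low-level resolution.
        high = F.interpolate(high, size=low.shape[2:],
                             mode="bilinear", align_corners=False)
        l, h = self.proj_low(low), self.proj_high(high)
        # The gate sees both streams, so each attends to the other.
        w = torch.softmax(self.gate(torch.cat([l, h], dim=1)), dim=1)
        return w[:, 0:1] * l + w[:, 1:2] * h
```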
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Wide and Narrow: Video Prediction from Context and Motion [54.21624227408727]
We propose a new framework to integrate these complementary attributes to predict complex pixel dynamics through deep networks.
We present global context propagation networks that aggregate the non-local neighboring representations to preserve the contextual information over the past frames.
We also devise local filter memory networks that generate adaptive filter kernels by storing the motion of moving objects in the memory.
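As a rough sketch of the adaptive-kernel idea behind the local filter memory, the module below predicts a per-pixel k×k kernel from a memory feature and applies it to each pixel's neighborhood; the memory itself is elided here and the kernel size is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFilter(nn.Module):
    """Predict a kxk kernel per pixel from a memory feature and convolve
    each pixel's local neighborhood with its own kernel."""
    def __init__(self, c_feat, k=3):
        super().__init__()
        self.k = k
        self.pred = nn.Conv2d(c_feat, k * k, 1)   # one kxk kernel per pixel

    def forward(self, frame, memory_feat):
        n, c, h, w = frame.shape
        kernels = torch.softmax(self.pred(memory_feat), dim=1)  # (n,k*k,h,w)
        # Extract each pixel's kxk neighborhood, then weight it.
        patches = F.unfold(frame, self.k, padding=self.k // 2)  # (n,c*k*k,h*w)
        patches = patches.view(n, c, self.k * self.k, h * w)
        kernels = kernels.view(n, 1, self.k * self.k, h * w)
        return (patches * kernels).sum(dim=2).view(n, c, h, w)
```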
arXiv Detail & Related papers (2021-10-22T04:35:58Z)