Related papers: LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model

URL: http://arxiv.org/abs/2403.11656v2
Date: Wed, 27 Mar 2024 09:34:44 GMT
Title: LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model
Authors: Yuxin Cao, Jinghao Li, Xi Xiao, Derui Wang, Minhui Xue, Hao Ge, Wei Liu, Guangwu Hu
Abstract summary: LocalStyleFool is an improved black-box video adversarial attack that superimposes regional style-transfer-based perturbations on videos. We demonstrate that LocalStyleFool can improve both intra-frame and inter-frame naturalness through a human-assessed survey.
Score: 19.37714374680383
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Previous work has shown that well-crafted adversarial perturbations can threaten the security of video recognition systems. Attackers can compromise such models with a low query budget when the perturbations are semantic-invariant, as in StyleFool. Despite the query efficiency, the naturalness of fine-detail areas still needs improvement, since StyleFool applies style transfer to all pixels in each frame. To close the gap, we propose LocalStyleFool, an improved black-box video adversarial attack that superimposes regional style-transfer-based perturbations on videos. Benefiting from the popularity and scalable usability of the Segment Anything Model (SAM), we first extract different regions according to semantic information and then track them through the video stream to maintain temporal consistency. Then, we add style-transfer-based perturbations to several regions selected by a joint criterion of transfer-based gradient information and regional area. A perturbation fine-adjustment step follows to make the stylized videos adversarial. We demonstrate through a human-assessed survey that LocalStyleFool can improve both intra-frame and inter-frame naturalness while maintaining a competitive fooling rate and query efficiency. Successful experiments on a high-resolution dataset also show that the fine-grained segmentation of SAM helps to improve the scalability of adversarial attacks on high-resolution data.

Related papers

MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on [16.0505428363005]
We propose MagicTryOn, a video virtual try-on framework built upon the large-scale video diffusion Transformer. We replace the U-Net architecture with a diffusion Transformer and combine full self-attention to model the garment consistency of videos. Our method outperforms existing SOTA methods in comprehensive evaluations and generalizes to in-the-wild scenarios.
arXiv Detail & Related papers (2025-05-27T15:22:02Z)
SVasP: Self-Versatility Adversarial Style Perturbation for Cross-Domain Few-Shot Learning [21.588320570295835]
Cross-Domain Few-Shot Learning aims to transfer knowledge from seen source domains to unseen target domains. Recent studies focus on utilizing visual styles to bridge the domain gap between different domains. This paper proposes a novel crop-global style method, called Self-Versatility.
arXiv Detail & Related papers (2024-12-12T08:58:42Z)
UniVST: A Unified Framework for Training-free Localized Video Style Transfer [66.69471376934034]
This paper presents UniVST, a unified framework for localized video style transfer. It operates without the need for training, offering a distinct advantage over existing methods that transfer style across entire videos.
arXiv Detail & Related papers (2024-10-26T05:28:02Z)
Boosting Adversarial Transferability with Learnable Patch-wise Masks [16.46210182214551]
Adversarial examples have attracted widespread attention in security-critical applications because of their transferability across different models. In this paper, we argue that the model-specific discriminative regions are a key factor causing overfitting to the source model, and thus reducing the transferability to the target model. To accurately localize these regions, we present a learnable approach to automatically optimize the mask.
arXiv Detail & Related papers (2023-06-28T05:32:22Z)
A Unified Framework for Event-based Frame Interpolation with Ad-hoc Deblurring in the Wild [72.0226493284814]
We propose a unified framework for event-based frame interpolation that performs deblurring ad-hoc. Our network consistently outperforms previous state-of-the-art methods on frame interpolation, single image deblurring, and the joint task of both.
arXiv Detail & Related papers (2023-01-12T18:19:00Z)
Intra-Source Style Augmentation for Improved Domain Generalization [21.591831983223997]
We propose an intra-source style augmentation (ISSA) method to improve domain generalization in semantic segmentation. ISSA is model-agnostic and straightforwardly applicable with CNNs and Transformers. It is also complementary to other domain generalization techniques, e.g., it improves the recent state-of-the-art solution RobustNet by $3\%$ mIoU in Cityscapes to Dark Zürich.
arXiv Detail & Related papers (2022-10-18T21:33:25Z)
Enhancing the Self-Universality for Transferable Targeted Attacks [88.6081640779354]
Our attack method builds on the observation that highly universal adversarial perturbations tend to be more transferable for targeted attacks. Instead of optimizing the perturbations on different images, optimizing on different regions to achieve self-universality removes the need for extra data. With the feature similarity loss, our method makes the features from adversarial perturbations more dominant than those of benign images.
arXiv Detail & Related papers (2022-09-08T11:21:26Z)
Adversarial Style Augmentation for Domain Generalized Urban-Scene Segmentation [120.96012935286913]
We propose a novel adversarial style augmentation approach, which can generate hard stylized images during training. Experiments on two synthetic-to-real semantic segmentation benchmarks demonstrate that AdvStyle can significantly improve the model performance on unseen real domains.
arXiv Detail & Related papers (2022-07-11T14:01:25Z)
StyleFool: Fooling Video Classification Systems via Style Transfer [28.19682215735232]
StyleFool is a black-box video adversarial attack via style transfer to fool the video classification system. StyleFool outperforms the state-of-the-art adversarial attacks in terms of the number of queries and the robustness against existing defenses.
arXiv Detail & Related papers (2022-03-30T02:18:16Z)
Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations. To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video interpolation. In addition, we develop a multi-scale frame synthesis scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z)
Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection. A co-attention formulation is utilized to combine the low-level and high-level features. We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
Wide and Narrow: Video Prediction from Context and Motion [54.21624227408727]
We propose a new framework that integrates context and motion, two complementary attributes, to predict complex pixel dynamics through deep networks. We present global context propagation networks that aggregate the non-local neighboring representations to preserve the contextual information over the past frames. We also devise local filter memory networks that generate adaptive filter kernels by storing the motion of moving objects in the memory.
arXiv Detail & Related papers (2021-10-22T04:35:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.