SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks
- URL: http://arxiv.org/abs/2406.01894v1
- Date: Tue, 4 Jun 2024 01:58:32 GMT
- Title: SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks
- Authors: Yi Pan, Jun-Jie Huang, Zihan Chen, Wentao Zhao, Ziyue Wang
- Abstract summary: Existing video adversarial attack methods mainly take a gradient-based approach and generate adversarial videos with noticeable perturbations.
We propose a novel Sparse Adversarial Video Attack via Spatio-Temporal Invertible Neural Networks (SVASTIN) to generate adversarial videos through spatio-temporal feature space information exchanging.
Experiments on UCF-101 and Kinetics-400 demonstrate that SVASTIN can generate adversarial examples with higher imperceptibility and a higher fooling rate than state-of-the-art methods.
- Score: 14.87613382899623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust and imperceptible adversarial video attack is challenging due to the spatial and temporal characteristics of videos. Existing video adversarial attack methods mainly take a gradient-based approach and generate adversarial videos with noticeable perturbations. In this paper, we propose a novel Sparse Adversarial Video Attack via Spatio-Temporal Invertible Neural Networks (SVASTIN) to generate adversarial videos through spatio-temporal feature space information exchanging. It consists of a Guided Target Video Learning (GTVL) module, which balances the perturbation budget against optimization speed, and a Spatio-Temporal Invertible Neural Network (STIN) module, which performs spatio-temporal feature space information exchanging between a source video and the target feature tensor learned by the GTVL module. Extensive experiments on UCF-101 and Kinetics-400 demonstrate that SVASTIN can generate adversarial examples with higher imperceptibility and a higher fooling rate than state-of-the-art methods. Code is available at https://github.com/Brittany-Chen/SVASTIN.
Related papers
- Video to Video Generative Adversarial Network for Few-shot Learning Based on Policy Gradient [12.07088416665005]
We propose RL-V2V-GAN, a new deep neural network approach for conditional video-to-video synthesis.
While preserving the style of the source video domain, our approach aims to learn a gradient mapping from a source video domain to a target video domain.
Our experiments show that RL-V2V-GAN can produce temporally coherent video results.
arXiv Detail & Related papers (2024-10-28T01:35:10Z)
- ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack [71.2286719703198]
We propose the Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack (ReToMe-VA).
To achieve spatial imperceptibility, ReToMe-VA adopts a Timestep-wise Adversarial Latent Optimization (TALO) strategy.
To achieve temporal imperceptibility, ReToMe-VA introduces a Recursive Token Merging (ReToMe) mechanism by matching and merging tokens across video frames.
arXiv Detail & Related papers (2024-08-10T08:10:30Z)
- ASF-Net: Robust Video Deraining via Temporal Alignment and Online Adaptive Learning [47.10392889695035]
We propose a new computational paradigm, Alignment-Shift-Fusion Network (ASF-Net), which incorporates a temporal shift module.
We construct a LArge-scale RAiny video dataset (LARA) which supports the development of this community.
Our proposed approach exhibits superior performance in three benchmarks and compelling visual quality in real-world scenarios.
arXiv Detail & Related papers (2023-09-02T14:50:13Z)
- Video Event Restoration Based on Keyframes for Video Anomaly Detection [9.18057851239942]
Existing deep neural network based video anomaly detection (VAD) methods mostly follow the route of frame reconstruction or frame prediction.
We introduce a brand-new VAD paradigm to break through these limitations.
We propose a novel U-shaped Swin Transformer Network with Dual Skip Connections (USTN-DSC) for video event restoration.
arXiv Detail & Related papers (2023-04-11T10:13:19Z)
- Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks [68.93429034530077]
We propose dynamics-aware implicit generative adversarial network (DIGAN) for video generation.
We show that DIGAN can be trained on 128 frame videos of 128x128 resolution, 80 frames longer than the 48 frames of the previous state-of-the-art method.
arXiv Detail & Related papers (2022-02-21T23:24:01Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Attacking Video Recognition Models with Bullet-Screen Comments [79.53159486470858]
We introduce a novel adversarial attack that targets video recognition models with bullet-screen comments (BSCs).
BSCs can be regarded as a kind of meaningful patch: adding them to a clean video neither affects people's understanding of the video content nor arouses their suspicion.
arXiv Detail & Related papers (2021-10-29T08:55:50Z)
- Spatiotemporal Inconsistency Learning for DeepFake Video Detection [51.747219106855624]
We present a novel temporal modeling paradigm in the Temporal Inconsistency Module (TIM) by exploiting the temporal difference over adjacent frames along both horizontal and vertical directions.
The Information Supplement Module (ISM) simultaneously utilizes the spatial information from the Spatial Inconsistency Module (SIM) and the temporal information from TIM to establish a more comprehensive spatial-temporal representation.
arXiv Detail & Related papers (2021-09-04T13:05:37Z)
- Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks [54.82488484053263]
Deep neural networks for video classification may be subjected to adversarial manipulation.
We present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation.
The attack was implemented on several target models and the transferability of the attack was demonstrated.
arXiv Detail & Related papers (2020-02-12T17:58:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.