VideoPure: Diffusion-based Adversarial Purification for Video Recognition
- URL: http://arxiv.org/abs/2501.14999v1
- Date: Sat, 25 Jan 2025 00:24:51 GMT
- Title: VideoPure: Diffusion-based Adversarial Purification for Video Recognition
- Authors: Kaixun Jiang, Zhaoyu Chen, Jiyuan Fu, Lingyi Hong, Jinglun Li, Wenqiang Zhang,
- Abstract summary: We propose the first diffusion-based video purification framework to improve video recognition models' adversarial robustness: VideoPure.
We employ temporal DDIM inversion to transform the input distribution into a temporally consistent and trajectory-defined distribution, covering adversarial noise while preserving more video structure.
We investigate the defense performance of our method against black-box, gray-box, and adaptive attacks on benchmark datasets and models.
- Score: 21.317424798634086
- License:
- Abstract: Recent work indicates that video recognition models are vulnerable to adversarial examples, posing a serious security risk to downstream applications. However, current research has primarily focused on adversarial attacks, with limited work exploring defense mechanisms. Furthermore, due to the spatial-temporal complexity of videos, existing video defense methods face issues of high cost, overfitting, and limited defense performance. Recently, diffusion-based adversarial purification methods have achieved robust defense performance in the image domain. However, due to the additional temporal dimension in videos, directly applying these diffusion-based adversarial purification methods to the video domain suffers performance and efficiency degradation. To achieve an efficient and effective video adversarial defense method, we propose the first diffusion-based video purification framework to improve video recognition models' adversarial robustness: VideoPure. Given an adversarial example, we first employ temporal DDIM inversion to transform the input distribution into a temporally consistent and trajectory-defined distribution, covering adversarial noise while preserving more video structure. Then, during DDIM denoising, we leverage intermediate results at each denoising step and conduct guided spatial-temporal optimization, removing adversarial noise while maintaining temporal consistency. Finally, we input the list of optimized intermediate results into the video recognition model for multi-step voting to obtain the predicted class. We investigate the defense performance of our method against black-box, gray-box, and adaptive attacks on benchmark datasets and models. Compared with other adversarial purification methods, our method overall demonstrates better defense performance against different attacks. Our code is available at https://github.com/deep-kaixun/VideoPure.
Related papers
- DiffuseDef: Improved Robustness to Adversarial Attacks [38.34642687239535]
adversarial attacks pose a critical challenge to system built using pretrained language models.
We propose DiffuseDef, which incorporates a diffusion layer as a denoiser between the encoder and the classifier.
During inference, the adversarial hidden state is first combined with sampled noise, then denoised iteratively and finally ensembled to produce a robust text representation.
arXiv Detail & Related papers (2024-06-28T22:36:17Z) - Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos [16.34393937800271]
generative models in creating high-quality videos have raised concerns about digital integrity and privacy vulnerabilities.
Recent works to combat Deepfakes videos have developed detectors that are highly accurate at identifying GAN-generated samples.
We propose a novel framework for detecting videos synthesized from multiple state-of-the-art (SOTA) generative models.
arXiv Detail & Related papers (2024-06-13T21:52:49Z) - Inter-frame Accelerate Attack against Video Interpolation Models [73.28751441626754]
We apply adversarial attacks to VIF models and find that the VIF models are very vulnerable to adversarial examples.
We propose a novel attack method named Inter-frame Accelerate Attack (IAA) thats the iterations as the perturbation for the previous adjacent frame.
It is shown that our method can improve attack efficiency greatly while achieving comparable attack performance with traditional methods.
arXiv Detail & Related papers (2023-05-11T03:08:48Z) - Diffusion Models for Adversarial Purification [69.1882221038846]
Adrial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves the state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z) - Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is only trained on a pair of original and processed videos directly instead of a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z) - Temporal Shuffling for Defending Deep Action Recognition Models against
Adversarial Attacks [67.58887471137436]
We develop a novel defense method using temporal shuffling of input videos against adversarial attacks for action recognition models.
To the best of our knowledge, this is the first attempt to design a defense method without additional training for 3D CNN-based video action recognition models.
arXiv Detail & Related papers (2021-12-15T06:57:01Z) - Robust Tracking against Adversarial Attacks [69.59717023941126]
We first attempt to generate adversarial examples on top of video sequences to improve the tracking robustness against adversarial attacks.
We apply the proposed adversarial attack and defense approaches to state-of-the-art deep tracking algorithms.
arXiv Detail & Related papers (2020-07-20T08:05:55Z) - Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior [63.11478060678794]
We propose an effective motion-excited sampler to obtain motion-aware noise prior.
By using the sparked prior in gradient estimation, we can successfully attack a variety of video classification models with fewer number of queries.
arXiv Detail & Related papers (2020-03-17T10:54:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.