Breaking Temporal Consistency: Generating Video Universal Adversarial
Perturbations Using Image Models
- URL: http://arxiv.org/abs/2311.10366v1
- Date: Fri, 17 Nov 2023 07:39:42 GMT
- Title: Breaking Temporal Consistency: Generating Video Universal Adversarial
Perturbations Using Image Models
- Authors: Hee-Seon Kim, Minji Son, Minbeom Kim, Myung-Joon Kwon, Changick Kim
- Abstract summary: We introduce the Breaking Temporal Consistency (BTC) method, which is the first attempt to incorporate temporal information into video attacks using image models.
Our approach is simple yet effective at attacking unseen video models.
It also surpasses existing methods in effectiveness across various datasets.
- Score: 16.36416048893487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As video analysis using deep learning models becomes more widespread, the
vulnerability of such models to adversarial attacks is becoming a pressing
concern. In particular, Universal Adversarial Perturbation (UAP) poses a
significant threat, as a single perturbation can mislead deep learning models
on entire datasets. We propose a novel video UAP using image data and an image
model. This enables us to leverage the rich image data and image-model-based
studies available for video applications. However, image models are limited in
their ability to analyze the temporal aspects of videos, which is crucial for a
successful video attack. To
address this challenge, we introduce the Breaking Temporal Consistency (BTC)
method, which is the first attempt to incorporate temporal information into
video attacks using image models. We aim to generate adversarial videos that
have opposite patterns to the original. Specifically, BTC-UAP minimizes the
feature similarity between neighboring frames in videos. Our approach is simple
but effective at attacking unseen video models. Additionally, it is applicable
to videos of varying lengths and invariant to temporal shifts. Our approach
surpasses existing methods in terms of effectiveness on various datasets,
including ImageNet, UCF-101, and Kinetics-400.
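The core BTC objective is simple enough to sketch in code. Below is a minimal, illustrative sketch (not the authors' implementation) of the idea stated in the abstract: a per-frame universal perturbation is optimized with an image feature extractor so that neighboring frames of a perturbed pseudo-video have dissimilar features. The pseudo-video construction (a static image repeated T times), the ResNet-50 backbone, the random-image stand-in for a real image loader, and all hyperparameters are assumptions made for illustration.

```python
# Minimal sketch of the BTC objective (not the authors' code): learn a per-frame
# universal perturbation with an image feature extractor so that neighboring
# frames of the perturbed pseudo-video have dissimilar features.
import torch
import torch.nn.functional as F
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).to(device).eval()
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1])  # pooled features
for p in feature_extractor.parameters():
    p.requires_grad_(False)

T, epsilon = 8, 8 / 255          # frames per pseudo-video and L_inf budget (assumed values)
uap = torch.empty(T, 3, 224, 224, device=device).uniform_(-epsilon, epsilon).requires_grad_()
optimizer = torch.optim.Adam([uap], lr=1e-2)

# Stand-in for a loader over real image data (e.g., ImageNet crops scaled to [0, 1]).
image_batches = [torch.rand(4, 3, 224, 224) for _ in range(100)]

for images in image_batches:
    images = images.to(device)
    # Build pseudo-videos: repeat each image T times and add the per-frame perturbation.
    video = (images.unsqueeze(1) + uap.unsqueeze(0)).clamp(0, 1)     # (B, T, 3, 224, 224)
    feats = feature_extractor(video.flatten(0, 1)).flatten(1)        # (B*T, D)
    feats = F.normalize(feats, dim=1).view(images.size(0), T, -1)    # (B, T, D)
    # BTC loss: mean cosine similarity between neighboring frames (to be minimized).
    loss = (feats[:, :-1] * feats[:, 1:]).sum(dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        uap.clamp_(-epsilon, epsilon)                                # stay within the budget
```

In this sketch the objective only compares neighboring frames, so it is agnostic to clip length and to where along the clip the perturbation is applied, which is consistent with the varying-length applicability and temporal-shift invariance claimed in the abstract.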
Related papers
- Robustness Evaluation for Video Models with Reinforcement Learning [4.0196072781228285]
We propose a multi-agent reinforcement learning approach that learns cooperatively to identify the given video's sensitive spatial and temporal regions.
Our method outperforms state-of-the-art solutions in terms of both the Lp metric and the average number of queries.
arXiv Detail & Related papers (2025-06-05T08:38:09Z)
- Subject-driven Video Generation via Disentangled Identity and Motion [52.54835936914813]
We propose to train a subject-driven customized video generation model by decoupling subject-specific learning from temporal dynamics in a zero-shot setting, without additional tuning.
Our method achieves strong subject consistency and scalability, outperforming existing video customization models in zero-shot settings.
arXiv Detail & Related papers (2025-04-23T06:48:31Z)
- Fine-gained Zero-shot Video Sampling [21.42513407755273]
We propose a novel Zero-Shot video sampling algorithm, denoted as $\mathcal{ZS}^2$.
$\mathcal{ZS}^2$ is capable of directly sampling high-quality video clips without any training or optimization.
It achieves state-of-the-art performance in zero-shot video generation, occasionally outperforming recent supervised methods.
arXiv Detail & Related papers (2024-07-31T09:36:58Z)
- WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.
Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions.
We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion.
Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z)
- ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free method for generative video models in a plug-and-play manner.
We transform a video model into a self-cascaded video diffusion model with the designed hidden state correction modules.
Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z)
- BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models [40.73982918337828]
We propose a training-free general-purpose video synthesis framework, coined as BIVDiff, via bridging specific image diffusion models and general text-to-video foundation diffusion models.
Specifically, we first use a specific image diffusion model (e.g., ControlNet and Instruct Pix2Pix) for frame-wise video generation, then perform Mixed Inversion on the generated video, and finally input the inverted latents into the video diffusion models.
arXiv Detail & Related papers (2023-12-05T14:56:55Z)
- DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance [69.0740091741732]
We propose a high-fidelity image-to-video generation method by devising a frame retention branch based on a pre-trained video diffusion model, named DreamVideo.
Our model has a powerful image retention ability and, to the best of our knowledge, delivers the best results on UCF101 compared to other image-to-video models.
arXiv Detail & Related papers (2023-12-05T03:16:31Z)
- ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models [99.84195819571411]
ART$\boldsymbol{\cdot}$V is an efficient framework for auto-regressive video generation with diffusion models.
It only learns simple continual motions between adjacent frames.
It can generate arbitrarily long videos conditioned on a variety of prompts.
arXiv Detail & Related papers (2023-11-30T18:59:47Z)
- Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moire patterns, appearing as color distortions, severely degrade image and video qualities when filming a screen with digital cameras.
We study how to remove such undesirable moire patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z)
- Boosting the Transferability of Video Adversarial Examples via Temporal Translation [82.0745476838865]
Adversarial examples are transferable, which makes them feasible for black-box attacks in real-world applications.
We introduce a temporal translation attack method, which optimizes the adversarial perturbations over a set of temporally translated video clips (see the sketch after this list).
Experiments on the Kinetics-400 dataset and the UCF-101 dataset demonstrate that our method can significantly boost the transferability of video adversarial examples.
arXiv Detail & Related papers (2021-10-18T07:52:17Z)
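For comparison with BTC-UAP, the temporal translation idea summarized in the last entry above can be sketched as follows. This is a hedged reconstruction from the one-sentence summary, not the authors' code: the surrogate `video_model`, the (batch, time, channels, height, width) layout, the shift range, and the step sizes are all assumptions.

```python
# Illustrative reconstruction of a temporal translation attack (not the authors' code):
# average the attack gradient over temporally shifted copies of the clip so the
# perturbation does not overfit the surrogate model's temporal cues.
import torch
import torch.nn.functional as F

def temporal_translation_attack(video_model, clip, label, eps=8 / 255, alpha=1 / 255,
                                steps=10, max_shift=3):
    """clip: (1, T, C, H, W) in [0, 1]; label: (1,) ground-truth class index (assumed layout)."""
    delta = torch.zeros_like(clip, requires_grad=True)
    for _ in range(steps):
        grad = torch.zeros_like(clip)
        # Accumulate gradients over a set of temporally translated clips.
        for shift in range(-max_shift, max_shift + 1):
            shifted = torch.roll(clip + delta, shifts=shift, dims=1)   # shift along the time axis
            loss = F.cross_entropy(video_model(shifted), label)
            grad += torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta += alpha * grad.sign()                               # ascend the averaged gradient
            delta.clamp_(-eps, eps)                                    # L_inf budget
            delta.copy_((clip + delta).clamp(0, 1) - clip)             # keep frames in a valid range
    return (clip + delta).detach()
```

The two approaches differ in scope: BTC-UAP learns a single input-agnostic perturbation from image data, while the temporal translation attack crafts a per-video perturbation on a surrogate video model and relies on gradient averaging over time shifts for transferability.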
This list is automatically generated from the titles and abstracts of the papers on this site.