VGMShield: Mitigating Misuse of Video Generative Models
- URL: http://arxiv.org/abs/2402.13126v1
- Date: Tue, 20 Feb 2024 16:39:23 GMT
- Title: VGMShield: Mitigating Misuse of Video Generative Models
- Authors: Yan Pang, Yang Zhang, Tianhao Wang
- Abstract summary: We introduce VGMShield: a set of three straightforward but pioneering mitigations through the lifecycle of fake video generation.
We first try to understand whether there is uniqueness in generated videos and whether we can differentiate them from real videos.
Then, we investigate the tracing problem, which maps a fake video back to the model that generated it.
- Score: 7.963591895964269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid advancement in video generation, people can conveniently
utilize video generation models to create videos tailored to their specific
desires. Nevertheless, there are also growing concerns about their potential
misuse in creating and disseminating false information.
In this work, we introduce VGMShield: a set of three straightforward but
pioneering mitigations through the lifecycle of fake video generation. We start
from fake video detection, trying to understand whether there is uniqueness in
generated videos and whether we can differentiate them from real videos; then,
we investigate the tracing problem, which maps a fake video back to the model
that generated it. Towards these, we propose to leverage pre-trained models
that focus on spatial-temporal dynamics as the backbone to identify
inconsistencies in videos. Through experiments on seven state-of-the-art
open-source models, we demonstrate that current models still cannot perfectly
handle spatial-temporal relationships, and thus we can accomplish detection and
tracing with nearly perfect accuracy.
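As a rough illustration of the detection and tracing setup described above, the minimal sketch below fine-tunes a pre-trained spatial-temporal video backbone as a classifier. The specific backbone (torchvision's r3d_18 pre-trained on Kinetics-400), the class counts, and the input shape are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal sketch: detection/tracing as video classification on a pre-trained
# spatial-temporal backbone. The backbone choice (r3d_18) and class counts are
# assumptions for illustration, not the paper's exact setup.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

def build_classifier(num_classes: int) -> nn.Module:
    """num_classes = 2 for real-vs-fake detection, or 1 + the number of
    candidate generators when tracing a fake video back to its source model."""
    backbone = r3d_18(weights=R3D_18_Weights.KINETICS400_V1)  # pre-trained video model
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)  # new classification head
    return backbone

detector = build_classifier(num_classes=2)  # fake video detection
tracer = build_classifier(num_classes=8)    # e.g. real + 7 open-source generators

# A clip tensor of shape (batch, channels, frames, height, width).
clip = torch.randn(1, 3, 16, 112, 112)
print(detector(clip).shape)  # torch.Size([1, 2]) logits: real vs. fake
```

Fine-tuning such classifiers on clips from real videos and from the candidate generators would correspond to the detection and tracing experiments the abstract describes.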
Furthermore, anticipating future generative model improvements, we propose a
prevention method that adds invisible perturbations to images to make the
generated videos look unreal. Together with fake video detection and tracing,
our multi-faceted set of solutions can effectively mitigate misuse of video
generative models.
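The prevention idea, adding an invisible perturbation to an image so that videos generated from it look unreal, can be pictured as an l_inf-bounded, PGD-style optimization. In the sketch below, image_encoder is a hypothetical stand-in for a differentiable component of an image-to-video generator; the paper's actual attack target and loss may differ.

```python
# Hedged sketch of image "prevention": perturb the conditioning image within an
# invisible l_inf budget so a (hypothetical) image-to-video encoder sees a very
# different input, degrading the generated video. Not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

def protect_image(image: torch.Tensor, image_encoder: nn.Module,
                  eps: float = 8 / 255, alpha: float = 1 / 255,
                  steps: int = 50) -> torch.Tensor:
    """image: (1, 3, H, W) tensor in [0, 1]; returns a visually similar protected copy."""
    with torch.no_grad():
        clean_feat = image_encoder(image)              # features of the clean image
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        feat = image_encoder(image + delta)
        loss = -F.mse_loss(feat, clean_feat)           # maximize feature distortion
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()         # signed gradient step
            delta.clamp_(-eps, eps)                    # keep perturbation invisible
            delta.copy_((image + delta).clamp(0, 1) - image)  # keep pixels in [0, 1]
        delta.grad.zero_()
    return (image + delta).detach()
```

A real deployment would target the actual generator's encoder and tune eps, alpha, and the number of steps; this is only a generic adversarial-perturbation template.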
Related papers
- WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.
Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions.
We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion.
Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z)
- SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix [60.48666051245761]
We propose a pose-free and training-free approach for generating 3D stereoscopic videos.
Our method warps a generated monocular video into camera views on a stereoscopic baseline using estimated video depth.
We develop a disocclusion boundary re-injection scheme that further improves the quality of video inpainting.
arXiv Detail & Related papers (2024-06-29T08:33:55Z)
- What Matters in Detecting AI-Generated Videos like Sora? [51.05034165599385]
The gap between synthetic and real-world videos remains under-explored.
In this study, we compare real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion.
Our model is capable of detecting videos generated by Sora with high accuracy, even without exposure to any Sora videos during training.
arXiv Detail & Related papers (2024-06-27T23:03:58Z)
- 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models [53.89348957053395]
We introduce a novel pipeline designed for text-to-4D scene generation.
Our method begins by generating a reference video using the video generation model.
We then learn the canonical 3D representation of the video using a freeze-time video.
arXiv Detail & Related papers (2024-06-11T17:19:26Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- DeCoF: Generated Video Detection via Frame Consistency: The First Benchmark Dataset [32.236653072212015]
We propose, for the first time, an open-source dataset and a detection method for generated videos.
First, we propose a scalable dataset consisting of 964 prompts, covering various forgery targets, scenes, behaviors, and actions.
Second, we found via probing experiments that spatial artifact-based detectors lack generalizability.
arXiv Detail & Related papers (2024-02-03T08:52:06Z)
- Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moire patterns, appearing as color distortions, severely degrade image and video quality when filming a screen with digital cameras.
We study how to remove such undesirable moire patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z)
- Video Content Swapping Using GAN [1.2300363114433952]
In this work, we break down each frame of the video into content and pose.
We first extract pose information from a video using a pre-trained human pose detector, then use a generative model to synthesize the video based on the content code and pose code.
arXiv Detail & Related papers (2021-11-21T23:01:58Z)