DynVFX: Augmenting Real Videos with Dynamic Content
- URL: http://arxiv.org/abs/2502.03621v1
- Date: Wed, 05 Feb 2025 21:14:55 GMT
- Title: DynVFX: Augmenting Real Videos with Dynamic Content
- Authors: Danah Yatim, Rafail Fridman, Omer Bar-Tal, Tali Dekel
- Abstract summary: We present a method for augmenting real-world videos with newly generated dynamic content.
Given an input video and a simple user-provided text instruction describing the desired content, our method synthesizes dynamic objects or complex scene effects.
The position, appearance, and motion of the new content are seamlessly integrated into the original footage.
- Score: 19.393567535259518
- Abstract: We present a method for augmenting real-world videos with newly generated dynamic content. Given an input video and a simple user-provided text instruction describing the desired content, our method synthesizes dynamic objects or complex scene effects that naturally interact with the existing scene over time. The position, appearance, and motion of the new content are seamlessly integrated into the original footage while accounting for camera motion, occlusions, and interactions with other dynamic objects in the scene, resulting in a cohesive and realistic output video. We achieve this via a zero-shot, training-free framework that harnesses a pre-trained text-to-video diffusion transformer to synthesize the new content and a pre-trained Vision Language Model to envision the augmented scene in detail. Specifically, we introduce a novel inference-based method that manipulates features within the attention mechanism, enabling accurate localization and seamless integration of the new content while preserving the integrity of the original scene. Our method is fully automated, requiring only a simple user instruction. We demonstrate its effectiveness on a wide range of edits applied to real-world videos, encompassing diverse objects and scenarios involving both camera and object motion.
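The abstract describes manipulating features inside the attention mechanism so that newly generated content stays anchored to the original scene. A common way to realize this idea (not necessarily the paper's exact mechanism) is "extended attention": queries from the edited branch attend jointly over their own keys/values and keys/values cached from the original video. The sketch below is a minimal toy illustration of that pattern; all function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def extended_attention(q_edit, k_edit, v_edit, k_orig, v_orig):
    """Toy 'extended attention': edited-branch queries attend over a
    concatenation of their own keys/values and those cached from the
    original video, keeping generated content tied to the source scene."""
    k = np.concatenate([k_edit, k_orig], axis=0)  # (n_edit + n_orig, d)
    v = np.concatenate([v_edit, v_orig], axis=0)
    d = q_edit.shape[-1]
    weights = softmax(q_edit @ k.T / np.sqrt(d), axis=-1)
    return weights @ v  # (n_edit, d)

rng = np.random.default_rng(0)
d = 8
out = extended_attention(rng.normal(size=(4, d)),
                         rng.normal(size=(4, d)), rng.normal(size=(4, d)),
                         rng.normal(size=(6, d)), rng.normal(size=(6, d)))
print(out.shape)  # (4, 8)
```

In practice such feature manipulation happens inside a pre-trained diffusion transformer's self-attention layers at inference time, which is what makes the approach zero-shot and training-free.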
Related papers
- X-Dyna: Expressive Dynamic Human Image Animation [49.896933584815926]
X-Dyna is a zero-shot, diffusion-based pipeline for animating a single human image.
It generates realistic, context-aware dynamics for both the subject and the surrounding environment.
arXiv Detail & Related papers (2025-01-17T08:10:53Z)
- Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
- DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors [63.43133768897087]
We propose a method to convert open-domain images into animated videos.
The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance.
Our proposed method can produce visually convincing and more logical and natural motions, as well as higher conformity to the input image.
arXiv Detail & Related papers (2023-10-18T14:42:16Z)
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation [69.20173154096]
We develop a framework comprised of two functional modules, Motion Structure Retrieval and Structure-Guided Text-to-Video Synthesis.
For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure.
For the second module, we propose a controllable video generation model that offers flexible controls over structure and characters.
arXiv Detail & Related papers (2023-07-13T17:57:13Z)
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
- Disentangling Content and Motion for Text-Based Neural Video Manipulation [28.922000242744435]
We introduce a new method called DiCoMoGAN for manipulating videos with natural language.
Our evaluations demonstrate that DiCoMoGAN significantly outperforms existing frame-based methods.
arXiv Detail & Related papers (2022-11-05T21:49:41Z)
- Neural Scene Graphs for Dynamic Scenes [57.65413768984925]
We present the first neural rendering method that decomposes dynamic scenes into scene graphs.
We learn implicitly encoded scenes, combined with a jointly learned latent representation to describe objects with a single implicit function.
arXiv Detail & Related papers (2020-11-20T12:37:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.