IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning
- URL: http://arxiv.org/abs/2512.15635v1
- Date: Wed, 17 Dec 2025 17:47:18 GMT
- Title: IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning
- Authors: Yuanhang Li, Yiren Song, Junzhe Bai, Xinran Liang, Hu Yang, Libiao Jin, Qi Mao
- Abstract summary: IC-Effect is an instruction-guided, DiT-based framework for few-shot video VFX editing. It synthesizes complex effects while preserving spatial and temporal consistency. A two-stage training strategy, consisting of general editing adaptation followed by effect-specific learning, ensures strong instruction following and robust effect modeling.
- Score: 13.89445714667069
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose IC-Effect, an instruction-guided, DiT-based framework for few-shot video VFX editing that synthesizes complex effects (e.g., flames, particles, and cartoon characters) while strictly preserving spatial and temporal consistency. Video VFX editing is highly challenging because injected effects must blend seamlessly with the background, the background must remain entirely unchanged, and effect patterns must be learned efficiently from limited paired data. However, existing video editing models fail to satisfy these requirements. IC-Effect leverages the source video as clean contextual conditions, exploiting the contextual learning capability of DiT models to achieve precise background preservation and natural effect injection. A two-stage training strategy, consisting of general editing adaptation followed by effect-specific learning via Effect-LoRA, ensures strong instruction following and robust effect modeling. To further improve efficiency, we introduce spatiotemporal sparse tokenization, enabling high fidelity with substantially reduced computation. We also release a paired VFX editing dataset spanning 15 high-quality visual styles. Extensive experiments show that IC-Effect delivers high-quality, controllable, and temporally consistent VFX editing, opening new possibilities for video creation.
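To make the in-context conditioning concrete, below is a minimal PyTorch sketch of the general idea rather than the authors' implementation: clean source-video tokens are concatenated with noisy target tokens for joint self-attention, an Effect-LoRA-style low-rank adapter carries the effect-specific capacity, and a toy stride-based subsampling stands in for spatiotemporal sparse tokenization. All class and function names here (LoRALinear, InContextBlock, sparse_context) are hypothetical.

```python
# Minimal sketch of in-context conditioning for video VFX editing, assuming a
# generic DiT-style attention block. Not the IC-Effect implementation; names
# and the sparsification scheme are illustrative assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a low-rank additive update (Effect-LoRA-style)."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # LoRA starts as a no-op over the base

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

class InContextBlock(nn.Module):
    """Joint self-attention over [source-context tokens ; noisy target tokens]."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = LoRALinear(dim)  # effect-specific capacity lives here

    def forward(self, ctx_tokens, tgt_tokens):
        seq = torch.cat([ctx_tokens, tgt_tokens], dim=1)  # in-context concat
        out, _ = self.attn(seq, seq, seq)
        out = self.proj(out)
        # Only the target half is denoised; the clean context is read-only.
        return out[:, ctx_tokens.shape[1]:]

def sparse_context(tokens, keep_every: int = 2):
    """Toy spatiotemporal sparsification: keep every k-th context token."""
    return tokens[:, ::keep_every]

B, N, D = 1, 256, 512
source = torch.randn(B, N, D)  # tokens from the (clean) source video
noisy = torch.randn(B, N, D)   # noisy latents of the edited video
block = InContextBlock(D)
denoised = block(sparse_context(source), noisy)
print(denoised.shape)  # torch.Size([1, 256, 512])
```

Under these assumptions, only the target half of the sequence is denoised while the clean context is attended to but never updated, which is one plausible way the source video can pin down the background as the effect is injected, and subsampling the context tokens shrinks the attention sequence, which is where the claimed efficiency gain would come from.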
Related papers
- Tuning-free Visual Effect Transfer across Videos [91.93897438317397]
RefVFX is a framework that transfers complex temporal effects from a reference video onto a target video or image in a feed-forward manner. We introduce a large-scale dataset of triplets, where each triplet consists of a reference effect video, an input image or video, and a corresponding output video. We show that RefVFX produces visually consistent and temporally coherent edits, generalizes across unseen effect categories, and outperforms prompt-only baselines in both quantitative metrics and human preference.
arXiv Detail & Related papers (2026-01-12T18:59:32Z) - Over++: Generative Video Compositing for Layer Interaction Effects [9.071187452755206]
We present Over++, a video effect generation framework that makes no assumptions about camera pose, scene stationarity, or depth supervision.<n>Despite training on limited data, Over++ produces diverse and realistic environmental effects and outperforms existing baselines in effect generation and scene preservation.
arXiv Detail & Related papers (2025-12-22T18:39:58Z)
- Generative Photographic Control for Scene-Consistent Video Cinematic Editing [75.45726688666083]
We propose CineCtrl, the first video cinematic editing framework that provides fine control over professional camera parameters. We introduce a decoupled cross-attention mechanism to disentangle camera motion from photographic inputs. Our model generates high-fidelity videos with precisely controlled, user-specified photographic camera effects.
arXiv Detail & Related papers (2025-11-17T03:17:23Z)
- VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning [67.44716618860544]
We introduce VFXMaster, the first unified, reference-based framework for VFX video generation. It recasts effect generation as an in-context learning task, enabling it to reproduce diverse dynamic effects from a reference video onto target content. In addition, we propose an efficient one-shot effect adaptation mechanism to rapidly boost generalization on difficult unseen effects from a single user-provided video.
arXiv Detail & Related papers (2025-10-29T17:59:53Z)
- VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer [56.81599836980222]
We propose a novel paradigm that treats animated VFX generation as image animation, where dynamic effects are generated from user-friendly textual descriptions and static reference images. Our work makes two primary contributions: (i) Open-VFX, the first high-quality VFX video dataset spanning 15 diverse effect categories, annotated with textual descriptions and start-end timestamps for temporal control, and (ii) VFX Creator, a controllable VFX generation framework based on a Video Diffusion Transformer.
arXiv Detail & Related papers (2025-02-09T18:12:25Z)
- Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition. It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects. Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z)
- VideoDirector: Precise Video Editing via Text-to-Video Models [45.53826541639349]
Current video editing methods rely on text-to-video (T2V) models, which inherently lack temporally coherent generation ability. We propose a spatial-temporal decoupled guidance (STDG) and multi-frame null-text optimization strategy to provide pivotal temporal cues for more precise pivotal inversion. Experimental results demonstrate that our method effectively harnesses the powerful temporal generation capabilities of T2V models.
arXiv Detail & Related papers (2024-11-26T16:56:53Z)
- EffiVED: Efficient Video Editing via Text-instruction Diffusion Models [9.287394166165424]
We introduce EffiVED, an efficient diffusion-based model that supports instruction-guided video editing. We transform vast image editing datasets and open-world videos into a high-quality dataset for training EffiVED.
arXiv Detail & Related papers (2024-03-18T08:42:08Z)
- MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation [74.32046206403177]
MagicProp disentangles the video editing process into two stages: appearance editing and motion-aware appearance propagation. In the first stage, MagicProp selects a single frame from the input video and applies image-editing techniques to modify the content and/or style of the frame. In the second stage, MagicProp employs the edited frame as an appearance reference and generates the remaining frames using an autoregressive rendering approach (see the sketch after this entry).
arXiv Detail & Related papers (2023-09-02T11:13:29Z)
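As a reading aid, here is a schematic of MagicProp's two-stage flow as described in the summary above. The functions edit_image and render_next_frame are hypothetical stand-ins for an image-editing model and a motion-aware autoregressive renderer; this is a sketch of the control flow, not MagicProp's actual API.

```python
# Schematic of an edit-one-frame-then-propagate pipeline, following the
# two-stage description above. `edit_image` and `render_next_frame` are
# hypothetical placeholders, not real MagicProp functions.
from typing import Callable, List

def two_stage_edit(
    frames: List,                 # input video frames
    edit_image: Callable,         # stage 1: any image-editing technique
    render_next_frame: Callable,  # stage 2: appearance-conditioned renderer
    key_index: int = 0,
) -> List:
    # Stage 1: edit a single selected frame to fix the target appearance.
    appearance_ref = edit_image(frames[key_index])

    # Stage 2: propagate that appearance autoregressively through the video,
    # conditioning each step on the previous output and the reference frame.
    edited = [appearance_ref]
    prev = appearance_ref
    for frame in frames[key_index + 1:]:
        prev = render_next_frame(prev, frame, appearance_ref)
        edited.append(prev)
    return edited

# Trivial smoke test with identity stand-ins for both stages:
out = two_stage_edit([0, 1, 2],
                     edit_image=lambda f: f,
                     render_next_frame=lambda prev, f, ref: f)
print(out)  # [0, 1, 2]
```

The design point the summary highlights is that temporal consistency is delegated to the autoregressive renderer, so any off-the-shelf image editor can be plugged into stage 1 without retraining.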