O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing
- URL: http://arxiv.org/abs/2509.01596v1
- Date: Mon, 01 Sep 2025 16:29:39 GMT
- Title: O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing
- Authors: Yuqing Chen, Junjie Wang, Lin Liu, Ruihang Chu, Xiaopeng Zhang, Qi Tian, Yujiu Yang
- Abstract summary: O-DisCo-Edit is a unified framework that incorporates a novel object distortion control (O-DisCo). This signal, based on random and adaptive noise, flexibly encapsulates a wide range of editing cues within a single representation. O-DisCo-Edit enables efficient, high-fidelity editing through an effective training paradigm.
- Score: 88.93410369258203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have recently advanced video editing, yet controllable editing remains challenging due to the need for precise manipulation of diverse object properties. Current methods require different control signals for diverse editing tasks, which complicates model design and demands significant training resources. To address this, we propose O-DisCo-Edit, a unified framework that incorporates a novel object distortion control (O-DisCo). This signal, based on random and adaptive noise, flexibly encapsulates a wide range of editing cues within a single representation. Paired with a "copy-form" preservation module for preserving non-edited regions, O-DisCo-Edit enables efficient, high-fidelity editing through an effective training paradigm. Extensive experiments and comprehensive human evaluations consistently demonstrate that O-DisCo-Edit surpasses both specialized and multitask state-of-the-art methods across various video editing tasks. https://cyqii.github.io/O-DisCo-Edit.github.io/
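The abstract does not spell out how the O-DisCo signal or the "copy-form" module is computed; the sketch below illustrates only one plausible reading, in which adaptively scaled random noise is injected inside an object mask to form the control signal and the copy-form step restores source content outside the edit region. The function names, the adaptive-scaling rule, and the toy shapes are all hypothetical, not the paper's implementation.

```python
import numpy as np

def odisco_signal(frame, obj_mask, noise_scale=0.5, rng=None):
    """Hypothetical O-DisCo-style control signal: distort only the object
    region with random noise whose strength adapts to local intensity.
    `frame` is HxWx3 float in [0, 1]; `obj_mask` is an HxW boolean mask."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, 1.0, size=frame.shape)
    # Assumed adaptive rule: stronger distortion where the object is brighter.
    local_scale = noise_scale * frame.mean(axis=-1, keepdims=True)
    distorted = np.clip(frame + local_scale * noise, 0.0, 1.0)
    return np.where(obj_mask[..., None], distorted, frame)

def copy_form_preserve(edited, source, obj_mask):
    """Hypothetical 'copy-form' preservation: keep source content outside the edit region."""
    return np.where(obj_mask[..., None], edited, source)

# Toy usage on a single 64x64 frame.
frame = np.random.default_rng(1).random((64, 64, 3))
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True
control = odisco_signal(frame, mask)
output = copy_form_preserve(control, frame, mask)
```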
Related papers
- ConsistEdit: Highly Consistent and Precise Training-free Visual Editing [17.162316662697965]
We propose ConsistEdit, a novel attention control method specifically tailored for MM-DiT. It incorporates vision-only attention control, mask-guided pre-attention fusion, and differentiated manipulation of the query, key, and value tokens. It achieves state-of-the-art performance across a wide range of image and video editing tasks, including both structure-consistent and structure-inconsistent scenarios.
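The summary names the components but not their exact rules; the following minimal sketch assumes that mask-guided pre-attention fusion means blending source- and target-branch query/key/value tokens with an edit mask, treating the three token types differently. The specific blending rules here are assumptions, not ConsistEdit's actual formulation.

```python
import numpy as np

def qkv_mask_fusion(q_src, k_src, v_src, q_tgt, k_tgt, v_tgt, mask):
    """Hypothetical mask-guided pre-attention fusion with differentiated
    Q/K/V handling. All tensors are (tokens, dim); `mask` is (tokens, 1) in [0, 1]."""
    q = mask * q_tgt + (1 - mask) * q_src  # queries follow the edit region
    k = 0.5 * (k_tgt + k_src)              # keys blended to keep structure consistent
    v = mask * v_tgt + (1 - mask) * v_src  # values copied from source outside the mask
    return q, k, v

# Toy usage with 16 tokens of dimension 8.
rng = np.random.default_rng(0)
t = lambda: rng.normal(size=(16, 8))
mask = (rng.random((16, 1)) > 0.5).astype(float)
q, k, v = qkv_mask_fusion(t(), t(), t(), t(), t(), t(), mask)
```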
arXiv Detail & Related papers (2025-10-20T17:59:52Z)
- Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing [76.44219733285898]
Kontinuous Kontext is an instruction-driven editing model that provides a new dimension of control over edit strength. A lightweight projector network maps the input scalar and the edit instruction to coefficients in the model's modulation space. For training our model, we synthesize a diverse dataset of image-edit-instruction-strength quadruplets using existing generative models.
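As a rough illustration of the projector idea, a tiny MLP can map the concatenated strength scalar and instruction embedding to modulation coefficients. The layer sizes, activation, and output semantics below are assumptions, not the paper's architecture.

```python
import numpy as np

def strength_projector(strength, instruction_emb, w1, b1, w2, b2):
    """Hypothetical projector: concatenate a scalar edit strength with the
    instruction embedding and map it to modulation coefficients via a
    two-layer MLP. Weight shapes are illustrative."""
    x = np.concatenate([[strength], instruction_emb])
    h = np.maximum(0.0, x @ w1 + b1)   # ReLU hidden layer
    return h @ w2 + b2                 # modulation coefficients (e.g. scale/shift)

rng = np.random.default_rng(0)
emb_dim, hidden, out_dim = 32, 64, 16
coeffs = strength_projector(
    0.7, rng.normal(size=emb_dim),
    rng.normal(size=(emb_dim + 1, hidden)), np.zeros(hidden),
    rng.normal(size=(hidden, out_dim)), np.zeros(out_dim),
)
```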
arXiv Detail & Related papers (2025-10-09T17:51:03Z)
- EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning [58.53074381801114]
We introduce EditVerse, a unified framework for image and video generation and editing within a single model. By representing all modalities, i.e., text, image, and video, as a unified token sequence, EditVerse leverages self-attention to achieve robust in-context learning. We present EditVerseBench, the first benchmark for instruction-based video editing covering diverse tasks and resolutions.
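A minimal sketch of the unified-token-sequence idea: tokens from each modality are concatenated along the sequence dimension and tagged with a per-modality embedding so a single self-attention stack can attend across all of them. The token counts, dimensions, and tagging scheme are illustrative, not EditVerse's actual design.

```python
import numpy as np

def unify_tokens(text_tok, image_tok, video_tok, modality_emb):
    """Concatenate text, image, and video tokens into one sequence, adding a
    per-modality embedding so downstream self-attention can tell them apart."""
    parts = [(text_tok, 0), (image_tok, 1), (video_tok, 2)]
    return np.concatenate([tok + modality_emb[idx] for tok, idx in parts], axis=0)

rng = np.random.default_rng(0)
dim = 8
seq = unify_tokens(rng.normal(size=(4, dim)),    # 4 text tokens
                   rng.normal(size=(6, dim)),    # 6 image patch tokens
                   rng.normal(size=(12, dim)),   # 12 video patch tokens
                   rng.normal(size=(3, dim)))    # one embedding per modality
print(seq.shape)  # (22, 8)
```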
arXiv Detail & Related papers (2025-09-24T17:59:30Z)
- Image Editing As Programs with Diffusion Models [69.05164729625052]
We introduce Image Editing As Programs (IEAP), a unified image editing framework built upon the Diffusion Transformer (DiT) architecture. IEAP approaches instructional editing through a reductionist lens, decomposing complex editing instructions into sequences of atomic operations. Our framework delivers superior accuracy and semantic fidelity, particularly for complex, multi-step instructions.
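A sketch of the "editing as programs" decomposition with a toy operator set; the atomic operations below act on a plain-text scene description purely for illustration and are not IEAP's actual diffusion-based operators.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AtomicOp:
    """One step of a hypothetical editing 'program' (names are illustrative)."""
    name: str
    apply: Callable[[str], str]  # placeholder: acts on a text description of the scene

def run_program(scene: str, program: List[AtomicOp]) -> str:
    """Execute atomic operations in sequence, as a reductionist decomposition
    of a complex instruction might (a sketch, not IEAP's operator set)."""
    for op in program:
        scene = op.apply(scene)
    return scene

program = [
    AtomicOp("remove_object", lambda s: s.replace("a red car, ", "")),
    AtomicOp("add_object", lambda s: s + ", a blue bicycle"),
]
print(run_program("a street with a red car, trees", program))
```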
arXiv Detail & Related papers (2025-06-04T16:57:24Z)
- PRIMEdit: Probability Redistribution for Instance-aware Multi-object Video Editing with Benchmark Dataset [27.706882926164724]
PRIMEdit is a zero-shot framework that introduces two key modules: Instance-centric Probability Redistribution and Disentangled Multi-instance Sampling. We present our new MIVE dataset for video editing featuring diverse video scenarios, and introduce the Cross-Instance Accuracy (CIA) Score to evaluate editing leakage. Our extensive qualitative, quantitative, and user study evaluations demonstrate that PRIMEdit significantly outperforms recent state-of-the-art methods in terms of editing faithfulness, accuracy, and leakage prevention.
arXiv Detail & Related papers (2024-12-17T13:00:04Z)
- Re-Attentional Controllable Video Diffusion Editing [48.052781838711994]
We propose a Re-Attentional Controllable Video Diffusion Editing (ReAtCo) method. To align the spatial placement of the target objects with the edited text prompt in a training-free manner, we propose Re-Attentional Diffusion (RAD). RAD refocuses the cross-attention activation responses between the edited text prompt and the target video during the denoising stage, resulting in a spatially location-aligned and semantically high-fidelity manipulated video.
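The exact refocusing rule is not given in the summary; the sketch below assumes it amounts to boosting the cross-attention weights between the edited prompt tokens and queries inside the target region, followed by renormalization. The boost factor and masking scheme are hypothetical.

```python
import numpy as np

def refocus_cross_attention(attn, region_mask, edit_token_ids, boost=2.0):
    """Boost the attention of the edited prompt tokens at queries inside the
    target region, then renormalize each query's distribution over text tokens.
    `attn` is (num_queries, num_text_tokens) with rows summing to 1;
    `region_mask` is (num_queries,) with 1 inside the target region."""
    weighted = attn.copy()
    weighted[:, edit_token_ids] *= 1.0 + (boost - 1.0) * region_mask[:, None]
    return weighted / weighted.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
attn = rng.random((64, 10))
attn /= attn.sum(axis=-1, keepdims=True)
region_mask = (rng.random(64) > 0.7).astype(float)
refocused = refocus_cross_attention(attn, region_mask, edit_token_ids=[2, 3])
```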
arXiv Detail & Related papers (2024-12-16T12:32:21Z)
- GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models [2.362412515574206]
We propose "GenVideo" for editing videos leveraging target-image aware T2I models.
Our approach handles edits with target objects of varying shapes and sizes while maintaining the temporal consistency of the edit.
arXiv Detail & Related papers (2024-04-18T23:25:27Z)
- Neutral Editing Framework for Diffusion-based Video Editing [24.370584544151424]
This paper proposes the Neutral Editing (NeuEdit) framework to enable complex non-rigid editing.
NeuEdit introduces a concept of 'neutralization' that enhances the tuning-editing process of diffusion-based editing systems.
Experiments on numerous videos demonstrate the adaptability and effectiveness of the NeuEdit framework.
arXiv Detail & Related papers (2023-12-10T16:28:32Z)
- MotionEditor: Editing Video Motion via Content-Aware Diffusion [96.825431998349]
MotionEditor is a diffusion model for video motion editing.
It incorporates a novel content-aware motion adapter into ControlNet to capture temporal motion correspondence.
arXiv Detail & Related papers (2023-11-30T18:59:33Z)
- Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
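The abstract does not define the search metric; the snippet below sketches only the search loop itself, scoring each candidate inversion step with a placeholder metric and keeping the best one.

```python
def best_inversion_step(candidate_steps, score_fn):
    """Hypothetical OIR-style search: try each candidate inversion step for an
    editing pair and keep the one that maximizes a search metric. `score_fn`
    stands in for the paper's metric, which is not specified here."""
    return max(candidate_steps, key=score_fn)

# Toy usage: a made-up score that prefers a mid-range inversion step.
step = best_inversion_step(range(10, 51, 10), lambda t: -abs(t - 30))
print(step)  # 30
```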
arXiv Detail & Related papers (2023-10-18T17:59:02Z)
- CCEdit: Creative and Controllable Video Editing via Diffusion Models [58.34886244442608]
CCEdit is a versatile generative video editing framework based on diffusion models.
Our approach employs a novel trident network structure that separates structure and appearance control.
Our user studies compare CCEdit with eight state-of-the-art video editing methods.
arXiv Detail & Related papers (2023-09-28T15:03:44Z)