Beyond the Visible: Disocclusion-Aware Editing via Proxy Dynamic Graphs
- URL: http://arxiv.org/abs/2512.13392v2
- Date: Tue, 16 Dec 2025 07:08:17 GMT
- Title: Beyond the Visible: Disocclusion-Aware Editing via Proxy Dynamic Graphs
- Authors: Anran Qi, Changjian Li, Adrien Bousseau, Niloy J. Mitra
- Abstract summary: We address image-to-video generation with explicit user control over the final frame's disoccluded regions. We introduce a lightweight, user-editable Proxy Dynamic Graph (PDG) that drives part motion, while a frozen diffusion prior is used to synthesize plausible appearance that follows that motion. We then let the user edit appearance in the disoccluded areas of the image, and exploit the visibility information encoded by the PDG to perform a latent-space composite that reconciles motion with user intent in these areas.
- Score: 39.496648478488666
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We address image-to-video generation with explicit user control over the final frame's disoccluded regions. Current image-to-video pipelines produce plausible motion but struggle to generate predictable, articulated motions while enforcing user-specified content in newly revealed areas. Our key idea is to separate motion specification from appearance synthesis: we introduce a lightweight, user-editable Proxy Dynamic Graph (PDG) that deterministically yet approximately drives part motion, while a frozen diffusion prior is used to synthesize plausible appearance that follows that motion. In our training-free pipeline, the user loosely annotates and reposes a PDG, from which we compute a dense motion flow to leverage diffusion as a motion-guided shader. We then let the user edit appearance in the disoccluded areas of the image, and exploit the visibility information encoded by the PDG to perform a latent-space composite that reconciles motion with user intent in these areas. This design yields controllable articulation and user control over disocclusions without fine-tuning. We demonstrate clear advantages over state-of-the-art alternatives for turning images into short videos of articulated objects, furniture, vehicles, and deformables. Our method mixes generative control, in the form of loose pose and structure, with predictable controls, in the form of appearance specification in the disoccluded regions of the final frame, unlocking a new image-to-video workflow. Code will be released on acceptance. Project page: https://anranqi.github.io/beyond-visible.github.io/
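The latent-space composite step described in the abstract can be pictured as a masked blend between the diffusion prior's prediction and the user-edited final frame, gated by the visibility (disocclusion) information encoded by the PDG. The following is a minimal, hypothetical sketch of that idea in isolation; the tensor layout and all names (composite_disoccluded_latents, disocclusion_mask, and so on) are placeholders for illustration, not the authors' released code.

```python
import torch

def composite_disoccluded_latents(
    video_latents: torch.Tensor,        # (T, C, h, w): latents from the frozen diffusion prior
    edited_frame_latent: torch.Tensor,  # (C, h, w): latent of the user-edited final frame
    disocclusion_mask: torch.Tensor,    # (1, h, w) in [0, 1]; 1 = region revealed by PDG motion
) -> torch.Tensor:
    """Blend user-specified appearance into disoccluded regions of the final frame."""
    out = video_latents.clone()
    # Keep the prior's prediction where content was already visible, and substitute the
    # user's edited appearance where the PDG marks the region as newly revealed.
    out[-1] = disocclusion_mask * edited_frame_latent + (1.0 - disocclusion_mask) * out[-1]
    return out
```

In a full pipeline such a blend would more plausibly be applied inside the denoising loop, with the mask resampled to latent resolution, rather than once on the final latents; the abstract alone does not pin down those details.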
Related papers
- MotionV2V: Editing Motion in a Video [53.791975554391534]
We propose modifying video motion by editing sparse trajectories extracted from the input. We term the deviation between input and output trajectories a "motion edit". Our approach allows for edits that start at any timestamp and propagate naturally.
arXiv Detail & Related papers (2025-11-25T18:57:25Z)
- Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising [23.044483059783143]
Diffusion-based video generation can create realistic videos, yet existing image- and text-based conditioning fails to offer precise motion control. We introduce Time-to-Move (TTM), a training-free, plug-and-play framework for motion- and appearance-controlled video generation.
arXiv Detail & Related papers (2025-11-09T22:47:50Z)
- Vectorized Video Representation with Easy Editing via Hierarchical Spatio-Temporally Consistent Proxy Embedding [45.593989778240655]
The proposed representation achieves high video reconstruction accuracy with fewer parameters. It supports complex video processing tasks, including video in-painting and temporally consistent video editing.
arXiv Detail & Related papers (2025-10-14T08:05:30Z)
- Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime! [88.12304235156591]
We propose stReaming drag-oriEnted interactiVe vidEo manipuLation (REVEL), a new task that enables users to modify generated videos anytime on anything via fine-grained, interactive drag. Our method can be seamlessly integrated into existing autoregressive video diffusion models.
arXiv Detail & Related papers (2025-10-03T22:38:35Z)
- Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video [38.71994714429696]
We propose a novel and general framework to disentangle video data into its dynamic motion and static content components. Our proposed method is a self-supervised pipeline with fewer assumptions and inductive biases than previous works. We validate our disentangled representation learning framework on real-world talking head videos with motion transfer and auto-regressive motion generation tasks.
arXiv Detail & Related papers (2025-09-10T08:14:45Z)
- ATI: Any Trajectory Instruction for Controllable Video Generation [25.249489701215467]
We propose a unified framework for motion control in video generation that seamlessly integrates camera movement, object-level translation, and fine-grained local motion. Our approach offers a cohesive solution by projecting user-defined trajectories into the latent space of pre-trained image-to-video generation models.
arXiv Detail & Related papers (2025-05-28T23:49:18Z)
- FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors [64.54220123913154]
We introduce FramePainter as an efficient instantiation of the image-to-video generation problem. It uses only a lightweight sparse control encoder to inject editing signals. It markedly outperforms previous state-of-the-art methods with far less training data.
arXiv Detail & Related papers (2025-01-14T16:09:16Z)
- Replace Anyone in Videos [82.37852750357331]
We present the ReplaceAnyone framework, which focuses on localized human replacement and insertion featuring intricate backgrounds. We formulate this task as an image-conditioned video inpainting paradigm with pose guidance, utilizing a unified end-to-end video diffusion architecture. The proposed ReplaceAnyone can be seamlessly applied not only to traditional 3D-UNet base models but also to DiT-based video models such as Wan2.1.
arXiv Detail & Related papers (2024-09-30T03:27:33Z)
- MotionEditor: Editing Video Motion via Content-Aware Diffusion [96.825431998349]
MotionEditor is a diffusion model for video motion editing.
It incorporates a novel content-aware motion adapter into ControlNet to capture temporal motion correspondence.
arXiv Detail & Related papers (2023-11-30T18:59:33Z)
- Treating Motion as Option with Output Selection for Unsupervised Video Object Segmentation [16.37741705985433]
We propose a novel motion-as-option network that treats motion cues as an optional component rather than a necessity. During training, we randomly input RGB images into the motion encoder instead of optical flow maps, which implicitly reduces the network's reliance on motion cues. This design ensures that the motion encoder is capable of processing both RGB images and optical flow maps, leading to two distinct predictions depending on the type of input provided.
arXiv Detail & Related papers (2023-09-26T09:34:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.