Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing
- URL: http://arxiv.org/abs/2508.13797v1
- Date: Tue, 19 Aug 2025 12:57:31 GMT
- Title: Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing
- Authors: Feng-Lin Liu, Shi-Yang Li, Yan-Pei Cao, Hongbo Fu, Lin Gao
- Abstract summary: Editing the structural content of 3D scenes in videos remains challenging. Key challenges include generating novel view content that remains consistent with the original video. We propose Sketch3DVE, a sketch-based 3D-aware video editing method.
- Score: 41.74354582607005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent video editing methods achieve attractive results in style transfer or appearance modification. However, editing the structural content of 3D scenes in videos remains challenging, particularly when dealing with significant viewpoint changes, such as large camera rotations or zooms. Key challenges include generating novel view content that remains consistent with the original video, preserving unedited regions, and translating sparse 2D inputs into realistic 3D video outputs. To address these issues, we propose Sketch3DVE, a sketch-based 3D-aware video editing method to enable detailed local manipulation of videos with significant viewpoint changes. To solve the challenge posed by sparse inputs, we employ image editing methods to generate edited results for the first frame, which are then propagated to the remaining frames of the video. We utilize sketching as an interaction tool for precise geometry control, while other mask-based image editing methods are also supported. To handle viewpoint changes, we perform a detailed analysis and manipulation of the 3D information in the video. Specifically, we utilize a dense stereo method to estimate a point cloud and the camera parameters of the input video. We then propose a point cloud editing approach that uses depth maps to represent the 3D geometry of newly edited components, aligning them effectively with the original 3D scene. To seamlessly merge the newly edited content with the original video while preserving the features of unedited regions, we introduce a 3D-aware mask propagation strategy and employ a video diffusion model to produce realistic edited videos. Extensive experiments demonstrate the superiority of Sketch3DVE in video editing. Homepage and code: http://geometrylearning.com/Sketch3DVE/
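The abstract describes a four-stage pipeline: first-frame sketch editing, dense-stereo scene estimation, depth-based point cloud editing, and 3D-aware mask propagation followed by diffusion-based rendering. The outline below is a minimal, hypothetical Python sketch of that flow, not the authors' implementation: every function name, signature, and stub body is an illustrative assumption, and only the pinhole projection inside `propagate_mask_3d` is concrete geometry, included to make the mask-propagation step tangible.

```python
# Hypothetical outline of the Sketch3DVE pipeline as described in the abstract.
# All names and stub bodies are assumptions for illustration only.
import numpy as np

def edit_first_frame(frame, sketch, mask):
    """Stage 1 (stub): sketch- or mask-conditioned image edit of frame 0.
    The paper delegates this to off-the-shelf image editing models."""
    return frame

def estimate_scene(frames):
    """Stage 2 (stub): dense stereo yields a scene point cloud plus shared
    intrinsics K and per-frame world-to-camera extrinsics [R|t]."""
    points = np.random.rand(500, 3) + np.array([0.0, 0.0, 4.0])  # dummy cloud
    K = np.array([[500.0, 0.0, 128.0],
                  [0.0, 500.0, 128.0],
                  [0.0, 0.0, 1.0]])
    extrinsics = [np.hstack([np.eye(3), np.zeros((3, 1))]) for _ in frames]
    return points, K, extrinsics

def edit_point_cloud(points, edited_frame, mask):
    """Stage 3 (stub): lift the edited region to 3D with a predicted depth
    map and align it with the original cloud; identity placeholder here."""
    return points

def propagate_mask_3d(edited_points, K, extrinsics, hw=(256, 256)):
    """Stage 4a: project edited 3D points into each frame with a pinhole
    model to obtain temporally consistent per-frame edit masks."""
    masks = []
    homog = np.hstack([edited_points, np.ones((len(edited_points), 1))])
    for Rt in extrinsics:
        uvw = K @ (Rt @ homog.T)                  # 3 x N image coordinates
        u, v = (uvw[:2] / uvw[2:]).astype(int)    # perspective divide
        m = np.zeros(hw, dtype=bool)
        keep = (0 <= u) & (u < hw[1]) & (0 <= v) & (v < hw[0])
        m[v[keep], u[keep]] = True
        masks.append(m)
    return masks

def render_with_diffusion(frames, masks, edited_first):
    """Stage 4b (stub): a video diffusion model synthesizes the edit inside
    each mask while preserving the original content outside it."""
    return frames

def sketch3dve(frames, sketch, mask):
    edited0 = edit_first_frame(frames[0], sketch, mask)
    points, K, extrinsics = estimate_scene(frames)
    points = edit_point_cloud(points, edited0, mask)
    masks = propagate_mask_3d(points, K, extrinsics)
    return render_with_diffusion(frames, masks, edited0)

frames = [np.zeros((256, 256, 3)) for _ in range(8)]
edited = sketch3dve(frames, sketch=None, mask=np.zeros((256, 256), dtype=bool))
```

The projection step is the part that makes the method 3D-aware: because the edit mask is anchored to points in the shared scene coordinate frame, reprojecting it through each frame's camera keeps the mask consistent under large rotations and zooms.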
Related papers
- 3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing [58.54083747494426]
3DGS-Drag is a point-based 3D editing framework that provides efficient, intuitive drag manipulation of real 3D scenes.
Our approach bridges the gap between deformation-based and 2D-editing-based 3D editing methods.
arXiv Detail & Related papers (2026-01-12T19:57:31Z)
- Generative Video Motion Editing with 3D Point Tracks [66.55707897151909]
We present a track-conditioned V2V framework that enables joint editing of camera and object motion.
We achieve this by conditioning a model on a source video and paired 3D point tracks representing source and target motions.
Our model supports diverse motion edits, including joint camera/object manipulation, motion transfer, and non-rigid deformation.
arXiv Detail & Related papers (2025-12-01T18:59:55Z)
- Fast Multi-view Consistent 3D Editing with Video Priors [19.790628738739354]
We propose generative Video Prior based 3D Editing (ViP3DE).
Our key insight is to condition the video generation model on a single edited view to generate other consistent edited views for 3D updating directly.
Our proposed ViP3DE can achieve high-quality 3D editing results even within a single forward pass, significantly outperforming existing methods in both editing quality and speed.
arXiv Detail & Related papers (2025-11-28T13:31:10Z)
- 3D-LATTE: Latent Space 3D Editing from Textual Instructions [64.77718887666312]
We propose a training-free editing method that operates within the latent space of a native 3D diffusion model.
We guide the edit synthesis by blending 3D attention maps from the generation with the source object.
arXiv Detail & Related papers (2025-08-29T22:51:59Z)
- Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy [36.08715662927022]
We present Shape-for-Motion, a novel framework that incorporates a 3D proxy for precise and consistent video editing.
Our framework supports various precise and physically consistent manipulations across the video frames, including pose editing, rotation, scaling, translation, texture modification, and object composition.
arXiv Detail & Related papers (2025-06-27T17:59:01Z)
- SketchVideo: Sketch-based Video Generation and Editing [51.99066098393491]
We aim to achieve sketch-based spatial and motion control for video generation and support fine-grained editing of real or synthetic videos.
Based on the DiT video generation model, we propose a memory-efficient control structure with sketch control blocks that predict residual features of skipped DiT blocks (a minimal sketch of this residual-injection idea appears after this list).
For sketch-based video editing, we design an additional video insertion module that maintains consistency between the newly edited content and the original video's spatial feature and dynamic motion.
arXiv Detail & Related papers (2025-03-30T02:44:09Z)
- VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors [27.685348720003823]
We propose VideoHandles as a method for editing 3D object compositions in videos of static scenes with camera motion.
Our approach allows editing the 3D position of a 3D object across all frames of a video in a temporally consistent manner.
arXiv Detail & Related papers (2025-03-03T02:29:48Z)
- DragScene: Interactive 3D Scene Editing with Single-view Drag Instructions [9.31257776760014]
3D editing has shown remarkable capability in editing scenes based on various instructions.
Existing methods struggle with achieving intuitive, localized editing.
We introduce DragScene, a framework that integrates drag-style editing with diverse 3D representations.
arXiv Detail & Related papers (2024-12-18T07:02:01Z)
- PrEditor3D: Fast and Precise 3D Shape Editing [100.09112677669376]
We propose a training-free approach to 3D editing that enables the editing of a single shape within a few minutes.
The edited 3D mesh aligns well with the prompts and remains identical in regions that are not intended to be altered.
arXiv Detail & Related papers (2024-12-09T15:44:47Z)
- Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts [76.73043724587679]
We propose a dialogue-based 3D scene editing approach, termed CE3D.
A Hash-Atlas represents the 3D scene views, transferring the editing of 3D scenes onto 2D atlas images.
Results demonstrate that CE3D effectively integrates multiple visual models to achieve diverse editing visual effects.
arXiv Detail & Related papers (2024-07-09T13:24:42Z)
- 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting [100.94916668527544]
Existing methods focus solely on either individual 2D object editing or global 3D scene editing.
We propose 3DitScene, a novel and unified scene editing framework.
It enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects.
arXiv Detail & Related papers (2024-05-28T17:59:01Z)
- DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing [48.086102360155856]
We introduce dynamic Neural Radiance Fields (NeRF) as an innovative video representation.
We propose the image-based video-NeRF editing pipeline with a set of innovative designs to provide consistent and controllable editing.
Our method, dubbed DynVideo-E, significantly outperforms SOTA approaches on two challenging datasets by a large margin of 50%~95% in terms of human preference.
arXiv Detail & Related papers (2023-10-16T17:48:10Z)
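As flagged in the SketchVideo entry above, its summary describes sketch control blocks that predict residual features for skipped DiT blocks. Below is a minimal, hypothetical PyTorch sketch of that residual-injection idea; the class name, the cross-attention design, and the zero-initialized output projection are assumptions in the spirit of ControlNet-style conditioning, not the paper's actual code.

```python
# Hypothetical sketch of a "sketch control block": a lightweight module that
# cross-attends video tokens to sketch tokens and emits a residual, which is
# added to the features of a frozen, skipped backbone block.
import torch
import torch.nn as nn

class SketchControlBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)
        nn.init.zeros_(self.proj.weight)  # zero-init: the control branch
        nn.init.zeros_(self.proj.bias)    # starts as a no-op at training time

    def forward(self, hidden: torch.Tensor, sketch_tokens: torch.Tensor) -> torch.Tensor:
        # Query with video tokens, attend over sketch tokens, return a residual.
        h = self.norm(hidden)
        attn_out, _ = self.attn(h, sketch_tokens, sketch_tokens, need_weights=False)
        return self.proj(attn_out)

# Usage: inject the predicted residual into the skipped block's features.
dim = 256
block = SketchControlBlock(dim)
video_tokens = torch.randn(2, 77, dim)    # (batch, video tokens, channels)
sketch_tokens = torch.randn(2, 32, dim)   # (batch, sketch tokens, channels)
video_tokens = video_tokens + block(video_tokens, sketch_tokens)
```

Zero-initializing the output projection means the control branch contributes nothing before training, so the frozen backbone's generations are preserved until the sketch conditioning has been learned; attaching such blocks only to a subset of (skipped) backbone blocks is what would make the structure memory-efficient.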