DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes
- URL: http://arxiv.org/abs/2412.19458v2
- Date: Mon, 30 Dec 2024 02:52:30 GMT
- Title: DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes
- Authors: Yiyuan Liang, Zhiying Yan, Liqun Chen, Jiahuan Zhou, Luxin Yan, Sheng Zhong, Xu Zou
- Abstract summary: DriveEditor is a diffusion-based framework for object editing in driving videos. It offers a unified framework for comprehensive object editing operations, including repositioning, replacement, deletion, and insertion.
- Score: 23.215760822443194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-centric autonomous driving systems require diverse data for robust training and evaluation, which can be augmented by manipulating object positions and appearances within existing scene captures. While recent advancements in diffusion models have shown promise in video editing, their application to object manipulation in driving scenarios remains challenging due to imprecise positional control and difficulties in preserving high-fidelity object appearances. To address these challenges in position and appearance control, we introduce DriveEditor, a diffusion-based framework for object editing in driving videos. DriveEditor offers a unified framework for comprehensive object editing operations, including repositioning, replacement, deletion, and insertion. These diverse manipulations are all achieved through a shared set of varying inputs, processed by identical position control and appearance maintenance modules. The position control module projects the given 3D bounding box while preserving depth information and hierarchically injects it into the diffusion process, enabling precise control over object position and orientation. The appearance maintenance module preserves consistent attributes with a single reference image by employing a three-tiered approach: low-level detail preservation, high-level semantic maintenance, and the integration of 3D priors from a novel view synthesis model. Extensive qualitative and quantitative evaluations on the nuScenes dataset demonstrate DriveEditor's exceptional fidelity and controllability in generating diverse driving scene edits, as well as its remarkable ability to facilitate downstream tasks. Project page: https://yvanliang.github.io/DriveEditor.
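The abstract describes the position control module only functionally. As a rough illustration of what "projecting the given 3D bounding box while preserving depth information" can look like, here is a minimal sketch; the function names, the camera convention (x right, y down, z forward), and the yaw-only box parameterization are illustrative assumptions, not DriveEditor's actual code.

```python
# Minimal sketch: project a 3D bounding box into the image plane while keeping
# per-corner depth, in the spirit of DriveEditor's position control module.
# All names and conventions here are assumptions for illustration.
import numpy as np

def box_corners_3d(center, size, yaw):
    """Return the 8 corners of a 3D box in camera coordinates."""
    l, w, h = size
    # Corner offsets before rotation: all sign combinations of half-extents.
    x = np.array([ l,  l,  l,  l, -l, -l, -l, -l]) / 2
    y = np.array([ h,  h, -h, -h,  h,  h, -h, -h]) / 2
    z = np.array([ w, -w,  w, -w,  w, -w,  w, -w]) / 2
    corners = np.stack([x, y, z], axis=0)            # (3, 8)
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[ c, 0, s],
                    [ 0, 1, 0],
                    [-s, 0, c]])                     # yaw: rotation about the vertical axis
    return rot @ corners + np.asarray(center).reshape(3, 1)

def project_with_depth(corners, K):
    """Project (3, 8) camera-space corners with intrinsics K.
    Returns (u, v, depth) per corner, so depth can condition the diffusion model."""
    depth = corners[2]                               # z in the camera frame
    uv = (K @ corners) / depth                       # perspective division
    return np.stack([uv[0], uv[1], depth], axis=1)   # (8, 3)

K = np.array([[1266.4, 0.0, 816.3],                  # example nuScenes-like intrinsics
              [0.0, 1266.4, 491.5],
              [0.0, 0.0, 1.0]])
corners = box_corners_3d(center=[2.0, 1.0, 15.0], size=[4.5, 1.9, 1.6], yaw=0.3)
print(project_with_depth(corners, K))                # pixel coords + depth per corner
```

Retaining the depth channel alongside the 2D projection is what separates this conditioning from a flat 2D box and, per the abstract, is what enables precise control over both object position and orientation.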
Related papers
- HorizonForge: Driving Scene Editing with Any Trajectories and Any Vehicles [63.88996084630768]
Controllable driving scene generation is critical for realistic and scalable autonomous driving simulation. We introduce HorizonForge, a unified framework that reconstructs scenes as editable Gaussian Splats and Meshes. Experiments show that the Gaussian-Mesh representation delivers substantially higher fidelity than alternative 3D representations.
arXiv Detail & Related papers (2026-02-24T20:03:47Z) - LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents [61.91651123290512]
LangDriveCTRL is a framework for editing real-world driving videos to synthesize diverse traffic scenarios. It supports both object node editing (removal, insertion and replacement) and multi-object behavior editing from a single natural-language instruction.
arXiv Detail & Related papers (2025-12-19T10:57:03Z) - Free-Form Scene Editor: Enabling Multi-Round Object Manipulation like in a 3D Engine [83.0145525456509]
We present FFSE, a 3D-aware framework designed to enable intuitive, physically-consistent object editing on real-world images. Unlike previous approaches that either operate in image space or require slow and error-prone 3D reconstruction, FFSE models editing as a sequence of learned 3D transformations. To support learning of multi-round 3D-aware object manipulation, we introduce 3DObjectEditor.
arXiv Detail & Related papers (2025-11-17T18:57:39Z) - O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing [88.93410369258203]
O-DisCo-Edit is a unified framework that incorporates a novel object distortion control (O-DisCo). This signal, based on random and adaptive noise, flexibly encapsulates a wide range of editing cues within a single representation. O-DisCo-Edit enables efficient, high-fidelity editing through an effective training paradigm.
arXiv Detail & Related papers (2025-09-01T16:29:39Z) - Realistic and Controllable 3D Gaussian-Guided Object Editing for Driving Video Generation [12.982001613987315]
G2Editor is a framework designed for precise object editing in driving videos. A scene-level 3D bounding box layout is employed to reconstruct occluded areas of non-target objects. Experiments demonstrate that G2Editor effectively supports object repositioning, insertion, and deletion within a unified framework.
arXiv Detail & Related papers (2025-08-28T06:39:53Z) - Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence [4.778564042492516]
We present a framework for controllable pedestrian video editing in multi-view driving scenarios by integrating video inpainting and human motion control techniques. Our approach begins by identifying pedestrian regions of interest across multiple camera views, expanding detection bounding boxes by a fixed ratio, and resizing and stitching these regions into a unified canvas (a minimal sketch of this crop-and-stitch step appears after this list). Experiments demonstrate that our framework achieves high-quality pedestrian editing with strong visual realism, coherence, and cross-view consistency.
arXiv Detail & Related papers (2025-08-01T03:56:57Z) - SceneCrafter: Controllable Multi-View Driving Scene Editing [44.91248700043744]
We propose SceneCrafter, a versatile editor for realistic 3D-consistent manipulation of driving scenes captured from multiple cameras. SceneCrafter achieves state-of-the-art realism, controllability, 3D consistency, and scene editing quality compared to existing baselines.
arXiv Detail & Related papers (2025-06-24T10:23:47Z) - Pro3D-Editor : A Progressive-Views Perspective for Consistent and Precise 3D Editing [25.237699330731395]
Text-guided 3D editing aims to precisely edit semantically relevant local 3D regions. Existing methods typically edit 2D views indiscriminately and project them back into 3D space. We argue that ideal consistent 3D editing can be achieved through a progressive-views paradigm.
arXiv Detail & Related papers (2025-05-31T11:11:55Z) - GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control [50.67481583744243]
We introduce GeoDrive, which explicitly integrates robust 3D geometry conditions into driving world models. We propose a dynamic editing module during training to enhance the renderings by editing the positions of the vehicles. Our method significantly outperforms existing models in both action accuracy and 3D spatial awareness.
arXiv Detail & Related papers (2025-05-28T14:46:51Z) - DragScene: Interactive 3D Scene Editing with Single-view Drag Instructions [9.31257776760014]
3D editing has shown remarkable capability in editing scenes based on various instructions.
Existing methods struggle with achieving intuitive, localized editing.
We introduce DragScene, a framework that integrates drag-style editing with diverse 3D representations.
arXiv Detail & Related papers (2024-12-18T07:02:01Z) - Manipulating Vehicle 3D Shapes through Latent Space Editing [0.0]
This paper introduces a framework that employs a pre-trained regressor, enabling continuous, precise, attribute-specific modifications to vehicle 3D models.
Our method not only preserves the inherent identity of vehicle 3D objects, but also supports multi-attribute editing, allowing for extensive customization without compromising the model's structural integrity.
arXiv Detail & Related papers (2024-10-31T13:41:16Z) - EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing [114.14164860467227]
We propose EditRoom, a framework capable of executing a variety of layout edits through natural language commands.
Specifically, EditRoom leverages Large Language Models (LLMs) for command planning and generates target scenes.
We have developed an automatic pipeline to augment existing 3D scene datasets and introduced EditRoom-DB, a large-scale dataset with 83k editing pairs.
arXiv Detail & Related papers (2024-10-03T17:42:24Z) - TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation [35.951718189386845]
We propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS).
TAS facilitates a tightly coupled iterative process between 2D view editing and 3D updating, preventing the error accumulation introduced by the text-to-image process.
We present a tuning-free View-Consistent Attention Control (VCAC) module that leverages cross-view semantic and geometric references from the source branch to yield aligned views from the target branch during 2D view editing.
arXiv Detail & Related papers (2024-07-02T08:06:58Z) - HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness [57.18183962641015]
We present HOI-Swap, a video editing framework trained in a self-supervised manner.
The first stage focuses on object swapping in a single frame with HOI awareness.
The second stage extends the single-frame edit across the entire sequence.
arXiv Detail & Related papers (2024-06-11T22:31:29Z) - Customizing Text-to-Image Diffusion with Object Viewpoint Control [53.621518249820745]
We introduce a new task -- enabling explicit control of the object viewpoint in the customization of text-to-image diffusion models.
This allows us to modify the custom object's properties and generate it in various background scenes via text prompts.
We propose to condition the diffusion process on the 3D object features rendered from the target viewpoint.
arXiv Detail & Related papers (2024-04-18T16:59:51Z) - View-Consistent 3D Editing with Gaussian Splatting [50.6460814430094]
View-consistent Editing (VcEdit) is a novel framework that seamlessly incorporates 3DGS into image editing processes. By incorporating consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency.
arXiv Detail & Related papers (2024-03-18T15:22:09Z) - VASE: Object-Centric Appearance and Shape Manipulation of Real Videos [108.60416277357712]
In this work, we introduce an object-centric framework designed both to control the object's appearance and, notably, to execute precise and explicit structural modifications on the object.
We build our framework on a pre-trained image-conditioned diffusion model, integrate layers to handle the temporal dimension, and propose training strategies and architectural modifications to enable shape control.
We evaluate our method on the image-driven video editing task showing similar performance to the state-of-the-art, and showcasing novel shape-editing capabilities.
arXiv Detail & Related papers (2024-01-04T18:59:24Z) - Plasticine3D: 3D Non-Rigid Editing with Text Guidance by Multi-View Embedding Optimization [21.8454418337306]
We propose Plasticine3D, a novel text-guided controlled 3D editing pipeline that can perform 3D non-rigid editing.
Our work divides the editing process into a geometry editing stage and a texture editing stage to achieve separate control of structure and appearance.
For the purpose of fine-grained control, we propose Embedding-Fusion (EF) to blend the original characteristics with the editing objectives in the embedding space.
arXiv Detail & Related papers (2023-12-15T09:01:54Z) - MotionEditor: Editing Video Motion via Content-Aware Diffusion [96.825431998349]
MotionEditor is a diffusion model for video motion editing.
It incorporates a novel content-aware motion adapter into ControlNet to capture temporal motion correspondence.
arXiv Detail & Related papers (2023-11-30T18:59:33Z)
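As referenced in the pedestrian-editing entry above, here is a minimal sketch of its crop-expand-stitch preprocessing: detection boxes are enlarged by a fixed ratio, cropped, resized to a common size, and tiled into a single canvas. The function names, expansion ratio, and patch size are illustrative assumptions, not that paper's code.

```python
# Minimal sketch of crop-expand-stitch preprocessing for multi-view pedestrian
# editing. All names and parameters here are illustrative assumptions.
import numpy as np

def expand_box(box, ratio, img_w, img_h):
    """Expand an (x1, y1, x2, y2) box about its center by `ratio`, clamped to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) * ratio / 2, (y2 - y1) * ratio / 2
    return (max(0, int(cx - hw)), max(0, int(cy - hh)),
            min(img_w, int(cx + hw)), min(img_h, int(cy + hh)))

def stitch_regions(frames, boxes, ratio=1.5, patch=256):
    """Crop one expanded region per camera view and stitch them side by side."""
    patches = []
    for img, box in zip(frames, boxes):
        h, w = img.shape[:2]
        x1, y1, x2, y2 = expand_box(box, ratio, w, h)
        crop = img[y1:y2, x1:x2]
        # Nearest-neighbor resize via index sampling (avoids extra dependencies).
        ys = np.linspace(0, crop.shape[0] - 1, patch).astype(int)
        xs = np.linspace(0, crop.shape[1] - 1, patch).astype(int)
        patches.append(crop[ys][:, xs])
    return np.concatenate(patches, axis=1)  # unified canvas: (patch, patch * n_views, 3)

# Example: three 900x1600 camera views, one pedestrian box per view.
frames = [np.zeros((900, 1600, 3), dtype=np.uint8) for _ in range(3)]
boxes = [(800, 400, 860, 560), (300, 420, 350, 580), (1200, 410, 1260, 570)]
canvas = stitch_regions(frames, boxes)
print(canvas.shape)  # (256, 768, 3)
```

Working on one stitched canvas lets a single inpainting or motion-control pass see every view of the same pedestrian at once, which is presumably what yields the cross-view consistency that entry reports.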
This list is automatically generated from the titles and abstracts of the papers on this site.