Related papers: MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing

URL: http://arxiv.org/abs/2408.08000v2
Date: Sun, 3 Nov 2024 02:01:56 GMT
Title: MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing
Authors: Chenjie Cao, Chaohui Yu, Fan Wang, Xiangyang Xue, Yanwei Fu
Abstract summary: Novel View Synthesis (NVS) and 3D generation have recently achieved prominent improvements. We propose MVInpainter, re-formulating the 3D editing as a multi-view 2D inpainting task. MVInpainter partially inpaints multi-view images with the reference guidance rather than intractably generating an entirely novel view from scratch.
Score: 90.30646271720919
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Novel View Synthesis (NVS) and 3D generation have recently achieved prominent improvements. However, these works mainly focus on confined categories or synthetic 3D assets, which are discouraged from generalizing to challenging in-the-wild scenes and fail to be employed with 2D synthesis directly. Moreover, these methods heavily depended on camera poses, limiting their real-world applications. To overcome these issues, we propose MVInpainter, re-formulating the 3D editing as a multi-view 2D inpainting task. Specifically, MVInpainter partially inpaints multi-view images with the reference guidance rather than intractably generating an entirely novel view from scratch, which largely simplifies the difficulty of in-the-wild NVS and leverages unmasked clues instead of explicit pose conditions. To ensure cross-view consistency, MVInpainter is enhanced by video priors from motion components and appearance guidance from concatenated reference key&value attention. Furthermore, MVInpainter incorporates slot attention to aggregate high-level optical flow features from unmasked regions to control the camera movement with pose-free training and inference. Sufficient scene-level experiments on both object-centric and forward-facing datasets verify the effectiveness of MVInpainter, including diverse tasks, such as multi-view object removal, synthesis, insertion, and replacement. The project page is https://ewrfcas.github.io/MVInpainter/.
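
The "concatenated reference key&value attention" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes identity query/key/value projections and tiny hand-written token vectors, and the function names (`attend`, `reference_kv_attention`) are hypothetical. It only shows the general idea that a target view's queries attend over its own tokens plus the tokens of a reference view.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def reference_kv_attention(target_tokens, reference_tokens):
    """Attention for one target view with reference tokens appended to K and V.

    Assumption: projections are the identity, so tokens serve directly as
    queries, keys, and values. A real model would use learned projections.
    """
    keys = target_tokens + reference_tokens      # concatenate along the token axis
    values = target_tokens + reference_tokens
    return attend(target_tokens, keys, values)


target = [[1.0, 0.0]]      # one token from the view being inpainted
reference = [[0.0, 1.0]]   # one token from the reference view
mixed = reference_kv_attention(target, reference)
```

With the reference tokens appended, the output for the target token is a weighted mix of its own value and the reference value, which is how appearance information from the reference view can reach the other views.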

Related papers

VIRGi: View-dependent Instant Recoloring of 3D Gaussians Splats [53.602701067430075]
We introduce VIRGi, a novel approach for rapidly editing the color of scenes modeled by 3DGS. By fine-tuning the weights of a single user, the color edits are seamlessly propagated to the entire scene in just two seconds. An exhaustive validation on diverse datasets demonstrates significant quantitative and qualitative advancements over competitors.
arXiv Detail & Related papers (2026-03-03T13:41:17Z)
ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models [76.80262068405243]
3D inpainting often relies on multi-view 2D image inpainting, where inherent inconsistencies result in blurred textures, spatial discontinuities, and distracting visual artifacts. We propose ObjFiller-3D, a novel method designed for the completion and editing of high-quality and consistent 3D objects. We analyze the representation gap between 3D and videos, and propose an adaptation of a video inpainting model for 3D scene inpainting.
arXiv Detail & Related papers (2025-08-25T17:59:40Z)
Voyaging into Perpetual Dynamic Scenes from a Single View [31.85867311855001]
The key challenge is to ensure that the different generated views are consistent with the underlying 3D motions. We propose DynamicVoyager, which reformulates dynamic scene generation as a scene outpainting problem with new dynamic content. Experiments show that our model can generate perpetual scenes with consistent motions along fly-through cameras.
arXiv Detail & Related papers (2025-07-05T22:49:25Z)
InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model [46.67494008720215]
InstaInpaint is a framework that produces 3D-scene inpainting from a 2D inpainting proposal within 0.4 seconds. We analyze and identify several key designs that improve generalization, textural consistency, and geometric correctness. InstaInpaint achieves a 1000x speed-up over prior methods while maintaining state-of-the-art performance across two standard benchmarks.
arXiv Detail & Related papers (2025-06-12T17:59:55Z)
Visibility-Uncertainty-guided 3D Gaussian Inpainting via Scene Conceptional Learning [63.94919846010485]
3D Gaussian inpainting (3DGI) is challenging in effectively leveraging complementary visual and semantic cues from multiple input views. We propose a method that measures the visibility uncertainties of 3D points across different input views and uses them to guide 3DGI. We build a novel 3DGI framework, VISTA, by integrating VISibility-uncerTainty-guided 3DGI with scene conceptuAl learning.
arXiv Detail & Related papers (2025-04-23T06:21:11Z)
MTV-Inpaint: Multi-Task Long Video Inpainting [30.963300199975656]
Video inpainting involves modifying local regions within a video, ensuring spatial and temporal consistency. Recent advancements in text-to-video (T2V) diffusion models pave the way for text-guided video inpainting. We propose MTV-Inpaint, a unified multi-task video inpainting framework capable of handling both traditional scene completion and novel object insertion tasks.
arXiv Detail & Related papers (2025-03-14T13:54:10Z)
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D [63.9188712646076]
Texturing is a step in 3D asset production that enhances visual appeal. Despite recent advancements, methods often yield subpar results, primarily due to local discontinuities. We propose a novel framework called MVPaint, which can generate high-resolution, seamless textures with multi-view consistency.
arXiv Detail & Related papers (2024-11-04T17:59:39Z)
Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion. We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE3 motion bases. Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
NeRFiller: Completing Scenes via Generative 3D Inpainting [113.18181179986172]
We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting. In contrast to related works, we focus on completing scenes rather than deleting foreground objects.
arXiv Detail & Related papers (2023-12-07T18:59:41Z)
OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields [53.32527220134249]
The emergence of Neural Radiance Fields (NeRF) for novel view synthesis has increased interest in 3D scene editing. Current methods face challenges such as time-consuming object labeling, limited capability to remove specific targets, and compromised rendering quality after removal. This paper proposes a novel object-removing pipeline, named OR-NeRF, that can remove objects from 3D scenes with user-given points or text prompts on a single view.
arXiv Detail & Related papers (2023-05-17T18:18:05Z)
Clutter Detection and Removal in 3D Scenes with View-Consistent Inpainting [10.087325516269265]
We present an automatic system that removes clutter from 3D scenes and inpaints with coherent geometry and texture. We group noisy fine-grained labels, leverage virtual rendering, and impose an instance-level area-sensitive loss. Experiments on the ScanNet and Matterport datasets show that our method outperforms baselines for clutter segmentation and 3D inpainting.
arXiv Detail & Related papers (2023-04-07T17:57:20Z)
Decoupling Dynamic Monocular Videos for Dynamic View Synthesis [50.93409250217699]
We tackle the challenge of dynamic view synthesis from dynamic monocular videos in an unsupervised fashion. Specifically, we decouple the motion of the dynamic objects into object motion and camera motion, respectively regularized by proposed unsupervised surface consistency and patch-based multi-view constraints.
arXiv Detail & Related papers (2023-04-04T11:25:44Z)
SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields [26.296017756560467]
In 3D, solutions must be consistent across multiple views and geometrically valid. We propose a novel 3D inpainting method that addresses these challenges. We first demonstrate the superiority of our approach on multiview segmentation, comparing to NeRF-based methods and 2D segmentation approaches.
arXiv Detail & Related papers (2022-11-22T13:14:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.