ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models
- URL: http://arxiv.org/abs/2508.18271v1
- Date: Mon, 25 Aug 2025 17:59:40 GMT
- Title: ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models
- Authors: Haitang Feng, Jie Liu, Jie Tang, Gangshan Wu, Beiqi Chen, Jianhuang Lai, Guangcong Wang
- Abstract summary: 3D inpainting often relies on multi-view 2D image inpainting, where inherent inconsistencies result in blurred textures, spatial discontinuities, and distracting visual artifacts. We propose ObjFiller-3D, a novel method designed for the completion and editing of high-quality and consistent 3D objects. We analyze the representation gap between 3D and videos, and propose an adaptation of a video inpainting model for 3D scene inpainting.
- Score: 76.80262068405243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D inpainting often relies on multi-view 2D image inpainting, where the inherent inconsistencies across different inpainted views can result in blurred textures, spatial discontinuities, and distracting visual artifacts. These inconsistencies pose significant challenges when striving for accurate and realistic 3D object completion, particularly in applications that demand high fidelity and structural coherence. To overcome these limitations, we propose ObjFiller-3D, a novel method designed for the completion and editing of high-quality and consistent 3D objects. Instead of employing a conventional 2D image inpainting model, our approach leverages a curated selection of state-of-the-art video editing models to fill in the masked regions of 3D objects. We analyze the representation gap between 3D and videos, and propose an adaptation of a video inpainting model for 3D scene inpainting. In addition, we introduce a reference-based 3D inpainting method to further enhance the quality of reconstruction. Experiments across diverse datasets show that, compared to previous methods, ObjFiller-3D produces more faithful and fine-grained reconstructions (PSNR of 26.6 vs. NeRFiller's 15.9, and LPIPS of 0.19 vs. Instant3dit's 0.25). Moreover, it demonstrates strong potential for practical deployment in real-world 3D editing applications. Project page: https://objfiller3d.github.io/ Code: https://github.com/objfiller3d/ObjFiller-3D
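To make the abstract's central idea concrete, the following Python sketch treats an ordered sequence of rendered object views as a video so that a video inpainting model can fill masked regions consistently across views. This is a minimal toy illustration, not the authors' implementation: `order_views_by_azimuth`, the `video_inpaint` stand-in, and the naive previous-frame fill rule are all assumptions.

```python
# Minimal sketch (not the authors' code) of multi-view-as-video inpainting.
import numpy as np

def order_views_by_azimuth(views: np.ndarray, azimuths: np.ndarray) -> np.ndarray:
    """Sort rendered views along the camera trajectory so neighbouring
    'frames' overlap, mimicking the temporal continuity of video."""
    return views[np.argsort(azimuths)]

def video_inpaint(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Stand-in for the adapted video inpainting model: keep unmasked
    pixels and fill masked ones from the previous frame."""
    out = frames.copy()
    for t in range(1, len(out)):
        m = masks[t]
        out[t][m] = out[t - 1][m]  # toy temporal propagation
    return out

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 1.0) -> float:
    """PSNR, the metric quoted above (26.6 vs. NeRFiller's 15.9)."""
    mse = np.mean((a - b) ** 2)
    return float(10 * np.log10(peak ** 2 / mse))

# Toy usage: 8 random 64x64 RGB 'renders' with per-pixel masks.
views = np.random.rand(8, 64, 64, 3)
azimuths = np.random.permutation(8) * 45.0
masks = np.random.rand(8, 64, 64) > 0.9
frames = order_views_by_azimuth(views, azimuths)
filled = video_inpaint(frames, masks)
print("PSNR of filled vs. original frames:", psnr(filled, frames))
```

In the actual method, the inpainted views would then supervise a 3D reconstruction, which is where the reference-based variant mentioned in the abstract would come in.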
Related papers
- DiGA3D: Coarse-to-Fine Diffusional Propagation of Geometry and Appearance for Versatile 3D Inpainting [10.515239541326737]
Single reference inpainting methods lack robustness when dealing with views far from the reference view.
Appearance inconsistency arises when independently inpainting multi-view images with 2D diffusion priors.
DiGA3D uses diffusion models to propagate consistent appearance and geometry in a coarse-to-fine manner.
arXiv Detail & Related papers (2025-07-01T04:57:08Z)
- Constructing a 3D Town from a Single Image [23.231661811526955]
3DTown is a training-free framework designed to synthesize realistic and coherent 3D scenes from a single top-down view.
We decompose the input image into overlapping regions and generate each using a pretrained 3D object generator.
Our results demonstrate that high-quality 3D town generation is achievable from a single image using a principled, training-free approach.
arXiv Detail & Related papers (2025-05-21T17:10:47Z)
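The 3DTown entry above hinges on decomposing a single top-down image into overlapping regions before generating each with a pretrained 3D object generator. Here is a hedged sketch of such a decomposition; the tile size, overlap, and per-region generator stub are illustrative assumptions, not the paper's parameters.

```python
# Hedged sketch of an overlapping-tile decomposition of a top-down view.
import numpy as np

def overlapping_regions(image: np.ndarray, tile: int = 128, overlap: int = 32):
    """Yield (y, x, crop) tiles that overlap by `overlap` pixels so that
    neighbouring regions share context and can be blended."""
    step = tile - overlap
    h, w = image.shape[:2]
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            yield y, x, image[y:y + tile, x:x + tile]

top_down = np.random.rand(256, 256, 3)  # stand-in for the input top-down view
for y, x, crop in overlapping_regions(top_down):
    # A pretrained 3D object generator would be applied to each region here.
    print(f"region at ({y}, {x}) with shape {crop.shape}")
```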
- Visibility-Uncertainty-guided 3D Gaussian Inpainting via Scene Conceptional Learning [63.94919846010485]
3D Gaussian inpainting (3DGI) is challenging in effectively leveraging complementary visual and semantic cues from multiple input views.
We propose a method that measures the visibility uncertainties of 3D points across different input views and uses them to guide 3DGI.
We build a novel 3DGI framework, VISTA, by integrating VISibility-uncerTainty-guided 3DGI with scene conceptuAl learning.
arXiv Detail & Related papers (2025-04-23T06:21:11Z)
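To make the visibility-uncertainty idea in the VISTA summary concrete, here is a hedged NumPy sketch that projects 3D points into each view and scores how rarely each point is observed outside the masked regions. The pinhole projection and the simple one-minus-mean-visibility score are illustrative assumptions, not the paper's formulation.

```python
# Hedged sketch of per-point visibility uncertainty across input views.
import numpy as np

def visibility_uncertainty(points, cam_rotations, cam_translations, K, masks):
    """Return a per-point uncertainty in [0, 1]: points seen by fewer
    unmasked views get higher uncertainty."""
    n_pts, n_views = len(points), len(cam_rotations)
    visible = np.zeros((n_pts, n_views), dtype=bool)
    h, w = masks[0].shape
    for v in range(n_views):
        cam = points @ cam_rotations[v].T + cam_translations[v]  # world -> camera
        in_front = cam[:, 2] > 1e-6
        uv = cam @ K.T
        uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)  # perspective divide
        xs, ys = uv[:, 0].astype(int), uv[:, 1].astype(int)
        in_img = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h) & in_front
        idx = np.where(in_img)[0]
        visible[idx, v] = ~masks[v][ys[idx], xs[idx]]  # visible iff unmasked
    return 1.0 - visible.mean(axis=1)  # fewer observations -> higher uncertainty

# Toy usage: one identity camera looking down +z at random points.
pts = np.random.rand(100, 3) + np.array([0.0, 0.0, 2.0])
K = np.array([[50.0, 0, 32], [0, 50.0, 32], [0, 0, 1]])
R, t = [np.eye(3)], [np.zeros(3)]
m = [np.zeros((64, 64), dtype=bool)]  # nothing masked in this view
print(visibility_uncertainty(pts, R, t, K, m)[:5])
```

Points with high uncertainty would then receive stronger guidance, in VISTA's case from the scene-level conceptual prior, during inpainting.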
- Art3D: Training-Free 3D Generation from Flat-Colored Illustration [22.358983277403233]
Art3D is a training-free method that can lift flat-colored 2D designs into 3D.
We benchmark the generalization performance of existing image-to-3D models on flat-colored images that lack explicit 3D cues.
arXiv Detail & Related papers (2025-04-14T17:53:10Z)
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models [87.80820708758317]
We present CAT3D, a method for creating anything in 3D by simulating the real-world capture process with a multi-view diffusion model.
CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single image and few-view 3D scene creation.
arXiv Detail & Related papers (2024-05-16T17:59:05Z)
- NeRFiller: Completing Scenes via Generative 3D Inpainting [113.18181179986172]
We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting.
In contrast to related works, we focus on completing scenes rather than deleting foreground objects.
arXiv Detail & Related papers (2023-12-07T18:59:41Z)
- Inpaint3D: 3D Scene Content Generation using 2D Inpainting Diffusion [18.67196713834323]
This paper presents a novel approach to inpainting 3D regions of a scene, given masked multi-view images, by distilling a 2D diffusion model into a learned 3D scene representation (e.g., a NeRF).
We show that this 2D diffusion model can still serve as a generative prior in a 3D multi-view reconstruction problem where we optimize a NeRF using a combination of score distillation sampling and NeRF reconstruction losses.
Because our method can generate content to fill any 3D masked region, we additionally demonstrate 3D object completion, 3D object replacement, and 3D scene completion.
arXiv Detail & Related papers (2023-12-06T19:30:04Z)
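The Inpaint3D summary names a concrete optimization recipe: combine score distillation sampling (SDS) with NeRF reconstruction losses. The sketch below shows one plausible way to mix the two per rendered image; `sds_gradient`, the masking convention, and the 0.1 weight are assumptions, since the paper's exact losses are not given here.

```python
# Hedged sketch of mixing a reconstruction loss with an SDS-style gradient.
import numpy as np

def inpaint3d_step(rendered, target, mask, sds_gradient, lambda_sds=0.1):
    """Return the loss value and the gradient w.r.t. the rendered pixels.
    `mask` is True where the scene is unobserved and must be generated."""
    observed = ~mask
    # Reconstruction: match ground-truth pixels wherever they exist.
    recon_grad = np.where(observed[..., None], rendered - target, 0.0)
    recon_loss = 0.5 * np.sum(recon_grad ** 2)
    # SDS: the diffusion prior supplies a gradient direction for masked pixels.
    sds_grad = np.where(mask[..., None], sds_gradient(rendered), 0.0)
    total_grad = recon_grad + lambda_sds * sds_grad
    return recon_loss, total_grad

# Toy usage with a zero prior (a real system would query a diffusion model).
rendered = np.random.rand(32, 32, 3)
target = np.random.rand(32, 32, 3)
mask = np.random.rand(32, 32) > 0.7
loss, grad = inpaint3d_step(rendered, target, mask, lambda r: np.zeros_like(r))
print(loss, grad.shape)
```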
- ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections [71.46546520120162]
Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging.
We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
We produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations.
arXiv Detail & Related papers (2023-06-07T17:47:50Z)
- RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation [68.06991943974195]
We present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision.
We evaluate RenderDiffusion on FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images.
arXiv Detail & Related papers (2022-11-17T20:17:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.