Efficient Interactive 3D Multi-Object Removal
- URL: http://arxiv.org/abs/2501.17636v2
- Date: Thu, 30 Jan 2025 05:18:21 GMT
- Title: Efficient Interactive 3D Multi-Object Removal
- Authors: Jingcheng Ni, Weiguang Zhao, Daniel Wang, Ziyao Zeng, Chenyu You, Alex Wong, Kaizhu Huang,
- Abstract summary: We propose an efficient and user-friendly pipeline for 3D multi-object removal.<n>To ensure object consistency and correspondence across multiple views, we propose a novel mask matching and refinement module.<n>Our method significantly reduces computational costs, achieving processing speeds more than 80% faster than state-of-the-art methods.
- Score: 25.832938786291358
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Object removal is of great significance to 3D scene understanding, essential for applications in content filtering and scene editing. Current mainstream methods primarily focus on removing individual objects, with a few methods dedicated to eliminating an entire area or all objects of a certain category. They however confront the challenge of insufficient granularity and flexibility for real-world applications, where users demand tailored excision and preservation of objects within defined zones. In addition, most of the current methods require kinds of priors when addressing multi-view inpainting, which is time-consuming. To address these limitations, we propose an efficient and user-friendly pipeline for 3D multi-object removal, enabling users to flexibly select areas and define objects for removal or preservation. Concretely, to ensure object consistency and correspondence across multiple views, we propose a novel mask matching and refinement module, which integrates homography-based warping with high-confidence anchor points for segmentation. By leveraging the IoU joint shape context distance loss, we enhance the accuracy of warped masks and improve subsequent inpainting processes. Considering the current immaturity of 3D multi-object removal, we provide a new evaluation dataset to bridge the developmental void. Experimental results demonstrate that our method significantly reduces computational costs, achieving processing speeds more than 80% faster than state-of-the-art methods while maintaining equivalent or higher reconstruction quality.
Related papers
- Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment [55.74860093731475]
Marmot is a novel framework that employs Multi-Agent Reasoning for Multi-Object Self-Correcting.<n>We construct a multi-agent self-correcting system featuring a decision-execution-verification mechanism.<n>Experiments demonstrate that Marmot significantly improves accuracy in object counting, attribute assignment, and spatial relationships.
arXiv Detail & Related papers (2025-04-10T16:54:28Z) - IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement [15.206470606085341]
We introduce a novel approach that produces inpainted 3D scenes with consistent visual quality and coherent underlying geometry.
Specifically, we propose a robust 3D inpainting pipeline that incorporates geometric priors and a multi-view refinement network trained via test-time adaptation.
We develop a novel inpainting mask detection technique to derive targeted inpainting masks from object masks, boosting the performance in handling unconstrained scenes.
arXiv Detail & Related papers (2025-03-06T14:50:17Z) - MObI: Multimodal Object Inpainting Using Diffusion Models [52.07640413626605]
This paper introduces MObI, a novel framework for Multimodal Object Inpainting.<n>Using a single reference RGB image, MObI enables objects to be seamlessly inserted into existing multimodal scenes.<n>Unlike traditional inpainting methods that rely solely on edit masks, our 3D bounding box conditioning gives objects accurate spatial positioning and realistic scaling.
arXiv Detail & Related papers (2025-01-06T17:43:26Z) - IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics.
Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs.
We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z) - Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation [61.040832373015014]
We propose Flex3D, a novel framework for generating high-quality 3D content from text, single images, or sparse view images.
We employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object.
In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs.
arXiv Detail & Related papers (2024-10-01T17:29:43Z) - Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model [15.936267489962122]
We propose a novel method for object insertion in 3D content represented by Gaussian Splatting.
Our approach introduces a multi-view diffusion model, dubbed MVInpainter, which is built upon a pre-trained stable video diffusion model.
Within MVInpainter, we incorporate a ControlNet-based conditional injection module to enable controlled and more predictable multi-view generation.
arXiv Detail & Related papers (2024-09-25T13:52:50Z) - MAIR++: Improving Multi-view Attention Inverse Rendering with Implicit Lighting Representation [17.133440382384578]
We propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting.
A novel framework called Multi-view Attention Inverse Rendering (MAIR) was recently introduced to improve the quality of scene-level inverse rendering.
arXiv Detail & Related papers (2024-08-13T08:04:23Z) - ObjectCarver: Semi-automatic segmentation, reconstruction and separation of 3D objects [44.38881095466177]
Implicit neural fields have made remarkable progress in reconstructing 3D surfaces from multiple images.
Previous work has attempted to tackle this problem by introducing a framework to train separate signed distance fields.
We introduce our method, ObjectCarver, to tackle the problem of object separation from just click input in a single view.
arXiv Detail & Related papers (2024-07-26T22:13:20Z) - A bioinspired three-stage model for camouflaged object detection [8.11866601771984]
We propose a three-stage model that enables coarse-to-fine segmentation in a single iteration.
Our model employs three decoders to sequentially process subsampled features, cropped features, and high-resolution original features.
Our network surpasses state-of-the-art CNN-based counterparts without unnecessary complexities.
arXiv Detail & Related papers (2023-05-22T02:01:48Z) - OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation
with Neural Radiance Fields [53.32527220134249]
The emergence of Neural Radiance Fields (NeRF) for novel view synthesis has increased interest in 3D scene editing.
Current methods face challenges such as time-consuming object labeling, limited capability to remove specific targets, and compromised rendering quality after removal.
This paper proposes a novel object-removing pipeline, named OR-NeRF, that can remove objects from 3D scenes with user-given points or text prompts on a single view.
arXiv Detail & Related papers (2023-05-17T18:18:05Z) - Multi-Plane Neural Radiance Fields for Novel View Synthesis [5.478764356647437]
Novel view synthesis is a long-standing problem that revolves around rendering frames of scenes from novel camera viewpoints.
In this work, we examine the performance, generalization, and efficiency of single-view multi-plane neural radiance fields.
We propose a new multiplane NeRF architecture that accepts multiple views to improve the synthesis results and expand the viewing range.
arXiv Detail & Related papers (2023-03-03T06:32:55Z) - Unsupervised Multi-View Object Segmentation Using Radiance Field
Propagation [55.9577535403381]
We present a novel approach to segmenting objects in 3D during reconstruction given only unlabeled multi-view images of a scene.
The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss.
To the best of our knowledge, RFP is the first unsupervised approach for tackling 3D scene object segmentation for neural radiance field (NeRF)
arXiv Detail & Related papers (2022-10-02T11:14:23Z) - DeepMultiCap: Performance Capture of Multiple Characters Using Sparse
Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time varying surface details without the need of using pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z) - Objects are Different: Flexible Monocular 3D Object Detection [87.82253067302561]
We propose a flexible framework for monocular 3D object detection which explicitly decouples the truncated objects and adaptively combines multiple approaches for object depth estimation.
Experiments demonstrate that our method outperforms the state-of-the-art method by relatively 27% for the moderate level and 30% for the hard level in the test set of KITTI benchmark.
arXiv Detail & Related papers (2021-04-06T07:01:28Z) - Self-supervised Human Detection and Segmentation via Multi-view
Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.