Related papers: 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection

3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection

URL: http://arxiv.org/abs/2410.01647v1
Date: Wed, 2 Oct 2024 15:15:52 GMT
Title: 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection
Authors: Yang Cao, Yuanliang Jv, Dan Xu,
Abstract summary: This paper introduces 3DGS into 3DOD for the first time, identifying two main challenges. We propose an elegant and efficient solution by incorporating 2D Boundary Guidance. We also propose a Box-Focused Sampling strategy using 2D boxes to generate object probability distribution in 3D spaces.
Score: 12.14595005884025
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neural Radiance Fields (NeRF) are widely used for novel-view synthesis and have been adapted for 3D Object Detection (3DOD), offering a promising approach to 3DOD through view-synthesis representation. However, NeRF faces inherent limitations: (i) limited representational capacity for 3DOD due to its implicit nature, and (ii) slow rendering speeds. Recently, 3D Gaussian Splatting (3DGS) has emerged as an explicit 3D representation that addresses these limitations. Inspired by these advantages, this paper introduces 3DGS into 3DOD for the first time, identifying two main challenges: (i) Ambiguous spatial distribution of Gaussian blobs: 3DGS primarily relies on 2D pixel-level supervision, resulting in unclear 3D spatial distribution of Gaussian blobs and poor differentiation between objects and background, which hinders 3DOD; (ii) Excessive background blobs: 2D images often include numerous background pixels, leading to densely reconstructed 3DGS with many noisy Gaussian blobs representing the background, negatively affecting detection. To tackle the challenge (i), we leverage the fact that 3DGS reconstruction is derived from 2D images, and propose an elegant and efficient solution by incorporating 2D Boundary Guidance to significantly enhance the spatial distribution of Gaussian blobs, resulting in clearer differentiation between objects and their background. To address the challenge (ii), we propose a Box-Focused Sampling strategy using 2D boxes to generate object probability distribution in 3D spaces, allowing effective probabilistic sampling in 3D to retain more object blobs and reduce noisy background blobs. Benefiting from our designs, our 3DGS-DET significantly outperforms the SOTA NeRF-based method, NeRF-Det, achieving improvements of +6.6 on mAP@0.25 and +8.1 on mAP@0.5 for the ScanNet dataset, and impressive +31.5 on mAP@0.25 for the ARKITScenes dataset.

Related papers

High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model [38.13429047918231]
We propose a novel hybrid Voxel-Gaussian representation, where a 3D voxel representation contains explicit 3D geometric information. Our 3D voxel representation is obtained by a fusion module that aligns RGB features and surface normal features, both of which can be estimated from 2D images.
arXiv Detail & Related papers (2025-04-02T08:58:34Z)
GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation [24.255633621887988]
We propose a method that leverages 2D diffusion models' implicit 3D reasoning ability while ensuring 3D consistency.<n>Specifically, the proposed Gaussian Splatting Decoder enforces 3D consistency by transforming SV3D latent outputs into an explicit 3D representation.<n>As a result, our approach simultaneously generates high-quality, multi-view-consistent images and accurate 3D models.
arXiv Detail & Related papers (2025-03-08T09:10:31Z)
DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets. Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z)
Neural Signed Distance Function Inference through Splatting 3D Gaussians Pulled on Zero-Level Set [49.780302894956776]
It is vital to infer a signed distance function (SDF) in multi-view based surface reconstruction. We propose a method that seamlessly merge 3DGS with the learning of neural SDFs. Our numerical and visual comparisons show our superiority over the state-of-the-art results on the widely used benchmarks.
arXiv Detail & Related papers (2024-10-18T05:48:06Z)
Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors [17.544733016978928]
3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild. Recent advancements in 3D object generation have introduced techniques that reconstruct an object's 3D shape and texture. We propose bridging the gap between 2D and 3D diffusion models to address this limitation.
arXiv Detail & Related papers (2024-10-12T10:14:11Z)
GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction [52.04103235260539]
We present a diffusion model approach based on Gaussian Splatting representation for 3D object reconstruction from a single view. The model learns to generate 3D objects represented by sets of GS ellipsoids. The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views.
arXiv Detail & Related papers (2024-07-05T03:43:08Z)
3D-HGS: 3D Half-Gaussian Splatting [5.766096863155448]
Photo-realistic 3D Reconstruction is a fundamental problem in 3D computer vision. We propose to employ 3D Half-Gaussian (3D-HGS) kernels, which can be used as a plug-and-play kernel.
arXiv Detail & Related papers (2024-06-04T19:04:29Z)
DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly [18.655229356566785]
We present a differentiable rendering framework to learn structured 3D abstractions from sparse RGB images. By leveraging differentiable volume rendering, our method does not require 3D supervision. Our method demonstrates superior performance over state-of-the-art alternatives for 3D primitive abstraction from sparse views.
arXiv Detail & Related papers (2024-04-01T03:10:36Z)
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors [16.93758384693786]
Bidirectional Diffusion(BiDiff) is a unified framework that incorporates both a 3D and a 2D diffusion process. Our model achieves high-quality, diverse, and scalable 3D generation.
arXiv Detail & Related papers (2023-12-07T10:00:04Z)
StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances. First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss. Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z)
NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions [97.27105725738016]
integration of Neural Radiance Fields (NeRFs) and generative models, such as Generative Adversarial Networks (GANs) has transformed 3D-aware generation from single-view images. We propose a simple and effective method, based on re-using the well-disentangled latent space of a pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations.
arXiv Detail & Related papers (2023-03-22T18:59:48Z)
FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations. Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation. It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image. Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space. We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step. This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.