DivAS: Interactive 3D Segmentation of NeRFs via Depth-Weighted Voxel Aggregation
- URL: http://arxiv.org/abs/2601.04860v1
- Date: Thu, 08 Jan 2026 11:53:04 GMT
- Title: DivAS: Interactive 3D Segmentation of NeRFs via Depth-Weighted Voxel Aggregation
- Authors: Ayush Pande,
- Abstract summary: Existing methods for segmenting Neural Radiance Fields (NeRFs) are often optimization-based, requiring slow per-scene training that sacrifices the zero-shot capabilities of 2D foundation models. We introduce DivAS, an optimization-free, fully interactive framework that addresses these limitations. Our method operates via a fast GUI-based workflow where 2D SAM masks, generated from user point prompts, are refined using NeRF-derived depth priors to improve geometric accuracy and foreground separation. The core of our contribution is a custom kernel that aggregates these refined multi-view masks into a unified 3D voxel grid in under 200ms.
- Score: 1.1458853556386799
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing methods for segmenting Neural Radiance Fields (NeRFs) are often optimization-based, requiring slow per-scene training that sacrifices the zero-shot capabilities of 2D foundation models. We introduce DivAS (Depth-interactive Voxel Aggregation Segmentation), an optimization-free, fully interactive framework that addresses these limitations. Our method operates via a fast GUI-based workflow where 2D SAM masks, generated from user point prompts, are refined using NeRF-derived depth priors to improve geometric accuracy and foreground-background separation. The core of our contribution is a custom CUDA kernel that aggregates these refined multi-view masks into a unified 3D voxel grid in under 200ms, enabling real-time visual feedback. This optimization-free design eliminates the need for per-scene training. Experiments on Mip-NeRF 360° and LLFF show that DivAS achieves segmentation quality comparable to optimization-based methods, while being 2-2.5x faster end-to-end, and up to an order of magnitude faster when excluding user prompting time.
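The abstract's core operation is aggregating depth-refined, multi-view SAM masks into a single 3D voxel grid. Below is a minimal NumPy sketch of that depth-weighted aggregation idea; the camera interface, the Gaussian depth weight, and the voting threshold are illustrative assumptions, not the paper's CUDA kernel (which is what achieves the sub-200ms runtime).

```python
# Hedged sketch of depth-weighted multi-view mask aggregation into a voxel grid.
# All interfaces here are hypothetical; DivAS implements this as a CUDA kernel.
import numpy as np

def aggregate_masks(voxel_centers, views, sigma=0.05, vote_thresh=0.5):
    """voxel_centers: (V, 3) world-space voxel centers.
    views: list of dicts with 'mask' (H, W) bool, 'depth' (H, W) NeRF depth,
           'K' (3, 3) intrinsics, 'w2c' (4, 4) world-to-camera matrix.
    Returns one boolean foreground label per voxel."""
    n = len(voxel_centers)
    votes = np.zeros(n)
    weight_sum = np.zeros(n)
    homo = np.hstack([voxel_centers, np.ones((n, 1))])

    for view in views:
        cam = (view["w2c"] @ homo.T).T[:, :3]          # voxels in camera space
        z = cam[:, 2]
        z_safe = np.where(z > 1e-6, z, 1.0)            # avoid division by ~0
        pix = (view["K"] @ cam.T).T
        px, py = pix[:, 0] / z_safe, pix[:, 1] / z_safe
        H, W = view["mask"].shape
        valid = (z > 1e-6) & (px >= 0) & (px < W) & (py >= 0) & (py < H)
        xi = np.clip(px.astype(int), 0, W - 1)
        yi = np.clip(py.astype(int), 0, H - 1)

        # Depth weighting: a view only votes where the voxel's projected depth
        # agrees with the NeRF-rendered depth, i.e. the voxel is visible there.
        d_nerf = view["depth"][yi, xi]
        w = np.exp(-((z - d_nerf) ** 2) / (2 * sigma ** 2)) * valid

        votes += w * view["mask"][yi, xi]
        weight_sum += w

    return (votes / np.maximum(weight_sum, 1e-8)) > vote_thresh
```

Each view votes on every voxel, but only where the voxel's projected depth matches the rendered NeRF depth; this is what suppresses votes from occluded viewpoints and background pixels.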
Related papers
- PointGauss: Point Cloud-Guided Multi-Object Segmentation for Gaussian Splatting [18.042769428774676]
We introduce PointGauss, a novel point cloud-guided framework for real-time multi-object segmentation in Gaussian Splatting representations.
The key innovation lies in two aspects: (1) a point cloud-based Gaussian primitive decoder that generates 3D instance masks within 1 minute, and (2) a GPU-accelerated 2D mask rendering system that ensures multi-view consistency.
arXiv Detail & Related papers (2025-08-01T01:56:54Z)
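The PointGauss summary above pairs a 3D instance-mask decoder with a mask rendering step that keeps 2D masks consistent across views. Below is a rough Python sketch of that second step, rendering a per-view label map from instance-labeled 3D points with a simple z-buffer; it is a toy stand-in for PointGauss's GPU-accelerated renderer, and the names and interfaces are hypothetical.

```python
import numpy as np

def render_label_map(points, labels, K, w2c, hw):
    """Project instance-labeled 3D points (N, 3), labels (N,) int, into one view
    and keep the nearest point per pixel, yielding an (H, W) label map
    (0 = background). A toy point-splatting stand-in for a Gaussian renderer."""
    H, W = hw
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (w2c @ homo.T).T[:, :3]
    z = cam[:, 2]
    keep = z > 1e-6
    pix = (K @ cam.T).T
    px = (pix[:, 0] / np.where(keep, z, 1.0)).astype(int)
    py = (pix[:, 1] / np.where(keep, z, 1.0)).astype(int)
    keep &= (px >= 0) & (px < W) & (py >= 0) & (py < H)

    label_map = np.zeros((H, W), dtype=np.int32)
    zbuf = np.full((H, W), np.inf)
    for x, y, depth, lab in zip(px[keep], py[keep], z[keep], labels[keep]):
        if depth < zbuf[y, x]:          # nearest point wins the pixel
            zbuf[y, x] = depth
            label_map[y, x] = lab
    return label_map
```

Rendering the same labeled 3D primitives into every view is what enforces multi-view consistency of the resulting 2D masks.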
- LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering [75.67501939005119]
We present a novel level-of-detail (LOD) method for 3D Gaussian Splatting on memory-constrained devices.
Our approach iteratively selects optimal subsets of Gaussians based on camera distance.
Our method achieves state-of-the-art performance on both outdoor (Hierarchical 3DGS) and indoor (Zip-NeRF) datasets.
arXiv Detail & Related papers (2025-05-29T06:50:57Z)
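LODGE's blurb above describes selecting subsets of Gaussians by camera distance. A minimal sketch of one plausible distance-based LOD policy follows; the distance bins, keep fractions, and size-based ranking are invented for illustration rather than taken from the paper.

```python
import numpy as np

def select_lod_subset(means, scales, cam_pos, lod_bins=(4.0, 10.0, 25.0),
                      keep_fracs=(1.0, 0.5, 0.2, 0.05)):
    """means: (N, 3) Gaussian centers, scales: (N,) per-Gaussian size proxy.
    Far-away Gaussians are thinned more aggressively, keeping the largest ones."""
    dist = np.linalg.norm(means - cam_pos, axis=1)
    bin_idx = np.digitize(dist, lod_bins)           # 0 = nearest distance bin
    keep = np.zeros(len(means), dtype=bool)
    for b, frac in enumerate(keep_fracs):
        idx = np.where(bin_idx == b)[0]
        if len(idx) == 0:
            continue
        k = max(1, int(frac * len(idx)))
        # Within a bin, keep the k Gaussians with the largest footprint.
        keep[idx[np.argsort(-scales[idx])[:k]]] = True
    return keep
```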
- Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering [37.48219196092378]
We propose an efficient radiance field rendering algorithm that incorporates a synthesis process on adaptive sparse voxels without neural networks or 3D Gaussians.
Our method improves the previous neural-free voxel model by over 4 dB PSNR and more than 10x FPS speedup.
Our voxel representation is seamlessly compatible with grid-based 3D processing techniques such as Volume Fusion, Voxel Pooling, and Marching Cubes.
arXiv Detail & Related papers (2024-12-05T18:59:11Z)
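The sparse-voxel paper above emphasizes that its representation plays well with grid-based processing such as Voxel Pooling. As a generic illustration only (not that paper's renderer), a dictionary-backed sparse voxel grid with a 2x coarsening pool can be sketched as:

```python
from collections import defaultdict

def build_sparse_grid(points, values, voxel_size=0.05):
    """Map each 3D point to an integer voxel key and average the values that
    fall into the same cell; only occupied cells are stored."""
    acc = defaultdict(lambda: [0.0, 0])
    for (x, y, z), v in zip(points, values):
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        acc[key][0] += v
        acc[key][1] += 1
    return {k: s / n for k, (s, n) in acc.items()}

def voxel_pool(grid):
    """Coarsen a sparse grid by one level: parent cells average their children."""
    acc = defaultdict(lambda: [0.0, 0])
    for (i, j, k), v in grid.items():
        parent = (i // 2, j // 2, k // 2)
        acc[parent][0] += v
        acc[parent][1] += 1
    return {k: s / n for k, (s, n) in acc.items()}
```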
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also optimizes significantly faster than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z)
- MicroDreamer: Efficient 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction [37.07128043394227]
This paper introduces score-based iterative reconstruction (SIR), an efficient and general algorithm mimicking a differentiable 3D reconstruction process to reduce the number of function evaluations (NFEs).
We present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks.
arXiv Detail & Related papers (2024-04-30T12:56:14Z)
- NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, exhibits appealing performance on the ScanNetV2 and ARKitScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z)
- VoxNeRF: Bridging Voxel Representation and Neural Radiance Fields for Enhanced Indoor View Synthesis [73.50359502037232]
VoxNeRF is a novel approach to enhance the quality and efficiency of neural indoor reconstruction and novel view synthesis.
We propose an efficient voxel-guided sampling technique that selectively allocates computational resources to the most relevant segments of rays.
Our approach is validated with extensive experiments on ScanNet and ScanNet++.
arXiv Detail & Related papers (2023-11-09T11:32:49Z)
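VoxNeRF's summary above hinges on voxel-guided sampling: spending ray samples only where a ray crosses occupied voxels. A coarse Python sketch of that allocation idea follows, using a dense boolean occupancy grid and uniform coarse marching to find the relevant interval; the interval-finding strategy and the parameters are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

def voxel_guided_samples(origin, direction, occupancy, voxel_size,
                         t_near=0.1, t_far=8.0, coarse_steps=128, n_samples=64):
    """occupancy: (X, Y, Z) bool grid in a volume anchored at the world origin.
    March coarsely along the ray, keep the t-values inside occupied voxels,
    then concentrate the sample budget on that sub-interval."""
    t = np.linspace(t_near, t_far, coarse_steps)
    pts = origin + t[:, None] * direction
    idx = np.floor(pts / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(occupancy.shape)), axis=1)
    hit = np.zeros(coarse_steps, dtype=bool)
    hit[inside] = occupancy[idx[inside, 0], idx[inside, 1], idx[inside, 2]]
    if not hit.any():                      # empty ray: fall back to a few samples
        return np.linspace(t_near, t_far, 8)
    # Concentrate samples between the first and last occupied crossing.
    return np.linspace(t[hit].min(), t[hit].max(), n_samples)
```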
- Fast-SNARF: A Fast Deformer for Articulated Neural Fields [92.68788512596254]
We propose a new articulation module for neural fields, Fast-SNARF, which finds accurate correspondences between canonical space and posed space.
Fast-SNARF is a drop-in replacement for our previous work, SNARF, while significantly improving its computational efficiency.
Because learning of deformation maps is a crucial component in many 3D human avatar methods, we believe that this work represents a significant step towards the practical creation of 3D virtual humans.
arXiv Detail & Related papers (2022-11-28T17:55:34Z)
- Neural Deformable Voxel Grid for Fast Optimization of Dynamic View Synthesis [63.25919018001152]
We propose a fast deformable radiance field method to handle dynamic scenes.
Our method achieves comparable performance to D-NeRF using only 20 minutes for training.
arXiv Detail & Related papers (2022-06-15T17:49:08Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
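The mIoU figures quoted for the street-scene segmentation paper above follow the standard per-class intersection-over-union average; a generic computation (not tied to that paper's model) looks like:

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """pred, target: integer label maps of the same shape.
    Mean Intersection over Union, averaged over classes present in the data."""
    valid = target != ignore_index
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        t = (target == c) & valid
        union = np.logical_or(p, t).sum()
        if union == 0:                      # class absent from both: skip it
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```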