fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence
- URL: http://arxiv.org/abs/2407.01781v1
- Date: Mon, 1 Jul 2024 20:20:33 GMT
- Title: fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence
- Authors: Francis Williams, Jiahui Huang, Jonathan Swartz, Gergely Klár, Vijay Thakkar, Matthew Cong, Xuanchi Ren, Ruilong Li, Clement Fuji-Tsang, Sanja Fidler, Eftychios Sifakis, Ken Museth
- Abstract summary: fVDB is a novel framework for deep learning on large-scale 3D data.
Our framework is fully integrated with PyTorch enabling interoperability with existing pipelines.
- Score: 50.417261057533786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present fVDB, a novel GPU-optimized framework for deep learning on large-scale 3D data. fVDB provides a complete set of differentiable primitives to build deep learning architectures for common tasks in 3D learning such as convolution, pooling, attention, ray-tracing, meshing, etc. fVDB simultaneously provides a much larger feature set (primitives and operators) than established frameworks with no loss in efficiency: our operators match or exceed the performance of other frameworks with narrower scope. Furthermore, fVDB can process datasets with much larger footprint and spatial resolution than prior works, while providing a competitive memory footprint on small inputs. To achieve this combination of versatility and performance, fVDB relies on a single novel VDB index grid acceleration structure paired with several key innovations including GPU accelerated sparse grid construction, convolution using tensorcores, fast ray tracing kernels using a Hierarchical Digital Differential Analyzer algorithm (HDDA), and jagged tensors. Our framework is fully integrated with PyTorch enabling interoperability with existing pipelines, and we demonstrate its effectiveness on a number of representative tasks such as large-scale point-cloud segmentation, high resolution 3D generative modeling, unbounded scale Neural Radiance Fields, and large-scale point cloud reconstruction.
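One of the abstract's key innovations is the "jagged tensor": a batch of variable-length items (e.g. point clouds of different sizes) stored as one flat buffer plus per-item offsets, so batched kernels can run without padding. The sketch below illustrates that storage layout in plain Python; it is a conceptual illustration only, not fVDB's actual JaggedTensor API (the class name and methods here are hypothetical).

```python
class JaggedTensor:
    """Toy jagged tensor: flat data buffer + offsets marking item boundaries."""

    def __init__(self, items):
        # items: list of variable-length lists of per-point values
        self.data = [v for item in items for v in item]  # flat, padding-free storage
        self.offsets = [0]
        for item in items:
            self.offsets.append(self.offsets[-1] + len(item))

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        # Recover the i-th variable-length item via its offset range.
        return self.data[self.offsets[i]:self.offsets[i + 1]]

    def per_item_sum(self):
        # A batched reduction: one result per jagged item, no padding needed.
        return [sum(self[i]) for i in range(len(self))]


# Three "point clouds" of different sizes packed into one batch:
clouds = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
jt = JaggedTensor(clouds)
print(jt.offsets)         # [0, 3, 4, 6]
print(jt.per_item_sum())  # [6.0, 4.0, 11.0]
```

The offsets array is what a GPU kernel would consult to find each item's extent, which is how such layouts avoid the wasted memory and compute of zero-padded dense batches.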
Related papers
- N-BVH: Neural ray queries with bounding volume hierarchies [51.430495562430565]
In 3D computer graphics, the bulk of a scene's memory usage is due to polygons and textures.
We devise N-BVH, a neural compression architecture designed to answer arbitrary ray queries in 3D.
Our method provides faithful approximations of visibility, depth, and appearance attributes.
arXiv Detail & Related papers (2024-05-25T13:54:34Z)
- Fast Sparse 3D Convolution Network with VDB [2.834312349049142]
We propose a new Convolutional Neural Network implementation optimized for sparse 3D data inference.
This implementation uses NanoVDB as the data structure to store the sparse tensor.
We demonstrate that this architecture is around 20 times faster than the state-of-the-art dense CNN model on a high-resolution 3D object classification network.
arXiv Detail & Related papers (2023-11-05T20:43:46Z)
- SpVOS: Efficient Video Object Segmentation with Triple Sparse Convolution [18.332130780309797]
This work develops a novel triple sparse convolution to reduce the computation costs of the overall video object segmentation framework.
Experiments are conducted on two mainstream VOS datasets, including DAVIS and Youtube-VOS.
Results show that the proposed SpVOS achieves superior performance over other state-of-the-art sparse methods, while maintaining performance comparable to non-sparse baselines.
arXiv Detail & Related papers (2023-10-23T17:21:33Z)
- BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy [58.92659367605442]
We present BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance occupancy information.
We show that BEV-IO can outperform state-of-the-art methods while only adding a negligible increase in parameters and computational overhead.
arXiv Detail & Related papers (2023-05-26T11:16:12Z)
- HKNAS: Classification of Hyperspectral Imagery Based on Hyper Kernel Neural Architecture Search [104.45426861115972]
We propose to directly generate structural parameters by utilizing the specifically designed hyper kernels.
We obtain three kinds of networks to separately conduct pixel-level or image-level classifications with 1-D or 3-D convolutions.
A series of experiments on six public datasets demonstrate that the proposed methods achieve state-of-the-art results.
arXiv Detail & Related papers (2023-04-23T17:27:40Z)
- DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets [95.84755169585492]
We present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception.
Our model achieves state-of-the-art performance with a broad range of 3D perception tasks.
arXiv Detail & Related papers (2023-01-15T09:31:58Z)
- NIO: Lightweight neural operator-based architecture for video frame interpolation [15.875579519177487]
NIO is a lightweight, efficient neural operator-based architecture for video frame interpolation.
We show that NIO can produce visually-smooth and accurate results and converges in fewer epochs than state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-19T20:30:47Z)
- 3D Point Cloud Registration with Multi-Scale Architecture and Self-supervised Fine-tuning [5.629161809575013]
MS-SVConv is a fast multi-scale deep neural network that outputs features from point clouds for 3D registration between two scenes.
We show significant improvements compared to state-of-the-art methods on the competitive and well-known 3DMatch benchmark.
We present a strategy to fine-tune MS-SVConv on unknown datasets in a self-supervised way, which leads to state-of-the-art results on ETH and TUM datasets.
arXiv Detail & Related papers (2021-03-26T15:38:33Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well on well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- Generative Sparse Detection Networks for 3D Single-shot Object Detection [43.91336826079574]
3D object detection has been widely studied due to its potential applicability to many promising areas such as robotics and augmented reality.
Yet, the sparse nature of the 3D data poses unique challenges to this task.
We propose Generative Sparse Detection Network (GSDN), a fully-convolutional single-shot sparse detection network.
arXiv Detail & Related papers (2020-06-22T15:54:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.