Smaller3d: Smaller Models for 3D Semantic Segmentation Using Minkowski
Engine and Knowledge Distillation Methods
- URL: http://arxiv.org/abs/2305.03188v1
- Date: Thu, 4 May 2023 22:19:25 GMT
- Title: Smaller3d: Smaller Models for 3D Semantic Segmentation Using Minkowski
Engine and Knowledge Distillation Methods
- Authors: Alen Adamyan and Erik Harutyunyan
- Abstract summary: This paper proposes the application of knowledge distillation techniques, especially for sparse tensors in 3D deep learning, to reduce model sizes while maintaining performance.
We analyze and propose different loss functions, including standard methods and combinations of various losses, to approximate the performance of state-of-the-art models of different Sparse Convolutional NNs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There are various optimization techniques in the realm of 3D, including
point-cloud-based approaches that use meshes, textures, and voxels to optimize
how data is stored and how computations are performed in 3D. These techniques employ methods such as
feed-forward networks, 3D convolutions, graph neural networks, transformers,
and sparse tensors. However, the field of 3D is one of the most computationally
expensive fields, and these methods have yet to achieve their full potential
due to their large capacity, complexity, and computation limits. This paper
proposes the application of knowledge distillation techniques, especially for
sparse tensors in 3D deep learning, to reduce model sizes while maintaining
performance. We analyze and propose different loss functions, including
standard methods and combinations of various losses, to approximate the
performance of state-of-the-art models of different Sparse Convolutional NNs.
Our experiments are done on the standard ScanNet V2 dataset; we achieve
around a 2.6% mIoU difference with a 4-times-smaller model and around an 8%
difference with a 16-times-smaller model relative to the latest
state-of-the-art spatio-temporal convolution-based models.
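The abstract's combined-loss distillation idea can be sketched as follows. This is a minimal illustration, not the paper's formulation: the temperature `T`, weight `alpha`, and the specific KL-plus-cross-entropy combination are common knowledge-distillation defaults assumed here, and the paper's actual losses for sparse tensors may differ.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Weighted sum of a soft KL term (student mimics teacher) and a
    hard cross-entropy term against the ground-truth labels."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL(teacher || student), scaled by T^2 as in standard distillation
    kd = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean() * T * T
    # Cross-entropy of the student against the true labels (T=1)
    probs = softmax(student_logits)
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * kd + (1 - alpha) * ce
```

With identical teacher and student logits the KL term vanishes and only the weighted cross-entropy remains, which is a quick sanity check for an implementation like this.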
Related papers
- Is 3D Convolution with 5D Tensors Really Necessary for Video Analysis? [4.817356884702073]
We present several novel techniques for implementing 3D convolutional blocks using 2D and/or 1D convolutions with only 4D and/or 3D tensors.
Our motivation is that 3D convolutions with 5D tensors are computationally expensive and they may not be supported by some of the edge devices used in real-time applications such as robots.
arXiv Detail & Related papers (2024-07-23T14:30:51Z) - DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z) - Geometry-Informed Neural Operator for Large-Scale 3D PDEs [76.06115572844882]
We propose the geometry-informed neural operator (GINO) to learn the solution operator of large-scale partial differential equations.
We successfully trained GINO to predict the pressure on car surfaces using only five hundred data points.
arXiv Detail & Related papers (2023-09-01T16:59:21Z) - V4d: voxel for 4d novel view synthesis [21.985228924523543]
We utilize 3D Voxel to model the 4D neural radiance field, short as V4D, where the 3D voxel has two formats.
The proposed LUTs-based refinement module achieves the performance gain with little computational cost.
arXiv Detail & Related papers (2022-05-28T04:45:07Z) - Focal Sparse Convolutional Networks for 3D Object Detection [121.45950754511021]
We introduce two new modules to enhance the capability of Sparse CNNs.
They are focal sparse convolution (Focals Conv) and its multi-modal variant of focal sparse convolution with fusion.
For the first time, we show that spatially learnable sparsity in sparse convolution is essential for sophisticated 3D object detection.
arXiv Detail & Related papers (2022-04-26T17:34:10Z) - Fast mesh denoising with data driven normal filtering using deep
variational autoencoders [6.25118865553438]
We propose a fast and robust denoising method for dense 3D scanned industrial models.
The proposed approach employs conditional variational autoencoders to effectively filter face normals.
For 3D models with more than 10,000 faces, the presented pipeline is twice as fast as methods with equivalent reconstruction error.
arXiv Detail & Related papers (2021-11-24T20:25:15Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based
Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z) - Point Transformer for Shape Classification and Retrieval of 3D and ALS
Roof PointClouds [3.3744638598036123]
This paper proposes a fully attentional model, Point Transformer, for deriving a rich point cloud representation.
The model's shape classification and retrieval performance are evaluated on a large-scale urban dataset - RoofN3D and a standard benchmark dataset ModelNet40.
The proposed method outperforms other state-of-the-art models in the RoofN3D dataset, gives competitive results in the ModelNet40 benchmark, and showcases high robustness to various unseen point corruptions.
arXiv Detail & Related papers (2020-11-08T08:11:02Z) - Learning Deformable Tetrahedral Meshes for 3D Reconstruction [78.0514377738632]
3D shape representations that accommodate learning-based 3D reconstruction are an open problem in machine learning and computer graphics.
Previous work on neural 3D reconstruction demonstrated benefits, but also limitations, of point cloud, voxel, surface mesh, and implicit function representations.
We introduce Deformable Tetrahedral Meshes (DefTet) as a particular parameterization that utilizes volumetric tetrahedral meshes for the reconstruction problem.
arXiv Detail & Related papers (2020-11-03T02:57:01Z) - PerMO: Perceiving More at Once from a Single Image for Autonomous
Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.