Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics
- URL: http://arxiv.org/abs/2211.10580v3
- Date: Wed, 12 Feb 2025 02:27:20 GMT
- Title: Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics
- Authors: Ancheng Lin, Jun Li, Yusheng Xiang, Wei Bian, Mukesh Prasad,
- Abstract summary: We introduce a multi-modal technique that leverages 3D point clouds and 2D colour images obtained from LiDAR and camera sensors for surface normal estimation.
We present a novel transformer-based neural network architecture that proficiently fuses visual semantic and 3D geometric information.
It has been verified that the proposed model can learn from a simulated 3D environment that mimics a traffic scene.
- Score: 7.507853813361308
- License:
- Abstract: High-quality surface normal can help improve geometry estimation in problems faced by autonomous vehicles, such as collision avoidance and occlusion inference. While a considerable volume of literature focuses on densely scanned indoor scenarios, normal estimation during autonomous driving remains an intricate problem due to the sparse, non-uniform, and noisy nature of real-world LiDAR scans. In this paper, we introduce a multi-modal technique that leverages 3D point clouds and 2D colour images obtained from LiDAR and camera sensors for surface normal estimation. We present the Hybrid Geometric Transformer (HGT), a novel transformer-based neural network architecture that proficiently fuses visual semantic and 3D geometric information. Furthermore, we developed an effective learning strategy for the multi-modal data. Experimental results demonstrate the superior effectiveness of our information fusion approach compared to existing methods. It has also been verified that the proposed model can learn from a simulated 3D environment that mimics a traffic scene. The learned geometric knowledge is transferable and can be applied to real-world 3D scenes in the KITTI dataset. Further tasks built upon the estimated normal vectors in the KITTI dataset show that the proposed estimator has an advantage over existing methods.
Related papers
- The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods [10.265865092323041]
This paper introduces a large-scale multi-modal dataset captured in and around well-known landmarks in Oxford.
We also establish benchmarks for tasks involving localisation, reconstruction, and novel-view synthesis.
Our dataset and benchmarks are intended to facilitate better integration of radiance field methods and SLAM systems.
arXiv Detail & Related papers (2024-11-15T19:43:24Z) - LLMI3D: MLLM-based 3D Perception from a Single 2D Image [77.13869413871028]
multimodal large language models (MLLMs) excel in general capacity but underperform in 3D tasks.
In this paper, we propose solutions for weak 3D local spatial object perception, poor text-based geometric numerical output, and inability to handle camera focal variations.
We employ parameter-efficient fine-tuning for a pre-trained MLLM and develop LLMI3D, a powerful 3D perception MLLM.
arXiv Detail & Related papers (2024-08-14T10:00:16Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM- Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z) - 3D Harmonic Loss: Towards Task-consistent and Time-friendly 3D Object
Detection on Edge for Intelligent Transportation System [28.55894241049706]
We propose a 3D harmonic loss function to relieve the pointcloud based inconsistent predictions.
Our proposed method considerably improves the performance than benchmark models.
Our code is open-source and publicly available.
arXiv Detail & Related papers (2022-11-07T10:11:48Z) - Uncertainty Guided Policy for Active Robotic 3D Reconstruction using
Neural Radiance Fields [82.21033337949757]
This paper introduces a ray-based volumetric uncertainty estimator, which computes the entropy of the weight distribution of the color samples along each ray of the object's implicit neural representation.
We show that it is possible to infer the uncertainty of the underlying 3D geometry given a novel view with the proposed estimator.
We present a next-best-view selection policy guided by the ray-based volumetric uncertainty in neural radiance fields-based representations.
arXiv Detail & Related papers (2022-09-17T21:28:57Z) - Large-Scale 3D Semantic Reconstruction for Automated Driving Vehicles
with Adaptive Truncated Signed Distance Function [9.414880946870916]
We propose a novel 3D reconstruction and semantic mapping system using LiDAR and camera sensors.
An Adaptive Truncated Function is introduced to describe surfaces implicitly, which can deal with different LiDAR point sparsities.
An optimal image patch selection strategy is proposed to estimate the optimal semantic class for each triangle mesh.
arXiv Detail & Related papers (2022-02-28T15:11:25Z) - Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer that has an efficient 3D structured perceiving ability to the global geometric inconsistencies.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z) - Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
arXiv Detail & Related papers (2021-07-29T12:30:39Z) - Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection [40.34710686994996]
3D object detection has become an emerging task in autonomous driving scenarios.
Previous works process 3D point clouds using either projection-based or voxel-based models.
We propose the Stereo RGB and Deeper LIDAR framework which can utilize semantic and spatial information simultaneously.
arXiv Detail & Related papers (2020-06-09T11:19:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.