2DDATA: 2D Detection Annotations Transmittable Aggregation for Semantic
Segmentation on Point Cloud
- URL: http://arxiv.org/abs/2309.11755v1
- Date: Thu, 21 Sep 2023 03:32:22 GMT
- Title: 2DDATA: 2D Detection Annotations Transmittable Aggregation for Semantic
Segmentation on Point Cloud
- Authors: Guan-Cheng Lee
- Abstract summary: Inheriting from previous works, we not only fuse the information from multiple modalities without the above issues, but also exhaust the information in the RGB modality.
We demonstrate that our simple design can transmit bounding box prior information to the 3D model encoder, proving the feasibility of large multi-modality models fused with modality-specific data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, multi-modality models have been introduced because of the complementary information from different sensors such as LiDAR and cameras. Such models require paired data along with precise calibration for all modalities; the complicated calibration among modalities hugely increases the cost of collecting such high-quality datasets and hinders them from being applied in practical scenarios. Inheriting from previous works, we not only fuse the information from multiple modalities without the above issues, but also exhaust the information in the RGB modality. We introduce the 2D Detection Annotations Transmittable Aggregation (2DDATA), designing a data-specific branch, called the Local Object Branch, which handles the points inside a given bounding box, exploiting the ease of acquiring 2D bounding box annotations. We demonstrate that our simple design can transmit bounding box prior information to the 3D encoder model, proving the feasibility of large multi-modality models fused with modality-specific data.
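The abstract names the mechanism but not the details, so below is a minimal sketch, under our own assumptions, of the core step as described: select the LiDAR points whose camera projection falls inside a 2D bounding box and route them to the dedicated Local Object Branch, letting cheap 2D detection labels inject an object prior into the 3D encoder. The helper name, calibration inputs, and branch wiring are illustrative, not the authors' code.

```python
# Minimal sketch (our reading of 2DDATA, not the authors' code):
# route LiDAR points that project inside a 2D detection box to a
# dedicated "Local Object Branch" alongside the usual 3D encoder.
import numpy as np

def points_in_2d_box(points_xyz, K, T_cam_lidar, box_xyxy):
    """Mask of LiDAR points whose image projection lies inside the
    2D box (x1, y1, x2, y2). K: 3x3 intrinsics, T_cam_lidar: 4x4."""
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]            # LiDAR -> camera frame
    in_front = cam[:, 2] > 0                          # keep points ahead of the camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)  # perspective divide
    x1, y1, x2, y2 = box_xyxy
    inside = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
              (uv[:, 1] >= y1) & (uv[:, 1] <= y2))
    return inside & in_front

# The masked points would feed the Local Object Branch, while the
# full cloud still goes through the main 3D segmentation encoder.
```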
Related papers
- Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection
We present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection.
Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: the data perspective and the feature perspective.
arXiv Detail & Related papers (2024-01-10T08:56:07Z)
- Bi-directional Adapter for Multi-modal Tracking
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- LWSIS: LiDAR-guided Weakly Supervised Instance Segmentation for Autonomous Driving
We present a more artful framework, LiDAR-guided Weakly Supervised Instance Segmentation (LWSIS).
LWSIS uses off-the-shelf 3D data, i.e., point clouds together with 3D boxes, as natural weak supervision for training 2D image instance segmentation models.
Our LWSIS not only exploits the complementary information in multimodal data during training, but also significantly reduces the annotation cost of dense 2D masks.
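The summary states the supervision signal but not its construction; here is a hedged sketch of one plausible construction: points inside an annotated 3D box are projected onto the image and rasterized as sparse positive pixels, which then stand in for a dense 2D instance mask. The axis-aligned box test and all names are our simplifications.

```python
# Hedged sketch of an LWSIS-style weak label (our simplification,
# assuming an axis-aligned 3D box): project the in-box LiDAR points
# onto the image to obtain sparse positives instead of a dense mask.
import numpy as np

def weak_mask_from_3d_box(points_xyz, box_min, box_max, K, T_cam_lidar, image_hw):
    """Sparse foreground mask from points inside a 3D box."""
    inside = np.all((points_xyz >= box_min) & (points_xyz <= box_max), axis=1)
    pts = points_xyz[inside]
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    cam = cam[cam[:, 2] > 0]                     # drop points behind the camera
    uv = (K @ cam.T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
    h, w = image_hw
    mask = np.zeros((h, w), dtype=bool)
    keep = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    mask[uv[keep, 1], uv[keep, 0]] = True        # sparse positives as weak labels
    return mask
```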
arXiv Detail & Related papers (2022-12-07T08:08:01Z)
- CL3D: Unsupervised Domain Adaptation for Cross-LiDAR 3D Detection
Domain adaptation for cross-LiDAR 3D detection is challenging due to the large gap in raw data representation.
We present an unsupervised domain adaptation method that overcomes the above difficulties.
arXiv Detail & Related papers (2022-12-01T03:22:55Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the robustness of state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection
We propose the Contrastively Augmented Transformer for multi-modal 3D object Detection (CAT-Det).
CAT-Det adopts a two-stream structure consisting of a Pointformer (PT) branch, an Imageformer (IT) branch along with a Cross-Modal Transformer (CMT) module.
We propose an effective One-way Multi-modal Data Augmentation (OMDA) approach via hierarchical contrastive learning at both the point and object levels.
arXiv Detail & Related papers (2022-04-01T10:07:25Z)
- Dense Voxel Fusion for 3D Object Detection
Dense Voxel Fusion (DVF) is a sequential fusion method that generates multi-scale dense voxel feature representations.
We train directly with ground truth 2D bounding box labels, avoiding noisy, detector-specific, 2D predictions.
We show that our proposed multi-modal training strategy results in better generalization compared to training using erroneous 2D predictions.
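As a loose illustration of the multi-scale voxel idea, the snippet below indexes the same cloud at several resolutions so per-scale voxel features could each be fused with projected image features; the grid sizes and the fusion step itself are our assumptions, not the paper's code.

```python
# Loose sketch of multi-scale voxelization (voxel sizes are assumed).
import numpy as np

def voxel_indices(points_xyz, origin, voxel_size):
    """Integer voxel coordinates of each point at one resolution."""
    return np.floor((points_xyz - origin) / voxel_size).astype(np.int64)

points = np.random.rand(1000, 3) * 50.0   # stand-in LiDAR cloud
origin = points.min(axis=0)
grids = {s: voxel_indices(points, origin, s) for s in (0.4, 0.2, 0.1)}
```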
arXiv Detail & Related papers (2022-03-02T04:51:31Z)
- Lifting 2D Object Locations to 3D by Discounting LiDAR Outliers across Objects and Views
We present a system for automatically converting 2D mask object predictions and raw LiDAR point clouds into full 3D bounding boxes of objects.
Our method significantly outperforms previous work even though those methods use more complex pipelines, 3D models, and additional human-annotated external sources of prior information.
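The summary gives the principle (discounting LiDAR outliers) but not the estimator, so here is a hedged sketch of one plausible reading: take the points in a 2D detection's frustum and reject outliers with a robust median/MAD rule before fitting the box. The threshold and the axis-aligned extent are our simplifications.

```python
# Hedged sketch of lifting a 2D detection to 3D while discounting
# LiDAR outliers (our simplification): reject frustum points far from
# the per-axis median before estimating the box center and extent.
import numpy as np

def lift_to_3d(frustum_points, k=3.0):
    """Robust 3D center/extent from points in a 2D detection's frustum."""
    med = np.median(frustum_points, axis=0)
    mad = np.median(np.abs(frustum_points - med), axis=0) + 1e-6
    # 1.4826 * MAD approximates a standard deviation under Gaussian noise.
    inlier = np.all(np.abs(frustum_points - med) <= k * 1.4826 * mad, axis=1)
    pts = frustum_points[inlier]
    center = pts.mean(axis=0)
    extent = pts.max(axis=0) - pts.min(axis=0)
    return center, extent
```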
arXiv Detail & Related papers (2021-09-16T13:01:13Z)
- Cross-Modality 3D Object Detection
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)