Exploring Data Augmentation for Multi-Modality 3D Object Detection
- URL: http://arxiv.org/abs/2012.12741v2
- Date: Wed, 21 Apr 2021 16:23:20 GMT
- Title: Exploring Data Augmentation for Multi-Modality 3D Object Detection
- Authors: Wenwei Zhang, Zhe Wang, Chen Change Loy
- Abstract summary: It is counter-intuitive that multi-modality methods based on point clouds and images perform only marginally better, or sometimes worse, than approaches that use point clouds alone.
We propose a pipeline, named transformation flow, to bridge the gap between single- and multi-modality data augmentation with transformation reversing and replaying.
Our method also wins the best PKL award in the 3rd nuScenes detection challenge.
- Score: 82.9988604088494
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: It is counter-intuitive that multi-modality methods based on point
clouds and images perform only marginally better, or sometimes worse, than
approaches that use point clouds alone. This paper investigates the reason
behind this phenomenon. Because multi-modality data augmentation must maintain
consistency between the point cloud and the images, recent methods in this
field typically use relatively limited data augmentation, and this shortage
leaves their performance below expectations. Therefore, we contribute a
pipeline, named transformation flow, to bridge the gap between single- and
multi-modality data augmentation with transformation reversing and replaying.
In addition, because of occlusion, a point may be occupied by different
objects in different modalities, which makes augmentations such as cut and
paste non-trivial for multi-modality detection. We further present
Multi-mOdality Cut and pAste (MoCa), which simultaneously considers occlusion
and physical plausibility to maintain multi-modality consistency. Without
using an ensemble of detectors, our multi-modality detector achieves new
state-of-the-art performance on the nuScenes dataset and competitive
performance on the KITTI 3D benchmark. Our method also wins the best PKL award
in the 3rd nuScenes detection challenge. Code and models will be released at
https://github.com/open-mmlab/mmdetection3d.
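The two augmentation ideas above are easiest to grasp in code. Below is a
minimal sketch, not the authors' released implementation (that lives in
mmdetection3d): points are assumed to already sit in the camera frame, the
intrinsics `K` are toy values, and helper names such as `TransformationFlow`,
`rotate_z`, and `moca_paste` are hypothetical. The first block illustrates
transformation flow: each invertible 3D augmentation is recorded so an
augmented point can be reversed back to its original coordinates and
projected into the image to fetch the matching image features.

```python
# Hypothetical sketch of "transformation flow": record invertible 3D
# augmentations so augmented points can be mapped back to sensor coordinates.
import numpy as np

class TransformationFlow:
    def __init__(self):
        self.ops = []  # (forward, inverse) pairs, in application order

    def apply(self, points, forward, inverse):
        # Apply a 3D augmentation and remember how to undo it.
        self.ops.append((forward, inverse))
        return forward(points)

    def reverse(self, points):
        # Undo all recorded augmentations, most recent first.
        for _, inverse in reversed(self.ops):
            points = inverse(points)
        return points

def rotate_z(theta):
    # Global rotation about the z-axis; the inverse uses R^-1 = R^T.
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (lambda p: p @ R.T), (lambda p: p @ R)

def scale(factor):
    # Global scaling and its inverse.
    return (lambda p: p * factor), (lambda p: p / factor)

flow = TransformationFlow()
points = np.random.rand(100, 3) * 50.0 + 1.0       # toy points, camera frame
points = flow.apply(points, *rotate_z(np.pi / 8))  # augmentation 1
points = flow.apply(points, *scale(1.05))          # augmentation 2

# Reverse to original coordinates, then project with toy intrinsics to get
# the pixels whose image features belong to each augmented point.
restored = flow.reverse(points)
K = np.array([[720.0, 0.0, 640.0],
              [0.0, 720.0, 360.0],
              [0.0, 0.0, 1.0]])
cam = restored @ K.T
uv = cam[:, :2] / cam[:, 2:3]  # (u, v) sampling locations
```

In the full pipeline the 2D augmentations applied to the image would also be
replayed on these projected coordinates before sampling; that step is omitted
here. The second block sketches the consistency checks that cut-and-paste
augmentation in the spirit of MoCa requires: pasted objects are rejected if
they collide with existing boxes in bird's-eye view (physical plausibility),
and the survivors are ordered far-to-near so that compositing their image
patches in that order lets nearer objects occlude farther ones.

```python
# Hypothetical sketch of MoCa-style paste filtering; boxes are
# (x, y, z, dx, dy, dz) and the overlap test ignores yaw for brevity.
import numpy as np

def bev_overlaps(box, others, eps=0.0):
    # Axis-aligned bird's-eye-view overlap test.
    x, y, dx, dy = box[0], box[1], box[3], box[4]
    for o in others:
        if (abs(x - o[0]) < (dx + o[3]) / 2 + eps
                and abs(y - o[1]) < (dy + o[4]) / 2 + eps):
            return True
    return False

def moca_paste(scene_boxes, candidates):
    # candidates: list of (box, image_patch) pairs cut from other scenes.
    kept = []
    for box, patch in candidates:
        # Physical plausibility: no collision with scene or earlier pastes.
        if not bev_overlaps(box, scene_boxes + [b for b, _ in kept]):
            kept.append((box, patch))
    # Far-to-near ordering: patches drawn later occlude earlier ones.
    kept.sort(key=lambda bp: -np.hypot(bp[0][0], bp[0][1]))
    return kept
```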
Related papers
- Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z) - PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on a large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
arXiv Detail & Related papers (2023-12-13T18:59:13Z) - Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping [12.442574943138794]
The paper explores the industrial multimodal Anomaly Detection (AD) task, which exploits point clouds and RGB images to localize anomalies.
We introduce a novel lightweight and fast framework that learns to map features from one modality to the other on nominal samples.
arXiv Detail & Related papers (2023-12-07T18:41:21Z) - MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving [15.36416000750147]
We propose a multi-modal 3D semantic segmentation model (MSeg3D) with joint intra-modal feature extraction and inter-modal feature fusion.
MSeg3D still shows robustness and improves the LiDAR-only baseline.
arXiv Detail & Related papers (2023-03-15T13:13:03Z) - Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with a hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec 3D-AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z) - Interactive Multi-scale Fusion of 2D and 3D Features for Multi-object Tracking [23.130490413184596]
We introduce PointNet++ to obtain multi-scale deep representations of the point cloud and make it adaptive to our proposed Interactive Feature Fusion.
Our method can achieve good performance on the KITTI benchmark and outperform other approaches without using multi-scale feature fusion.
arXiv Detail & Related papers (2022-03-30T13:00:27Z) - DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z) - Segment as Points for Efficient Online Multi-Object Tracking and Segmentation [66.03023110058464]
We propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation to an unordered 2D point cloud representation.
Our method generates a new tracking-by-points paradigm where discriminative instance embeddings are learned from randomly selected points rather than images.
The resulting online MOTS framework, named PointTrack, surpasses all the state-of-the-art methods by large margins.
arXiv Detail & Related papers (2020-07-03T08:29:35Z) - VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization [46.607930208613574]
We propose an end-to-end framework, termed VMLoc, to fuse different sensor inputs into a common latent space.
Unlike previous multimodal variational works that directly adapt the objective function of the vanilla variational auto-encoder, we show how camera localization can be accurately estimated.
arXiv Detail & Related papers (2020-03-12T14:52:10Z)