Exploring 2D Data Augmentation for 3D Monocular Object Detection
- URL: http://arxiv.org/abs/2104.10786v1
- Date: Wed, 21 Apr 2021 22:43:42 GMT
- Authors: Sugirtha T, Sridevi M, Khailash Santhakumar, B Ravi Kiran, Thomas
Gauthier and Senthil Yogamani
- Abstract summary: Many standard 2D object detection data augmentation techniques do not extend to 3D box.
We propose two novel augmentations for monocular 3D detection without a requirement for novel view synthesis.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation is a key component of CNN based image recognition tasks
like object detection. However, it is relatively less explored for 3D object
detection. Many standard 2D object detection data augmentation techniques do
not extend to 3D box. Extension of these data augmentations for 3D object
detection requires adaptation of the 3D geometry of the input scene and
synthesis of new viewpoints. This requires accurate depth information of the
scene which may not be always available. In this paper, we evaluate existing 2D
data augmentations and propose two novel augmentations for monocular 3D
detection without a requirement for novel view synthesis. We evaluate these
augmentations on the RTM3D detection model, chosen primarily for its short
training times. We obtain a consistent improvement of ~4% in 3D AP (@IoU=0.7)
for cars and ~1.8% in 3D AP (@IoU=0.25) for pedestrians and cyclists over the
baseline on the KITTI dataset. We also present a rigorous
evaluation of the mAP scores by re-weighting them to take into account the
class imbalance in the KITTI validation dataset.
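The abstract does not spell out the two proposed augmentations, but the core constraint it describes -- a 2D image transform must keep the 3D box labels geometrically consistent without synthesizing new views -- can be illustrated with the classic horizontal flip. A minimal sketch, assuming a pinhole camera with the principal point at the image center (so mirroring pixels corresponds to negating the camera-frame x coordinate) and KITTI-style labels; the function name and label layout are illustrative, not from the paper:

```python
import numpy as np

def hflip_with_3d_labels(image, boxes_3d):
    """Horizontally flip an image and update 3D box labels to match.

    Assumes the principal point lies at the image center, so mirroring
    pixels (u -> W-1-u) corresponds to x -> -x in the camera frame.
    boxes_3d rows: [x, y, z, ry] with KITTI's rotation_y convention.
    """
    flipped = image[:, ::-1].copy()          # mirror the width axis
    out = boxes_3d.copy()
    out[:, 0] = -out[:, 0]                   # camera-frame x -> -x
    # Heading mirrors to pi - ry; wrap the result back into [-pi, pi]
    out[:, 3] = np.arctan2(np.sin(np.pi - out[:, 3]),
                           np.cos(np.pi - out[:, 3]))
    return flipped, out
```

Augmentations that touch only pixel intensities (color jitter, blur, noise) leave the 3D labels untouched entirely, which is why they transfer to monocular 3D detection for free; geometric transforms like this flip need the paired label update above.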
Related papers
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- Every Dataset Counts: Scaling up Monocular 3D Object Detection with Joint Datasets Training [9.272389295055271]
This study investigates the pipeline for training a monocular 3D object detection model on a diverse collection of 3D and 2D datasets.
The proposed framework comprises three components: (1) a robust monocular 3D model capable of functioning across various camera settings, (2) a selective-training strategy to accommodate datasets with differing class annotations, and (3) a pseudo 3D training approach using 2D labels to enhance detection performance in scenes containing only 2D labels.
arXiv Detail & Related papers (2023-10-02T06:17:24Z)
- LidarAugment: Searching for Scalable 3D LiDAR Data Augmentations [55.45435708426761]
LidarAugment is a search-based data augmentation strategy for 3D object detection.
We show LidarAugment can be customized for different model architectures.
It consistently improves convolution-based UPillars/StarNet/RSN and transformer-based SWFormer.
arXiv Detail & Related papers (2022-10-24T18:00:04Z)
- Real3D-Aug: Point Cloud Augmentation by Placing Real Objects with Occlusion Handling for 3D Detection and Segmentation [0.0]
We propose a data augmentation method that takes advantage of already annotated data multiple times.
We propose an augmentation framework that reuses real data, automatically finds suitable placements in the scene to be augmented.
The pipeline proves competitive in training top-performing models for 3D object detection and semantic segmentation.
arXiv Detail & Related papers (2022-06-15T16:25:30Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed as Homography Loss, is proposed to achieve the goal, which exploits both 2D and 3D information.
Our method yields the best performance compared with the other state-of-the-arts by a large margin on KITTI 3D datasets.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to adapt a general 2D detector to work in this 3D task.
In this technical report, we study this problem with a practical approach built on a fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- Learning to Predict the 3D Layout of a Scene [0.3867363075280544]
We propose a method that only uses a single RGB image, thus enabling applications in devices or vehicles that do not have LiDAR sensors.
We use the KITTI dataset for training, which consists of street traffic scenes with class labels, 2D bounding boxes and 3D annotations with seven degrees of freedom.
We achieve a mean average precision of 47.3% for moderately difficult data, measured at a 3D intersection over union threshold of 70%, as required by the official KITTI benchmark; outperforming previous state-of-the-art single RGB only methods by a large margin.
arXiv Detail & Related papers (2020-11-19T17:23:30Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, changing only one 3D parameter in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
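The one-parameter-per-step search structure described above can be sketched without the learned policy. A minimal sketch that stands in a greedy oracle for the trained RL agent: at each step it tries perturbing each of the 7 box parameters and keeps the single-axis move that most improves a caller-supplied score function (e.g. 3D IoU against a proposal, or a learned critic). The function and parameter names are illustrative, not from the paper:

```python
import numpy as np

def refine_box(box, score_fn, deltas, steps=20):
    """Greedy stand-in for a learned one-parameter-per-step policy.

    box: initial 7-DoF prediction [x, y, z, h, w, l, ry].
    score_fn: maps a box to a scalar quality score (higher is better).
    deltas: per-parameter step sizes. Exactly one parameter changes
    per step, mirroring the refinement structure; the actual method
    trains an RL policy to choose the move instead of searching.
    """
    box = np.asarray(box, dtype=float).copy()
    for _ in range(steps):
        base = score_fn(box)
        best_gain, best_move = 0.0, None
        for i, d in enumerate(deltas):
            for step in (+d, -d):
                cand = box.copy()
                cand[i] += step
                gain = score_fn(cand) - base
                if gain > best_gain:
                    best_gain, best_move = gain, (i, step)
        if best_move is None:       # no single-axis move helps: stop
            break
        box[best_move[0]] += best_move[1]
    return box
```

The RL formulation exists precisely because the reward (final localization quality) arrives only after several such steps, whereas this greedy sketch needs a dense score at every candidate.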
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.