Robust 2D/3D Vehicle Parsing in CVIS
- URL: http://arxiv.org/abs/2103.06432v1
- Date: Thu, 11 Mar 2021 03:35:05 GMT
- Title: Robust 2D/3D Vehicle Parsing in CVIS
- Authors: Hui Miao, Feixiang Lu, Zongdai Liu, Liangjun Zhang, Dinesh Manocha,
Bin Zhou
- Abstract summary: We present a novel approach to robustly detect and perceive vehicles in different camera views as part of a cooperative vehicle-infrastructure system (CVIS).
Our formulation is designed for arbitrary camera views and makes no assumptions about intrinsic or extrinsic parameters.
In practice, our approach outperforms SOTA methods on 2D detection, instance segmentation, and 6-DoF pose estimation.
- Score: 54.825777404511605
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present a novel approach to robustly detect and perceive vehicles in
different camera views as part of a cooperative vehicle-infrastructure system
(CVIS). Our formulation is designed for arbitrary camera views and makes no
assumptions about intrinsic or extrinsic parameters. First, to deal with
multi-view data scarcity, we propose a part-assisted novel view synthesis
algorithm for data augmentation. We train a part-based texture inpainting
network in a self-supervised manner. Then we render the textured model into the
background image with the target 6-DoF pose. Second, to handle various camera
parameters, we present a new method that produces dense mappings between image
pixels and 3D points to perform robust 2D/3D vehicle parsing. Third, we build
the first CVIS dataset for benchmarking, which annotates more than 1540 images
(14017 instances) from real-world traffic scenarios. We combine these novel
algorithms and datasets to develop a robust approach for 2D/3D vehicle parsing
for CVIS. In practice, our approach outperforms SOTA methods on 2D detection,
instance segmentation, and 6-DoF pose estimation, by 4.5%, 4.3%, and 2.9%,
respectively. More details and results are included in the supplement. To
facilitate future research, we will release the source code and the dataset on
GitHub.
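The augmentation step described above renders an inpainted, textured vehicle model into a background image at a target 6-DoF pose. The texture-inpainting network is beyond a short example, but the geometric half can be sketched. The following is a minimal illustration under assumed interfaces (the function name and the point-splatting shortcut are hypothetical, not the paper's renderer): it projects colored model vertices through a pinhole camera and paints them over the background, nearest points last.

```python
import numpy as np

def splat_model(background, vertices, colors, R, t, K):
    """Point-splat colored model vertices into `background` at pose (R, t).

    background : (H, W, 3) uint8 image, modified in place.
    vertices   : (N, 3) points in the model frame.
    colors     : (N, 3) uint8 per-vertex texture samples.
    R, t, K    : rotation (3, 3), translation (3,), intrinsics (3, 3).
    A real pipeline would rasterize textured triangles; depth-sorted
    point splatting is enough to illustrate the geometry.
    """
    cam = vertices @ R.T + t                   # model frame -> camera frame
    keep = cam[:, 2] > 1e-6                    # keep points in front of camera
    cam, colors = cam[keep], colors[keep]
    uv = cam @ K.T                             # pinhole projection
    uv = (uv[:, :2] / uv[:, 2:3]).round().astype(int)
    H, W = background.shape[:2]
    inside = ((0 <= uv[:, 0]) & (uv[:, 0] < W) &
              (0 <= uv[:, 1]) & (uv[:, 1] < H))
    uv, cam, colors = uv[inside], cam[inside], colors[inside]
    for i in np.argsort(-cam[:, 2]):           # paint far to near: near wins
        u, v = uv[i]
        background[v, u] = colors[i]
```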
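The parsing step, which produces dense mappings between image pixels and 3D model points, is at its core a Perspective-n-Point problem once correspondences exist. Below is a hedged sketch using OpenCV's stock RANSAC PnP solver; this is a generic solver, not the paper's network, and it assumes the per-image intrinsics K have been obtained separately, since the method itself fixes no camera parameters.

```python
import numpy as np
import cv2

def pose_from_dense_mapping(pixels_2d, points_3d, K):
    """Recover a 6-DoF pose from dense 2D-3D correspondences.

    pixels_2d : (N, 2) image coordinates.
    points_3d : (N, 3) model points paired row-by-row with pixels_2d.
    K         : (3, 3) intrinsics for this particular camera.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64),
        pixels_2d.astype(np.float64),
        K.astype(np.float64),
        None,                                  # assume undistorted pixels
        flags=cv2.SOLVEPNP_EPNP,
        reprojectionError=3.0,
    )
    if not ok:
        raise RuntimeError("PnP failed to find a consistent pose")
    R, _ = cv2.Rodrigues(rvec)                 # rotation vector -> 3x3 matrix
    return R, tvec.reshape(3), inliers
```

RANSAC matters here: dense per-pixel mappings inevitably contain outliers near occlusions and object boundaries.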
Related papers
- Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data [80.14669385741202]
We propose a self-supervised pre-training method for 3D perception models tailored to autonomous driving data.
We leverage the availability of synchronized and calibrated image and Lidar sensors in autonomous driving setups.
Our method does not require any point cloud nor image annotations.
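The pairing that makes this distillation possible comes from projecting lidar points into the synchronized camera view. A minimal sketch of that step, assuming a standard 4x4 lidar-to-camera extrinsic and 3x3 intrinsics (names are illustrative, not from the paper):

```python
import numpy as np

def lidar_to_pixel_pairs(points, T_cam_lidar, K, image_hw):
    """Pair lidar points with the pixels they project onto.

    points      : (N, 3) points in the lidar frame.
    T_cam_lidar : (4, 4) extrinsic transform, lidar frame -> camera frame.
    K           : (3, 3) camera intrinsics.
    image_hw    : (H, W) of the camera image.
    Returns indices of visible points and their integer pixel coordinates;
    these are the positive pairs a contrastive distillation loss would use.
    """
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (pts_h @ T_cam_lidar.T)[:, :3]       # into the camera frame
    idx = np.nonzero(cam[:, 2] > 0.1)[0]       # drop points behind the camera
    uv = cam[idx] @ K.T                        # pinhole projection
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
    H, W = image_hw
    inside = ((0 <= uv[:, 0]) & (uv[:, 0] < W) &
              (0 <= uv[:, 1]) & (uv[:, 1] < H))
    return idx[inside], uv[inside]
```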
arXiv Detail & Related papers (2022-03-30T12:40:30Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- Pose Estimation of Specific Rigid Objects [0.7931904787652707]
We address the problem of estimating the 6D pose of rigid objects from a single RGB or RGB-D input image.
This problem is of great importance to many application fields such as robotic manipulation, augmented reality, and autonomous driving.
arXiv Detail & Related papers (2021-12-30T14:36:47Z)
- To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels [30.3378171262436]
We design a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network.
Our method performs competitively on the Waymo Open Dataset and improves the state-of-the-art AP for pedestrian detection from 69.7% to 75.5%.
It is also efficient: our smallest model, which still outperforms the popular PointPillars in quality, requires 180 times fewer FLOPs and model parameters.
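The range-image input behind this result can be sketched in a few lines: scatter the point cloud into a 2D grid indexed by azimuth and inclination, storing each pixel's spherical coordinates as channels so an ordinary 2D convolution sees them. The channel layout below is an assumption for illustration, not the paper's exact design.

```python
import numpy as np

def to_range_image(points, height=64, width=2048):
    """Scatter (N, 3) lidar points into an (height, width, 4) range image.

    Channels: range, inclination, azimuth, hit mask, so every pixel
    carries its own 3D spherical coordinates through the network.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)                             # in [-pi, pi)
    inclination = np.arcsin(z / np.maximum(r, 1e-6))
    col = ((azimuth + np.pi) / (2 * np.pi) * width).astype(int) % width
    lo, hi = inclination.min(), inclination.max()
    row = ((inclination - lo) / (hi - lo + 1e-6) * (height - 1)).astype(int)
    img = np.zeros((height, width, 4), dtype=np.float32)
    img[row, col] = np.stack([r, inclination, azimuth,
                              np.ones_like(r)], axis=1)    # last hit wins
    return img
```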
arXiv Detail & Related papers (2021-06-25T01:27:26Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
- Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometry Constrained Keypoints in Real-Time [6.82446891805815]
We propose a novel 3D single-shot object detection method for detecting vehicles in monocular RGB images.
Our approach lifts 2D detections to 3D space by predicting additional regression and classification parameters.
We test our approach on different datasets for autonomous driving and evaluate it using the challenging KITTI 3D Object Detection and the novel nuScenes Object Detection benchmarks.
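A classic example of the geometric constraints used when lifting 2D detections to 3D is the pinhole relation between a known metric object height and the projected box height; the sketch below is generic, not the paper's exact parameterization.

```python
def depth_from_box_height(f_y, obj_height_m, box_height_px):
    """Pinhole constraint z = f_y * H / h: a vehicle of assumed height
    H (meters) whose 2D box spans h pixels sits at depth z (meters)."""
    return f_y * obj_height_m / box_height_px

# e.g. a 1.5 m tall car spanning 100 px under f_y = 1000 px is ~15 m away
print(depth_from_box_height(1000.0, 1.5, 100.0))  # 15.0
```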
arXiv Detail & Related papers (2020-06-23T15:10:19Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming.
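For a crop-and-resize ("zoom") to stay geometrically consistent, the camera intrinsics have to be adjusted along with the patch; a minimal sketch of that bookkeeping follows (the learned parts of the module are omitted, and the interface is assumed):

```python
import numpy as np

def zoom_intrinsics(K, box, out_size):
    """Adjust intrinsics for cropping `box` and resizing to `out_size`.

    K        : (3, 3) original camera intrinsics.
    box      : (x0, y0, x1, y1) crop rectangle in pixels.
    out_size : (out_w, out_h) of the zoomed patch.
    Cropping shifts the principal point; resizing scales focal lengths
    and principal point by the zoom factor per axis.
    """
    x0, y0, x1, y1 = box
    sx, sy = out_size[0] / (x1 - x0), out_size[1] / (y1 - y0)
    Kz = K.astype(float)              # astype copies, K stays untouched
    Kz[0, 2] -= x0                    # crop: shift principal point
    Kz[1, 2] -= y0
    Kz[0] *= sx                       # resize: scale x row (fx, skew, cx)
    Kz[1] *= sy                       # and y row (fy, cy)
    return Kz
```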
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
- Object Detection on Single Monocular Images through Canonical Correlation Analysis [3.4722706398428493]
We retrieve 3D object information from single monocular images without using extra 3D data such as point clouds or depth images.
We propose a two-dimensional CCA framework to fuse monocular images and corresponding predicted depth images for basic computer vision tasks.
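Canonical correlation analysis finds paired linear projections of two feature sets that are maximally correlated; the entry above uses a two-dimensional variant on image/depth pairs. Vanilla CCA on flattened stand-in features, runnable with scikit-learn, conveys the fusion idea:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
rgb_feats   = rng.normal(size=(500, 64))   # stand-ins for image features
depth_feats = rng.normal(size=(500, 64))   # and predicted-depth features

cca = CCA(n_components=8)
cca.fit(rgb_feats, depth_feats)            # learn the paired projections
rgb_c, depth_c = cca.transform(rgb_feats, depth_feats)
# rgb_c[:, k] and depth_c[:, k] are the maximally correlated components;
# concatenating them gives a simple fused representation
fused = np.concatenate([rgb_c, depth_c], axis=1)
```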
arXiv Detail & Related papers (2020-02-13T05:03:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.