Object Detection on Single Monocular Images through Canonical
Correlation Analysis
- URL: http://arxiv.org/abs/2002.05349v1
- Date: Thu, 13 Feb 2020 05:03:42 GMT
- Title: Object Detection on Single Monocular Images through Canonical
Correlation Analysis
- Authors: Zifan Yu and Suya You
- Abstract summary: We retrieve 3-D object information from single monocular images without using extra 3-D data such as point clouds or depth images.
We propose a two-dimensional CCA framework that fuses monocular images with their corresponding predicted depth images for basic computer vision tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Without using extra 3-D data such as point clouds or depth images to provide
3-D information, we retrieve 3-D object information from single monocular
images. High-quality depth images are predicted from the single monocular
images and fed, together with the corresponding monocular images, into a 2-D
object proposal network. Most existing deep learning frameworks with
two-stream inputs fuse the separate streams by concatenation or addition,
which assumes that every part of a feature map contributes equally to the
whole task. However, when the data are noisy and much of the information is
redundant, these methods no longer produce predictions or classifications
efficiently. In this report, we propose a two-dimensional CCA (canonical
correlation analysis) framework that fuses monocular images and their
corresponding predicted depth images for basic computer vision tasks such as
image classification and object detection. First, we implemented several
structures combining one-dimensional CCA with AlexNet to test performance on
the image classification task. We then applied one of these structures with
2D-CCA to object detection. In these experiments, we found that our proposed
framework performs better when it takes predicted depth images as inputs
while the model is trained on ground-truth depth.
Related papers
- Robust 3D Point Clouds Classification based on Declarative Defenders [18.51700931775295]
3D point clouds are unstructured and sparse, while 2D images are structured and dense.
In this paper, we explore three distinct algorithms for mapping 3D point clouds into 2D images.
The proposed approaches demonstrate superior accuracy and robustness against adversarial attacks.
arXiv Detail & Related papers (2024-10-13T01:32:38Z) - VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - Perspective-aware Convolution for Monocular 3D Object Detection [2.33877878310217]
We propose a novel perspective-aware convolutional layer that captures long-range dependencies in images.
By enforcing convolutional kernels to extract features along the depth axis of every image pixel, we incorporate perspective information into the network architecture.
We demonstrate improved performance on the KITTI3D dataset, achieving 23.9% average precision on the easy benchmark.
arXiv Detail & Related papers (2023-08-24T17:25:36Z) - A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z) - Depth-conditioned Dynamic Message Propagation for Monocular 3D Object
Detection [86.25022248968908]
We learn context- and depth-aware feature representation to solve the problem of monocular 3D object detection.
We show state-of-the-art results among the monocular-based approaches on the KITTI benchmark dataset.
arXiv Detail & Related papers (2021-03-30T16:20:24Z) - Multi-Stage CNN-Based Monocular 3D Vehicle Localization and Orientation
Estimation [0.0]
This paper aims to design a 3D object detection model from 2D images taken by monocular cameras by combining the estimated bird's-eye view elevation map and the deep representation of object features.
The proposed model has a pre-trained ResNet-50 network as its backend network and three more branches.
arXiv Detail & Related papers (2020-11-24T18:01:57Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z) - BirdNet+: End-to-End 3D Object Detection in LiDAR Bird's Eye View [117.44028458220427]
On-board 3D object detection in autonomous vehicles often relies on geometry information captured by LiDAR devices.
We present a fully end-to-end 3D object detection framework that can infer oriented 3D boxes solely from BEV images.
arXiv Detail & Related papers (2020-03-09T15:08:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.