A Dual-Cycled Cross-View Transformer Network for Unified Road Layout
Estimation and 3D Object Detection in the Bird's-Eye-View
- URL: http://arxiv.org/abs/2209.08844v1
- Date: Mon, 19 Sep 2022 08:43:38 GMT
- Authors: Curie Kim and Ue-Hwan Kim
- Abstract summary: We propose a unified model for road layout estimation and 3D object detection inspired by the transformer architecture and the CycleGAN learning framework.
We set up extensive learning scenarios to study the effect of multi-class learning for road layout estimation in various situations.
Experimental results attest to the effectiveness of our model; we achieve state-of-the-art performance in both the road layout estimation and 3D object detection tasks.
- Score: 4.251500966181852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The bird's-eye-view (BEV) representation allows robust learning of multiple
tasks for autonomous driving including road layout estimation and 3D object
detection. However, contemporary methods for unified road layout estimation and
3D object detection rarely handle the class imbalance of the training dataset
and multi-class learning to reduce the total number of networks required. To
overcome these limitations, we propose a unified model for road layout
estimation and 3D object detection inspired by the transformer architecture and
the CycleGAN learning framework. The proposed model mitigates the performance
degradation caused by the class imbalance of the dataset by utilizing the focal
loss and the proposed dual cycle loss. Moreover, we set up extensive learning
scenarios to study the effect of multi-class learning for road layout
estimation in various situations. To verify the effectiveness of the proposed
model and the learning scheme, we conduct a thorough ablation study and a
comparative study. The experimental results attest to the effectiveness of our
model; we achieve state-of-the-art performance in both the road layout
estimation and 3D object detection tasks.
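The abstract attributes the robustness to class imbalance to the focal loss together with the paper's dual cycle loss. The dual cycle loss is the paper's own contribution and its exact form is not given here, but the focal loss (Lin et al., 2017) is standard and can be sketched. The following is a minimal NumPy illustration of the binary focal loss, not the paper's exact configuration; the default `alpha` and `gamma` values are the common ones from the original focal loss paper, not values taken from this work.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: mean of -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted foreground probabilities, shape (N,)
    y: binary ground-truth labels (0 or 1), shape (N,)
    """
    p = np.clip(p, eps, 1.0 - eps)                  # numerical safety for log()
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

The modulating factor `(1 - p_t)**gamma` down-weights well-classified (easy, typically majority-class) examples, so the gradient is dominated by hard minority-class pixels; with `gamma=0` and `alpha=0.5` the expression reduces to half the standard binary cross-entropy.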
Related papers
- Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection [9.708971995966476]
This paper introduces a two-stage training strategy to address the simulation-to-real domain gap.
Our approach initially trains a model on the large-scale synthetic dataset, RoadSense3D.
We fine-tune the model on a combination of real-world datasets to enhance its adaptability to practical conditions.
arXiv Detail & Related papers (2024-08-28T08:44:58Z)
- Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-View 3D Detection and Tracking [37.186306646752975]
We propose a unified object-aware temporal learning framework for multi-view 3D detection and tracking tasks.
The proposed model achieves consistent performance gains over baselines of different designs.
arXiv Detail & Related papers (2024-07-03T16:10:19Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Simultaneous Multiple Object Detection and Pose Estimation using 3D Model Infusion with Monocular Vision [21.710141497071373]
Multiple object detection and pose estimation are vital computer vision tasks.
We propose simultaneous neural modeling of both using monocular vision and 3D model infusion.
Our Simultaneous Multiple Object detection and Pose Estimation network (SMOPE-Net) is an end-to-end trainable multitasking network.
arXiv Detail & Related papers (2022-11-21T05:18:56Z)
- Stereo Neural Vernier Caliper [57.187088191829886]
We propose a new object-centric framework for learning-based stereo 3D object detection.
We tackle a problem of how to predict a refined update given an initial 3D cuboid guess.
Our approach achieves state-of-the-art performance on the KITTI benchmark.
arXiv Detail & Related papers (2022-03-21T14:36:07Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- Object Detection and Recognition of Swap-Bodies using Camera mounted on a Vehicle [13.702911401489427]
This project aims to jointly perform object detection of a swap-body and to find the type of swap-body by reading an ILU code.
Recent research has substantially advanced deep learning techniques, which in turn have enhanced the field of computer vision.
arXiv Detail & Related papers (2020-04-17T08:49:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.