OmniDet: Surround View Cameras based Multi-task Visual Perception
Network for Autonomous Driving
- URL: http://arxiv.org/abs/2102.07448v3
- Date: Tue, 6 Jun 2023 14:31:21 GMT
- Title: OmniDet: Surround View Cameras based Multi-task Visual Perception
Network for Autonomous Driving
- Authors: Varun Ravi Kumar, Senthil Yogamani, Hazem Rashed, Ganesh Sistu,
Christian Witt, Isabelle Leang, Stefan Milz and Patrick Mäder
- Abstract summary: This work presents a multi-task visual perception network on unrectified fisheye images.
It consists of six primary tasks necessary for an autonomous driving system.
We demonstrate that the jointly trained model performs better than the respective single task versions.
- Score: 10.3540046389057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surround View fisheye cameras are commonly deployed in automated driving for
360° near-field sensing around the vehicle. This work presents a
multi-task visual perception network on unrectified fisheye images to enable
the vehicle to sense its surrounding environment. It consists of six primary
tasks necessary for an autonomous driving system: depth estimation, visual
odometry, semantic segmentation, motion segmentation, object detection, and
lens soiling detection. We demonstrate that the jointly trained model performs
better than the respective single task versions. Our multi-task model has a
shared encoder providing a significant computational advantage and has
synergized decoders where tasks support each other. We propose a novel camera
geometry based adaptation mechanism to encode the fisheye distortion model both
at training and inference. This was crucial to enable training on the WoodScape
dataset, comprised of data from different parts of the world collected by 12
different cameras mounted on three different cars with different intrinsics and
viewpoints. Given that bounding boxes are not a good representation for
distorted fisheye images, we also extend object detection to use a polygon with
non-uniformly sampled vertices. We additionally evaluate our model on standard
automotive datasets, namely KITTI and Cityscapes. We obtain
state-of-the-art results on KITTI for depth estimation and pose estimation
tasks and competitive performance on the other tasks. We perform extensive
ablation studies on various architecture choices and task weighting
methodologies. A short video at https://youtu.be/xbSjZ5OfPes provides
qualitative results.
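The shared-encoder, multi-decoder layout described in the abstract can be made concrete with a small sketch. The module below is only an illustration of the pattern, not the OmniDet architecture: the ResNet-18 backbone, the head designs, and all names are assumptions introduced here.

```python
# Minimal sketch of a shared-encoder multi-task network, assuming a
# ResNet-18 backbone and toy per-task heads; names and shapes are
# illustrative, not the OmniDet architecture.
import torch
import torch.nn as nn
import torchvision


class MultiTaskPerceptionNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # Shared encoder: everything up to the global pooling layer.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        feat_ch = 512  # ResNet-18 final feature channels

        # Task-specific decoders share the encoder features.
        self.depth_head = nn.Sequential(
            nn.Conv2d(feat_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1), nn.Sigmoid(),          # normalized inverse depth
        )
        self.seg_head = nn.Conv2d(feat_ch, num_classes, 1)   # per-pixel class logits
        self.motion_head = nn.Conv2d(feat_ch, 2, 1)          # static / moving logits

    def forward(self, image: torch.Tensor) -> dict:
        feats = self.encoder(image)                  # (B, 512, H/32, W/32)
        return {
            "depth": self.depth_head(feats),
            "semantic": self.seg_head(feats),
            "motion": self.motion_head(feats),
        }


if __name__ == "__main__":
    net = MultiTaskPerceptionNet()
    out = net(torch.randn(1, 3, 224, 224))
    print({k: tuple(v.shape) for k, v in out.items()})
```

In a joint-training setup like the one evaluated above, the per-task losses produced by such heads would be combined under one of the task-weighting methodologies covered by the ablation studies.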
Related papers
- Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation
Learning of Vision-based Autonomous Driving [73.3702076688159]
We propose a novel contrastive learning algorithm, Cohere3D, to learn coherent instance representations in a long-term input sequence.
We evaluate our algorithm by finetuning the pretrained model on various downstream perception, prediction, and planning tasks.
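As a rough illustration of what a contrastive objective over instance embeddings looks like, here is a generic InfoNCE-style loss; this is a standard formulation with assumed tensor shapes, not the Cohere3D objective.

```python
# Generic InfoNCE-style contrastive loss over instance embeddings; a hedged
# illustration of "contrastive learning", not the Cohere3D algorithm.
import torch
import torch.nn.functional as F


def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """anchor/positive: (N, D) embeddings of the same N instances at two timesteps."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature                 # (N, N) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    # Each instance should match itself across time and repel the other instances.
    return F.cross_entropy(logits, targets)


loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```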
arXiv Detail & Related papers (2024-02-23T19:43:01Z)
- Linking vision and motion for self-supervised object-centric perception [16.821130222597155]
Object-centric representations enable autonomous driving algorithms to reason about interactions between many independent agents and scene features.
Traditionally these representations have been obtained via supervised learning, but this decouples perception from the downstream driving task and could harm generalization.
We adapt a self-supervised object-centric vision model to perform object decomposition using only RGB video and the pose of the vehicle as inputs.
arXiv Detail & Related papers (2023-07-14T04:21:05Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor
Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised
Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
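A cross-view transformer of this kind can be sketched with standard multi-head attention over per-camera feature tokens; the tokenization and dimensions below are assumptions for illustration, not the SurroundDepth design.

```python
# Sketch of cross-view feature fusion with multi-head self-attention;
# dimensions and tokenization are illustrative assumptions.
import torch
import torch.nn as nn


class CrossViewFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, V, H*W, C) per-camera feature tokens for V surrounding views.
        b, v, n, c = feats.shape
        tokens = feats.reshape(b, v * n, c)          # every view can attend to every other view
        fused, _ = self.attn(tokens, tokens, tokens)
        fused = self.norm(tokens + fused)            # residual connection
        return fused.reshape(b, v, n, c)


fused = CrossViewFusion()(torch.randn(2, 6, 64, 256))  # e.g. 6 surround-view cameras
```

Letting tokens from all views attend to one another is one simple way to share geometric cues across overlapping camera frusta.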
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Disentangling and Vectorization: A 3D Visual Perception Approach for
Autonomous Driving Based on Surround-View Fisheye Cameras [3.485767750936058]
A multidimensional vector representation is proposed to capture the usable information generated in different dimensions and stages.
The experiments of real fisheye images demonstrate that our solution achieves state-of-the-art accuracy while being real-time in practice.
arXiv Detail & Related papers (2021-07-19T13:24:21Z)
- SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround
View Fisheye Cameras [30.480562747903186]
A 360° perception of scene geometry is essential for automated driving, notably for parking and urban driving scenarios.
We present novel camera-geometry adaptive multi-scale convolutions which utilize the camera parameters as a conditional input.
We evaluate our approach on the Fisheye WoodScape surround-view dataset, significantly improving over previous approaches.
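One simple way to condition a convolution on camera parameters is to concatenate per-pixel camera-geometry channels to the input features (CoordConv-style); the sketch below is such a generic illustration, and the choice of geometry channels is an assumption rather than the SVDistNet formulation.

```python
# Sketch of conditioning a convolution on camera geometry by concatenating
# per-pixel camera-parameter channels; the channel construction is an
# assumption, not the SVDistNet camera-geometry adaptive convolution.
import torch
import torch.nn as nn


class CameraConditionedConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, geom_ch: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + geom_ch, out_ch, 3, padding=1)

    def forward(self, feats: torch.Tensor, geom: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) image features
        # geom:  (B, geom_ch, H, W) per-pixel camera-geometry maps, e.g. normalized
        #        ray angles computed from each camera's intrinsics and distortion.
        return self.conv(torch.cat([feats, geom], dim=1))


layer = CameraConditionedConv(64, 64)
out = layer(torch.randn(1, 64, 32, 32), torch.randn(1, 3, 32, 32))
```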
arXiv Detail & Related papers (2021-04-09T15:20:20Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data
Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Generalized Object Detection on Fisheye Cameras for Autonomous Driving:
Dataset, Representations and Baseline [5.1450366450434295]
We explore better representations like oriented bounding box, ellipse, and generic polygon for object detection in fisheye images.
We design a novel curved bounding box model that has optimal properties for fisheye distortion models.
It is the first detailed study on object detection on fisheye cameras for autonomous driving scenarios.
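To give an idea of what a polygon representation of a fisheye object can look like, the sketch below converts a binary instance mask into a fixed-size polygon by keeping the farthest foreground pixel in each angular bin around the centroid; this is an illustrative approximation, not the curved-box or polygon encoding proposed in the paper.

```python
# Sketch: represent an object mask as a K-vertex polygon by taking, for each
# angular bin around the mask centroid, the farthest foreground pixel.
# Illustrative only; not the exact representation used in the paper.
import numpy as np


def mask_to_polygon(mask: np.ndarray, num_vertices: int = 24) -> np.ndarray:
    ys, xs = np.nonzero(mask)                        # foreground pixel coordinates
    cy, cx = ys.mean(), xs.mean()                    # mask centroid
    angles = np.arctan2(ys - cy, xs - cx)            # angle of each pixel around the centroid
    radii = np.hypot(ys - cy, xs - cx)               # distance of each pixel from the centroid
    bins = ((angles + np.pi) / (2 * np.pi) * num_vertices).astype(int) % num_vertices

    polygon = np.zeros((num_vertices, 2), dtype=np.float32)
    for k in range(num_vertices):
        sel = np.flatnonzero(bins == k)
        if sel.size:
            i = sel[np.argmax(radii[sel])]           # farthest pixel in this angular bin
            polygon[k] = (xs[i], ys[i])
        else:
            polygon[k] = (cx, cy)                    # empty bin: fall back to the centroid
    return polygon


# Example: a filled circle becomes a 24-gon approximating its boundary.
yy, xx = np.mgrid[0:64, 0:64]
circle = ((yy - 32) ** 2 + (xx - 32) ** 2) < 20 ** 2
print(mask_to_polygon(circle).round(1))
```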
arXiv Detail & Related papers (2020-12-03T18:00:16Z)
- MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous
Driving Using Multiple Views [60.538802124885414]
We present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation.
MVLidarNet is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input.
We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.
arXiv Detail & Related papers (2020-06-09T21:28:17Z)