YOLOP: You Only Look Once for Panoptic Driving Perception
- URL: http://arxiv.org/abs/2108.11250v3
- Date: Fri, 27 Aug 2021 06:31:48 GMT
- Title: YOLOP: You Only Look Once for Panoptic Driving Perception
- Authors: Dong Wu, Manwen Liao, Weitian Zhang, Xinggang Wang
- Abstract summary: We present a panoptic driving perception network (YOLOP) to perform traffic object detection, drivable area segmentation and lane detection simultaneously.
It is composed of one encoder for feature extraction and three decoders to handle the specific tasks.
Our model performs extremely well on the challenging BDD100K dataset, achieving state-of-the-art accuracy and speed on all three tasks.
- Score: 21.802146960999394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A panoptic driving perception system is an essential part of autonomous
driving. A high-precision and real-time perception system can assist the
vehicle in making reasonable decisions while driving. We present a panoptic
driving perception network (YOLOP) to perform traffic object detection,
drivable area segmentation and lane detection simultaneously. It is composed of
one encoder for feature extraction and three decoders to handle the specific
tasks. Our model performs extremely well on the challenging BDD100K dataset,
achieving state-of-the-art accuracy and speed on all three tasks.
In addition, we verify the effectiveness of our multi-task learning model for joint
training via ablation studies. To the best of our knowledge, this is the first work
that can process these three visual perception tasks simultaneously and in
real time on an embedded device (Jetson TX2, 23 FPS) while maintaining excellent
accuracy. To facilitate further research, the source code and pre-trained
models will be released at https://github.com/hustvl/YOLOP.
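As a rough illustration of the architecture the abstract describes (a single shared encoder feeding three task-specific decoders, trained jointly), here is a minimal PyTorch-style sketch. This is not the released YOLOP code: the module names, layer sizes, input resolution, and loss weighting are illustrative assumptions.
```python
# Minimal sketch (not the official YOLOP implementation): one shared encoder,
# three decoders (detection, drivable area, lane line), jointly trainable.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Stand-in backbone/neck that extracts a shared feature map (stride 8)."""
    def __init__(self, out_channels=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, out_channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)

class DetectHead(nn.Module):
    """Toy detection head: per-cell box offsets (4), objectness (1), class scores."""
    def __init__(self, in_channels=256, num_classes=1):
        super().__init__()
        self.head = nn.Conv2d(in_channels, 4 + 1 + num_classes, 1)

    def forward(self, feats):
        return self.head(feats)

class SegHead(nn.Module):
    """Upsamples shared features back to input resolution for per-pixel masks."""
    def __init__(self, in_channels=256, num_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(64, num_classes, 1),
        )

    def forward(self, feats):
        return self.head(feats)

class PanopticDrivingNet(nn.Module):
    """One encoder, three decoders: detection, drivable area, lane line."""
    def __init__(self):
        super().__init__()
        self.encoder = SharedEncoder()
        self.det_head = DetectHead()
        self.da_head = SegHead(num_classes=2)   # drivable / not drivable
        self.ll_head = SegHead(num_classes=2)   # lane / background

    def forward(self, x):
        feats = self.encoder(x)
        return self.det_head(feats), self.da_head(feats), self.ll_head(feats)

if __name__ == "__main__":
    model = PanopticDrivingNet()
    img = torch.randn(1, 3, 384, 640)           # assumed BDD100K-like input size
    det_out, da_out, ll_out = model(img)
    print(det_out.shape, da_out.shape, ll_out.shape)
    # Joint training would minimize a weighted sum of the three task losses, e.g.
    # loss = w_det * det_loss + w_da * da_loss + w_ll * ll_loss
```
Because all three heads read the same feature map, one forward pass through the encoder serves detection and both segmentation tasks, which is what makes the joint, real-time setup in the abstract plausible on embedded hardware.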
Related papers
- Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference [43.474068248379815]
We propose a shared encoder trained on multiple computer vision tasks critical for urban navigation.
We introduce a multi-scale feature network for pose estimation to improve depth learning.
Our findings demonstrate that a shared backbone trained on diverse visual tasks can provide broad perception capabilities.
arXiv Detail & Related papers (2024-09-16T08:54:03Z)
- Improving automatic detection of driver fatigue and distraction using machine learning [0.0]
Driver fatigue and distracted driving are important factors in traffic accidents.
We present techniques for simultaneously detecting fatigue and distracted driving behaviors using vision-based and machine learning-based approaches.
arXiv Detail & Related papers (2024-01-04T06:33:46Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [7.137567622606353]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and fully self-supervised framework designed for policy pretraining in visuomotor driving.
We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
- JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes [75.20435924081585]
JPerceiver can simultaneously estimate scale-aware depth and VO as well as BEV layout from a monocular video sequence.
It exploits the cross-view geometric transformation (CGT) to propagate the absolute scale from the road layout to depth and VO.
Experiments on Argoverse, nuScenes and KITTI show the superiority of JPerceiver over existing methods on all three tasks.
arXiv Detail & Related papers (2022-07-16T10:33:59Z)
- ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning [132.20119288212376]
We propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously.
To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system.
arXiv Detail & Related papers (2022-07-15T16:57:43Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Learning Accurate and Human-Like Driving using Semantic Maps and Attention [152.48143666881418]
This paper investigates how end-to-end driving models can be improved to drive more accurately and in a more human-like manner.
We exploit semantic and visual maps from HERE Technologies and augment the existing Drive360 dataset with them.
Our models are trained and evaluated on the Drive360 + HERE dataset, which features 60 hours and 3000 km of real-world driving data.
arXiv Detail & Related papers (2020-07-10T22:25:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.