Related papers: YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

URL: http://arxiv.org/abs/2208.11434v1
Date: Wed, 24 Aug 2022 11:00:27 GMT
Title: YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception
Authors: Cheng Han, Qichao Zhao, Shuyi Zhang, Yinzi Chen, Zhenlin Zhang, Jinwei Yuan
Abstract summary: Multi-task learning approaches have achieved promising results in solving panoptic driving perception problems. This paper proposes an effective and efficient multi-task learning network that simultaneously performs traffic object detection, drivable road area segmentation and lane detection. Our model achieves new state-of-the-art (SOTA) performance in terms of accuracy and speed on the challenging BDD100K dataset.
Score: 1.6683976936678229
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Over the last decade, multi-task learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high-precision and high-efficiency performance. Multi-task learning has become a popular paradigm for designing networks for real-time practical autonomous driving systems, where computation resources are limited. This paper proposes an effective and efficient multi-task learning network that simultaneously performs traffic object detection, drivable road area segmentation and lane detection. Our model achieves new state-of-the-art (SOTA) performance in terms of accuracy and speed on the challenging BDD100K dataset. In particular, the inference time is halved compared with the previous SOTA model. Code will be released in the near future.

Related papers

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving [51.47621083057114]
SOLVE is an innovative framework that synergizes Vision-Language Models with end-to-end (E2E) models to enhance autonomous vehicle planning. Our approach emphasizes knowledge sharing at the feature level through a shared visual encoder, enabling comprehensive interaction between VLM and E2E components.
arXiv Detail & Related papers (2025-05-22T15:44:30Z)
Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities. We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details. We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high-noise levels.
arXiv Detail & Related papers (2025-03-06T03:06:22Z)
YOLO-TS: Real-Time Traffic Sign Detection with Enhanced Accuracy Using Optimized Receptive Fields and Anchor-Free Fusion [15.571409945909243]
We present a novel real-time and efficient road sign detection network, YOLO-TS. This network significantly improves performance by optimizing the receptive fields of multi-scale feature maps. Our innovative feature-fusion strategy, leveraging the flexibility of Anchor-Free methods, achieves remarkable enhancements in both accuracy and speed.
arXiv Detail & Related papers (2024-10-22T16:19:55Z)
MTDT: A Multi-Task Deep Learning Digital Twin [8.600701437207725]
We present a comprehensive traffic flow simulation utilizing a multi-task learning paradigm. Compared to existing deep learning methodologies, the Multi-Task Deep Learning Twin (MTDT) distinguishes itself through its adaptability to local temporal and spatial features. We also show the benefit of multi-task learning in the effectiveness of individual traffic simulation tasks.
arXiv Detail & Related papers (2024-05-02T00:34:10Z)
Penalty-Based Imitation Learning With Cross Semantics Generation Sensor Fusion for Autonomous Driving [1.2749527861829049]
In this paper, we provide a penalty-based imitation learning approach to integrate multiple modalities of information. We observe a remarkable increase in the driving score by more than 12% when compared to the state-of-the-art (SOTA) model, InterFuser. Our model achieves this performance enhancement while achieving a 7-fold increase in inference speed and reducing the model size by approximately 30%.
arXiv Detail & Related papers (2023-03-21T14:29:52Z)
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting. Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories. We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the models with the capacity to predict the future, significantly improving the results for streaming perception. We consider driving scenes with multiple velocities and propose Velocity-awared streaming AP (VsAP) to jointly evaluate the accuracy. Our simple method achieves state-of-the-art performance on the Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2% respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z)
Learning to Transfer for Traffic Forecasting via Multi-task Learning [3.1836399559127218]
Deep neural networks have demonstrated superior performance in short-term traffic forecasting. Traffic4cast is the first competition of its kind dedicated to benchmarking the robustness of traffic forecasting models towards domain shifts in space and time. We present a multi-task learning framework for temporal and spatio-temporal domain adaptation of traffic forecasting models.
arXiv Detail & Related papers (2021-11-27T03:16:40Z)
Bayesian Optimization and Deep Learning for steering wheel angle prediction [58.720142291102135]
This work aims to obtain an accurate model for the prediction of the steering angle in an automated driving system. Within a limited number of trials, Bayesian Optimization (BO) was able to identify a model, namely BOST-LSTM, which proved the most accurate when compared to classical end-to-end driving models.
arXiv Detail & Related papers (2021-10-22T15:25:14Z)
A Deep Value-network Based Approach for Multi-Driver Order Dispatching [55.36656442934531]
We propose a deep reinforcement learning based solution for order dispatching. We conduct large scale online A/B tests on DiDi's ride-dispatching platform. Results show that CVNet consistently outperforms other recently proposed dispatching methods.
arXiv Detail & Related papers (2021-06-08T16:27:04Z)
Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples. We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms [57.21078336887961]
Large ride-hailing platforms, such as DiDi, Uber and Lyft, connect tens of thousands of vehicles in a city to millions of ride demands throughout the day. We propose a unified value-based dynamic learning framework (V1D3) for tackling both tasks.
arXiv Detail & Related papers (2021-05-18T19:22:24Z)
Enhance the performance of navigation: A two-stage machine learning approach [13.674463804942837]
Real time traffic navigation is an important capability in smart transportation technologies. In this paper, we adopt the ideas of ensemble learning and develop a two-stage machine learning model to give accurate navigation results.
arXiv Detail & Related papers (2020-04-02T08:55:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.