NeurAll: Towards a Unified Visual Perception Model for Automated Driving
- URL: http://arxiv.org/abs/1902.03589v3
- Date: Sat, 9 Mar 2024 23:21:18 GMT
- Title: NeurAll: Towards a Unified Visual Perception Model for Automated Driving
- Authors: Ganesh Sistu, Isabelle Leang, Sumanth Chennupati, Senthil Yogamani,
Ciaran Hughes, Stefan Milz and Samir Rawashdeh
- Abstract summary: We propose a joint multi-task network design for learning several tasks simultaneously.
The main bottleneck in automated driving systems is the limited processing power available on deployment hardware.
- Score: 8.49826472556323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks (CNNs) are used successfully for important automotive visual perception tasks, including object recognition, motion and depth estimation, and visual SLAM. However, these tasks are typically explored and modeled independently. In this paper, we propose a joint
multi-task network design for learning several tasks simultaneously. Our main
motivation is the computational efficiency achieved by sharing the expensive
initial convolutional layers between all tasks. Indeed, the main bottleneck in
automated driving systems is the limited processing power available on
deployment hardware. There is also evidence of other benefits: joint modeling can improve accuracy for some tasks and ease development effort. It also offers scalability, since new tasks can be added by leveraging existing features, with better generalization. We survey various CNN-based solutions for visual perception
tasks in automated driving. Then we propose a unified CNN model for the
important tasks and discuss several advanced optimization and architecture
design techniques to improve the baseline model. The paper is partly a review and partly a position paper, demonstrating several preliminary results that are promising for future research. We first demonstrate results of multi-stream learning and
auxiliary learning which are important ingredients to scale to a large
multi-task model. Finally, we implement a two-stream three-task network that in many cases performs better than the corresponding single-task models, while maintaining network size.
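To make the core idea concrete, below is a minimal PyTorch-style sketch of hard parameter sharing: the expensive initial convolutional layers are computed once and reused by lightweight task-specific heads. The layer sizes and the three heads (segmentation, depth, detection) are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SharedEncoderMultiTaskNet(nn.Module):
    """Hard parameter sharing: one encoder feeds several task heads."""

    def __init__(self, num_seg_classes: int = 10):
        super().__init__()
        # Expensive initial convolutional layers, shared by all tasks.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Lightweight task-specific heads on top of the shared features.
        self.seg_head = nn.Conv2d(128, num_seg_classes, kernel_size=1)  # segmentation
        self.depth_head = nn.Conv2d(128, 1, kernel_size=1)              # depth estimation
        self.det_head = nn.Conv2d(128, 5, kernel_size=1)                # 4 box coords + objectness

    def forward(self, x):
        feats = self.encoder(x)  # computed once, reused by every head
        return self.seg_head(feats), self.depth_head(feats), self.det_head(feats)

# Example: one forward pass serves all three tasks.
seg, depth, det = SharedEncoderMultiTaskNet()(torch.randn(1, 3, 128, 256))
```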
Related papers
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
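As a hedged illustration of how classification and regression can be trained jointly, here is a simple weighted-sum loss in PyTorch; the weights and tensor shapes are assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def multitask_loss(cls_logits, cls_target, reg_pred, reg_target,
                   w_cls: float = 1.0, w_reg: float = 1.0):
    """Weighted sum of a classification and a regression objective."""
    loss_cls = F.cross_entropy(cls_logits, cls_target)  # discrete labels
    loss_reg = F.mse_loss(reg_pred, reg_target)         # continuous targets
    return w_cls * loss_cls + w_reg * loss_reg

# Example with dummy tensors: 8 samples, 5 classes, 3 regression targets.
loss = multitask_loss(torch.randn(8, 5), torch.randint(0, 5, (8,)),
                      torch.randn(8, 3), torch.randn(8, 3))
```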
arXiv Detail & Related papers (2024-07-23T11:14:54Z)
- Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving [85.62076860189116]
Video Task Decathlon (VTD) includes ten representative image and video tasks spanning classification, segmentation, localization, and association of objects and pixels.
We develop our unified network, VTDNet, that uses a single structure and a single set of weights for all ten tasks.
arXiv Detail & Related papers (2023-09-08T16:33:27Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation [18.345183818638475]
Continual learning (CL) can serve as a remedy by enabling knowledge transfer across sequentially arriving tasks.
We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks.
Our approach scales to a large number of tasks because it requires little memory and time overhead.
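Since this entry relies on knowledge distillation, here is a standard distillation loss as a sketch; the temperature and scaling are common conventions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures

# Example: distill a 10-class teacher into a student on a batch of 4.
loss = distillation_loss(torch.randn(4, 10), torch.randn(4, 10))
```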
arXiv Detail & Related papers (2023-03-25T10:16:53Z)
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
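A minimal sketch of what "visual exemplars from bounding boxes" could look like in code: crop the box regions and resize them into fixed-size prompts. The function name, sizes, and box format are hypothetical, not VE-Prompt's actual implementation.

```python
import torch
import torch.nn.functional as F

def extract_exemplars(image, boxes, size: int = 32):
    """image: (C, H, W) tensor; boxes: (N, 4) as (x1, y1, x2, y2) pixels."""
    crops = []
    for x1, y1, x2, y2 in boxes.round().long().tolist():
        crop = image[:, y1:y2, x1:x2]                  # cut out the target region
        crops.append(F.interpolate(crop.unsqueeze(0),  # resize to a fixed prompt size
                                   size=(size, size), mode="bilinear",
                                   align_corners=False).squeeze(0))
    return torch.stack(crops)  # (N, C, size, size) visual exemplars
```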
arXiv Detail & Related papers (2023-03-03T08:54:06Z)
- Backbones-Review: Feature Extraction Networks for Deep Learning and Deep Reinforcement Learning Approaches [3.255610188565679]
CNNs can operate on large-scale data and cover different scenarios for a specific task.
Many networks have been proposed and have become widely used backbones for DL models across AI tasks.
A backbone is a well-known network previously trained on many other tasks that has demonstrated its effectiveness.
arXiv Detail & Related papers (2022-06-16T09:18:34Z)
- MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
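The following sketch mirrors that encoder/decoder split in PyTorch: one shared transformer encoder, plus a transformer decoder and learned queries per task. Dimensions, depths, and task names are illustrative assumptions, not MulT's actual configuration.

```python
import torch
import torch.nn as nn

class MultiTaskTransformer(nn.Module):
    def __init__(self, d_model: int = 256, tasks=("seg", "depth"), num_queries: int = 100):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)  # shared
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoders = nn.ModuleDict(  # one decoder head per task
            {t: nn.TransformerDecoder(dec_layer, num_layers=2) for t in tasks})
        self.queries = nn.ParameterDict(  # learned task-specific queries
            {t: nn.Parameter(torch.randn(num_queries, d_model)) for t in tasks})

    def forward(self, tokens):  # tokens: (B, N, d_model) patch embeddings
        shared = self.encoder(tokens)  # one shared representation
        b = tokens.size(0)
        return {t: dec(self.queries[t].expand(b, -1, -1), shared)
                for t, dec in self.decoders.items()}
```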
arXiv Detail & Related papers (2022-05-17T13:03:18Z)
- High Efficiency Pedestrian Crossing Prediction [0.0]
State-of-the-art methods in predicting pedestrian crossing intention often rely on multiple streams of information as inputs.
We introduce a network with only frames of pedestrians as the input.
Experiments validate that our model consistently delivers outstanding performance.
arXiv Detail & Related papers (2022-04-04T21:37:57Z)
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- A Unified Object Motion and Affinity Model for Online Multi-Object Tracking [127.5229859255719]
We propose a novel MOT framework, named UMA, that unifies object motion and affinity modeling in a single network.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
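As a rough illustration of a task-specific attention module, here is an SE-style channel gate that reweights shared features for one task; this is a generic sketch, not UMA's exact module.

```python
import torch
import torch.nn as nn

class TaskAttention(nn.Module):
    """Channel attention that reweights shared features for one task."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global context per channel
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, shared_feats):  # (B, C, H, W)
        return shared_feats * self.gate(shared_feats)  # task-aware reweighting
```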
arXiv Detail & Related papers (2020-03-25T09:36:43Z)
- Deep Multi-Task Augmented Feature Learning via Hierarchical Graph Neural Network [4.121467410954028]
We propose a Hierarchical Graph Neural Network to learn augmented features for deep multi-task learning.
Experiments on real-world datasets show significant performance improvements when using this strategy.
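For intuition, a single message-passing step over a graph of tasks might look like the sketch below, where each task's features are augmented with a weighted aggregate of its neighbors'; the adjacency matrix and dimensions are illustrative assumptions.

```python
import torch

def gnn_layer(task_feats, adj):
    """task_feats: (T, d), one feature vector per task; adj: (T, T) edge weights."""
    norm = adj / adj.sum(dim=1, keepdim=True).clamp(min=1e-6)  # row-normalize
    messages = norm @ task_feats               # aggregate neighboring tasks' features
    return torch.relu(task_feats + messages)   # augmented per-task features
```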
arXiv Detail & Related papers (2020-02-12T06:02:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.