Multitask Network for Joint Object Detection, Semantic Segmentation and
Human Pose Estimation in Vehicle Occupancy Monitoring
- URL: http://arxiv.org/abs/2205.01515v1
- Date: Tue, 3 May 2022 14:11:18 GMT
- Title: Multitask Network for Joint Object Detection, Semantic Segmentation and
Human Pose Estimation in Vehicle Occupancy Monitoring
- Authors: Nikolas Ebert, Patrick Mangat, Oliver Wasenmüller
- Abstract summary: We propose our Multitask Detection, Segmentation and Pose Estimation Network (MDSP), which solves object detection, semantic segmentation, and human pose estimation jointly for vehicle occupancy monitoring.
Our architecture allows a flexible combination of the three mentioned tasks during a simple end-to-end training.
We perform comprehensive evaluations on the public datasets SVIRO and TiCaM in order to demonstrate the superior performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to ensure safe autonomous driving, precise information about the
conditions in and around the vehicle must be available. Accordingly, the
monitoring of occupants and objects inside the vehicle is crucial. In the
state-of-the-art, single or multiple deep neural networks are used for either
object recognition, semantic segmentation, or human pose estimation. In
contrast, we propose our Multitask Detection, Segmentation and Pose Estimation
Network (MDSP) -- the first multitask network solving all these three tasks
jointly in the area of occupancy monitoring. Due to the shared architecture,
memory and computing costs can be saved while achieving higher accuracy.
Furthermore, our architecture allows a flexible combination of the three
mentioned tasks during a simple end-to-end training. We perform comprehensive
evaluations on the public datasets SVIRO and TiCaM in order to demonstrate the
superior performance.
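The shared-architecture idea in the abstract can be illustrated with a minimal sketch. It is not the authors' MDSP implementation: `shared_backbone`, `HEADS`, `squared_error`, and `multitask_loss` are hypothetical names, and the toy arithmetic stands in for a learned backbone and learned task heads. The sketch only shows that features are computed once and that any subset of the three tasks can contribute to a single training loss.

```python
from typing import Callable, Dict, Iterable, List, Optional


def shared_backbone(image: List[float]) -> List[float]:
    """Toy stand-in for a shared feature extractor (assumption, not MDSP)."""
    return [2.0 * x for x in image]


# Hypothetical task heads: each maps the shared features to one scalar prediction.
HEADS: Dict[str, Callable[[List[float]], float]] = {
    "detection": lambda features: sum(features),
    "segmentation": lambda features: max(features),
    "pose": lambda features: min(features),
}


def squared_error(prediction: float, target: float) -> float:
    """Placeholder per-task loss; real tasks would use task-specific losses."""
    return (prediction - target) ** 2


def multitask_loss(
    image: List[float],
    targets: Dict[str, float],
    enabled: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Sum the weighted losses of the enabled tasks over shared features."""
    features = shared_backbone(image)  # computed once, reused by every head
    weights = weights or {}
    total = 0.0
    for task in enabled:
        prediction = HEADS[task](features)
        total += weights.get(task, 1.0) * squared_error(prediction, targets[task])
    return total
```

Passing a different `enabled` list, for example only `["detection", "pose"]`, changes which heads contribute to the loss without touching the backbone, which is one plausible reading of the "flexible combination" described above.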
Related papers
- A Point-Based Approach to Efficient LiDAR Multi-Task Perception [49.91741677556553]
PAttFormer is an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds.
Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for task-specific point cloud representations.
Our evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIoU and 3D object detection by +1.7% in mAP.
arXiv Detail & Related papers (2024-04-19T11:24:34Z)
- Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator [15.94714567272497]
We present a new real-time multi-task network adept at three vital autonomous driving tasks: monocular 3D object detection, semantic segmentation, and dense depth estimation.
To counter the challenge of negative transfer, which is the prevalent issue in multi-task learning, we introduce a task-adaptive attention generator.
Our rigorously optimized network, when tested on the Cityscapes-3D dataset, consistently outperforms various baseline models.
arXiv Detail & Related papers (2024-03-06T05:04:40Z)
- Simultaneous Clutter Detection and Semantic Segmentation of Moving Objects for Automotive Radar Data [12.96486891333286]
Radar sensors are an important part of the environment perception system of autonomous vehicles.
One of the first steps during the processing of radar point clouds is often the detection of clutter.
Another common objective is the semantic segmentation of moving road users.
We show that our setup is highly effective and outperforms every existing network for semantic segmentation on the RadarScenes dataset.
arXiv Detail & Related papers (2023-11-13T11:29:38Z)
- LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving [7.137567622606353]
We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation.
We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection.
We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection.
arXiv Detail & Related papers (2023-07-17T21:22:17Z)
- A Dynamic Feature Interaction Framework for Multi-task Visual Perception [100.98434079696268]
We devise an efficient unified framework to solve multiple common perception tasks.
These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation.
Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
arXiv Detail & Related papers (2023-06-08T09:24:46Z)
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
- A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a multilayer state-weighted perceptron for gesture recognition in the context of autonomous vehicles.
An evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach.
We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
arXiv Detail & Related papers (2022-04-25T08:42:47Z)
- A Simple and Efficient Multi-task Network for 3D Object Detection and Road Understanding [20.878931360708343]
We show that it is possible to perform all perception tasks via a simple and efficient multi-task network.
Our proposed network, LidarMTL, takes a raw LiDAR point cloud as input and predicts six perception outputs for 3D object detection and road understanding.
arXiv Detail & Related papers (2021-03-06T08:00:26Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)