Multi-Modal Multi-Task (3MT) Road Segmentation
- URL: http://arxiv.org/abs/2308.11983v1
- Date: Wed, 23 Aug 2023 08:15:15 GMT
- Title: Multi-Modal Multi-Task (3MT) Road Segmentation
- Authors: Erkan Milli, Özgür Erkent, Asım Egemen Yılmaz
- Abstract summary: We focus on using raw sensor inputs instead of, as it is typically done in many SOTA works, leveraging architectures that require high pre-processing costs.
This study presents a cost-effective and highly accurate solution for road segmentation by integrating data from multiple sensors within a multi-task learning architecture.
- Score: 0.8287206589886879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal systems have the capacity to produce more reliable results than
systems with a single modality in road detection due to perceiving different
aspects of the scene. We focus on using raw sensor inputs instead of, as is
typically done in many SOTA works, leveraging architectures that require high
pre-processing costs such as surface normals or dense depth predictions. By
using raw sensor inputs, we aim to utilize a low-cost model that minimizes both
the pre-processing and model computation costs. This study presents a
cost-effective and highly accurate solution for road segmentation by
integrating data from multiple sensors within a multi-task learning
architecture. A fusion architecture is proposed in which RGB and LiDAR depth
images constitute the inputs of the network. Another contribution of this study
is to use an IMU/GNSS (inertial measurement unit/global navigation satellite
system) inertial navigation system, whose data is collected synchronously with
and calibrated against the LiDAR-camera pair, to compute aggregated dense LiDAR
depth images. It has been demonstrated by experiments on the KITTI dataset that
the proposed method offers fast and high-performance solutions. We have also
shown the performance of our method on Cityscapes, where raw LiDAR data is not
available. The segmentation results obtained for both full- and half-resolution
images are competitive with existing methods. Therefore, we conclude that our
method is not dependent only on raw LiDAR data; rather, it can be used with
different sensor modalities. The inference times obtained in all experiments
are very promising for real-time applications.
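A central ingredient described in the abstract is the use of synchronized, calibrated IMU/GNSS poses to aggregate several LiDAR sweeps into a dense depth image, which then serves as one input of the RGB + LiDAR-depth fusion network. The abstract gives no implementation details, so the following is only a minimal sketch of such an aggregation step under stated assumptions: the function name, the pose and calibration inputs (`poses_w_from_lidar`, `T_cam_from_lidar`, `K`), and the nearest-depth z-buffering are placeholders, not the authors' actual pipeline or KITTI loaders.

```python
import numpy as np

def aggregate_dense_depth(sweeps, poses_w_from_lidar, T_cam_from_lidar, K, hw):
    """Project several ego-motion-compensated LiDAR sweeps into one camera
    frame to obtain a denser depth image (hypothetical helper, not the paper's code).

    sweeps             : list of (N_i, 3) LiDAR point arrays, one per sweep
    poses_w_from_lidar : list of 4x4 world-from-LiDAR poses (e.g. from IMU/GNSS)
    T_cam_from_lidar   : 4x4 camera-from-LiDAR extrinsic calibration (key sweep)
    K                  : 3x3 camera intrinsics
    hw                 : (height, width) of the output depth image
    """
    H, W = hw
    # Reference frame: LiDAR frame of the key (latest) sweep.
    T_lidar0_from_w = np.linalg.inv(poses_w_from_lidar[-1])

    pts_ref = []
    for pts, T_w_from_lidar in zip(sweeps, poses_w_from_lidar):
        pts_h = np.c_[pts, np.ones(len(pts))]              # homogeneous (N, 4)
        # Move points from their sweep's LiDAR frame into the key LiDAR frame.
        pts_ref.append((T_lidar0_from_w @ T_w_from_lidar @ pts_h.T).T[:, :3])
    pts_ref = np.vstack(pts_ref)

    # LiDAR -> camera, then perspective projection with the intrinsics.
    pts_cam = (T_cam_from_lidar @ np.c_[pts_ref, np.ones(len(pts_ref))].T).T[:, :3]
    in_front = pts_cam[:, 2] > 0.1                          # keep points ahead of the camera
    pts_cam = pts_cam[in_front]
    uv = (K @ pts_cam.T).T
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)
    z = pts_cam[:, 2]

    # Keep pixels inside the image; resolve collisions by keeping the nearest depth.
    depth = np.zeros((H, W), dtype=np.float32)
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    for ui, vi, zi in zip(u[valid], v[valid], z[valid]):
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:
            depth[vi, ui] = zi
    return depth
```

In the paper's setting, a depth image produced this way would feed the LiDAR branch alongside the RGB image in the multi-task network; the fusion layers and task heads themselves are not reproduced here.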
Related papers
- UdeerLID+: Integrating LiDAR, Image, and Relative Depth with Semi-Supervised [12.440461420762265]
Road segmentation is a critical task for autonomous driving systems.
Our work introduces an innovative approach that integrates LiDAR point cloud data, visual image, and relative depth maps.
One of the primary challenges is the scarcity of large-scale, accurately labeled datasets.
arXiv Detail & Related papers (2024-09-10T03:57:30Z)
- Robust Depth Enhancement via Polarization Prompt Fusion Tuning [112.88371907047396]
We present a framework that leverages polarization imaging to improve inaccurate depth measurements from various depth sensors.
Our method first adopts a learning-based strategy where a neural network is trained to estimate a dense and complete depth map from polarization data and a sensor depth map from different sensors.
To further improve the performance, we propose a Polarization Prompt Fusion Tuning (PPFT) strategy to effectively utilize RGB-based models pre-trained on large-scale datasets.
arXiv Detail & Related papers (2024-04-05T17:55:33Z)
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we re-visit transformers pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
arXiv Detail & Related papers (2023-07-03T04:10:55Z)
- RGB-D based Stair Detection using Deep Learning for Autonomous Stair Climbing [6.362951673024623]
We propose a neural network architecture with inputs of both RGB map and depth map.
Specifically, we design a selective module that lets the network learn the complementary relationship between the RGB map and the depth map.
Experiments on our dataset show that our method can achieve better accuracy and recall compared with the previous state-of-the-art deep learning method.
arXiv Detail & Related papers (2022-12-02T11:22:52Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet).
CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
arXiv Detail & Related papers (2021-11-30T04:30:10Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- CalibDNN: Multimodal Sensor Calibration for Perception Using Deep Neural Networks [27.877734292570967]
We propose a novel deep learning-driven technique (CalibDNN) for accurate calibration among multimodal sensors, specifically LiDAR-camera pairs.
The entire processing is fully automatic with a single model and single iteration.
Comparisons among different methods and extensive experiments on different datasets demonstrate state-of-the-art performance.
arXiv Detail & Related papers (2021-03-27T02:43:37Z)
- Depth Completion via Inductive Fusion of Planar LIDAR and Monocular Camera [27.978780155504467]
We introduce an inductive late-fusion block which better fuses different sensor modalities inspired by a probability model.
This block uses the dense context features to guide the depth prediction based on demonstrations by sparse depth features.
Our method shows promising results compared to previous approaches on both the benchmark datasets and a simulated dataset (a generic late-fusion sketch follows this entry).
arXiv Detail & Related papers (2020-09-03T18:39:57Z)
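For the "Depth Completion via Inductive Fusion" entry above, the summary only states that dense context features guide a prediction driven by sparse LiDAR depth features. The block below is a generic, confidence-weighted late-fusion sketch in PyTorch rather than the paper's actual module; the class name, channel sizes, and gating scheme are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class LateFusionBlock(nn.Module):
    """Generic late fusion: dense context features gate a sparse-depth branch
    (illustrative sketch, not the paper's inductive fusion block)."""

    def __init__(self, ctx_channels: int = 64, depth_channels: int = 32):
        super().__init__()
        # Confidence map predicted from the dense context features.
        self.confidence = nn.Sequential(
            nn.Conv2d(ctx_channels, depth_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        # Fuse the gated sparse-depth features back with the context features.
        self.fuse = nn.Conv2d(ctx_channels + depth_channels, ctx_channels,
                              kernel_size=3, padding=1)

    def forward(self, ctx_feat: torch.Tensor, sparse_feat: torch.Tensor) -> torch.Tensor:
        gate = self.confidence(ctx_feat)          # (B, depth_channels, H, W) in [0, 1]
        gated = gate * sparse_feat                # suppress unreliable sparse features
        return self.fuse(torch.cat([ctx_feat, gated], dim=1))

# Example usage with random feature maps.
block = LateFusionBlock()
ctx = torch.randn(1, 64, 48, 160)      # dense image/context features
sparse = torch.randn(1, 32, 48, 160)   # features from a sparse LiDAR depth map
out = block(ctx, sparse)               # (1, 64, 48, 160)
```

Gating the sparse branch with a confidence map predicted from the dense branch is one common way to let reliable context suppress noisy or missing sparse depth measurements; the paper's formulation may differ.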