Real-Time Multi-Modal Semantic Fusion on Unmanned Aerial Vehicles with
Label Propagation for Cross-Domain Adaptation
- URL: http://arxiv.org/abs/2210.09739v1
- Date: Tue, 18 Oct 2022 10:32:11 GMT
- Title: Real-Time Multi-Modal Semantic Fusion on Unmanned Aerial Vehicles with
Label Propagation for Cross-Domain Adaptation
- Authors: Simon Bultmann, Jan Quenzel, Sven Behnke
- Abstract summary: We propose a UAV system for real-time semantic inference and fusion of multiple sensor modalities.
Semantic segmentation of LiDAR scans and RGB images, as well as object detection on RGB and thermal images, run online onboard the UAV computer.
We evaluate the integrated system in real-world experiments in an urban environment and at a disaster test site.
- Score: 28.78192888704324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unmanned aerial vehicles (UAVs) equipped with multiple complementary sensors
have tremendous potential for fast autonomous or remote-controlled semantic
scene analysis, e.g., for disaster examination. Here, we propose a UAV system
for real-time semantic inference and fusion of multiple sensor modalities.
Semantic segmentation of LiDAR scans and RGB images, as well as object
detection on RGB and thermal images, run online onboard the UAV computer using
lightweight CNN architectures and embedded inference accelerators. We follow a
late fusion approach where semantic information from multiple sensor modalities
augments 3D point clouds and image segmentation masks while also generating an
allocentric semantic map. Label propagation on the semantic map allows for
sensor-specific adaptation with cross-modality and cross-domain supervision.
Our system provides augmented semantic images and point clouds at $\approx 9$
Hz. We evaluate the integrated system in real-world experiments in an urban
environment and at a disaster test site.
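As a rough illustration of the late-fusion step described in the abstract, the following sketch projects LiDAR points into a camera image and averages per-class probabilities from the LiDAR and image segmentation networks. It is a minimal sketch under assumed interfaces (pinhole intrinsics `K`, LiDAR-to-camera extrinsics `T_cam_lidar`, equal fusion weights), not the authors' implementation.

```python
import numpy as np

def fuse_semantics(points_lidar, probs_lidar, seg_probs_img, K, T_cam_lidar):
    """Hypothetical late fusion: project LiDAR points into the camera image
    and average per-class probabilities from both modalities.

    points_lidar : (N, 3) points in the LiDAR frame
    probs_lidar  : (N, C) per-point class probabilities (LiDAR segmentation)
    seg_probs_img: (H, W, C) per-pixel class probabilities (image segmentation)
    K            : (3, 3) camera intrinsics
    T_cam_lidar  : (4, 4) LiDAR-to-camera extrinsics
    """
    H, W, C = seg_probs_img.shape
    # Transform points into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    # Pinhole projection; clip depth to avoid division warnings.
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    valid = (pts_cam[:, 2] > 0.1) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    # Fuse: average the two modalities' probabilities where a point is visible.
    fused = probs_lidar.copy()
    fused[valid] = 0.5 * (probs_lidar[valid] + seg_probs_img[v[valid], u[valid]])
    return fused / fused.sum(axis=1, keepdims=True)  # guard numerical drift
```

The label-propagation step can be pictured as the reverse direction: labels rendered from the accumulated allocentric semantic map into a given sensor's frame serve as pseudo-labels for adapting that sensor's network, though the paper's exact procedure is not reproduced here.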
Related papers
- Deep Multimodal Fusion for Semantic Segmentation of Remote Sensing Earth Observation Data [0.08192907805418582]
This paper proposes a late fusion deep learning model (LF-DLM) for semantic segmentation.
One branch integrates detailed textures from aerial imagery using a UNetFormer with a Multi-Axis Vision Transformer (MaxViT) backbone.
The other branch captures complex spatio-temporal dynamics from the Sentinel-2 satellite image time series using a U-Net with Temporal Attention Encoder (U-TAE).
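A minimal sketch of what such a late fusion could look like: per-pixel class probabilities from the two branches are averaged on a common grid. The branch names and the weight `alpha` are illustrative assumptions, not details from the paper.

```python
import torch

def late_fuse(logits_aerial: torch.Tensor, logits_s2: torch.Tensor,
              alpha: float = 0.5) -> torch.Tensor:
    """logits_*: (B, C, H, W) outputs of the two segmentation branches,
    already resampled to a common grid; returns a (B, H, W) class map."""
    p_aerial = logits_aerial.softmax(dim=1)   # aerial-imagery branch
    p_s2 = logits_s2.softmax(dim=1)           # Sentinel-2 time-series branch
    fused = alpha * p_aerial + (1.0 - alpha) * p_s2
    return fused.argmax(dim=1)
```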
arXiv Detail & Related papers (2024-10-01T07:50:37Z)
- Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System [0.0]
We propose a novel approach to address the problem of camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems.
Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance.
Our results show that the proposed approach achieves superior performance over single-sensor solutions and could directly compete with other top-level fusion methods.
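The summary stays high-level; as one concrete instance of spatial matching across sensor domains, the sketch below associates radar returns with camera detections by projecting them into the image plane. The names and the point-in-box matching rule are assumptions for illustration, not the paper's method.

```python
import numpy as np

def match_radar_to_boxes(radar_xyz, boxes_2d, K):
    """radar_xyz: (N, 3) radar points already in the camera frame;
    boxes_2d: (M, 4) camera detections as (x1, y1, x2, y2);
    returns (radar_idx, box_idx) pairs."""
    uvw = (K @ radar_xyz.T).T
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    matches = []
    for i, (u, v) in enumerate(uv):
        if radar_xyz[i, 2] <= 0:              # behind the camera
            continue
        for j, (x1, y1, x2, y2) in enumerate(boxes_2d):
            if x1 <= u <= x2 and y1 <= v <= y2:
                matches.append((i, j))        # radar point inside the box
                break
    return matches
```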
arXiv Detail & Related papers (2024-04-25T12:04:31Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
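A hedged sketch of the fusion step this describes: given an assignment between 3D and 2D proposals (here assumed to be a soft, row-normalized matrix), each 3D proposal's ROI feature is concatenated with a weighted sum of its matched 2D features.

```python
import torch

def fuse_matched_rois(feat_3d: torch.Tensor, feat_2d: torch.Tensor,
                      assign: torch.Tensor) -> torch.Tensor:
    """feat_3d: (N3, D) 3D-proposal ROI features; feat_2d: (N2, D)
    2D-proposal ROI features; assign: (N3, N2) row-normalized weights."""
    matched_2d = assign @ feat_2d                      # (N3, D)
    return torch.cat([feat_3d, matched_2d], dim=-1)    # (N3, 2D) fused
```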
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
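A minimal sketch of the neural-fields mapping component such a pipeline relies on: an MLP mapping a positionally encoded 3D point to occupancy and semantic logits. Layer sizes and the sinusoidal encoding are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SemanticField(nn.Module):
    """Implicit map: 3D point -> (occupancy, per-class semantic logits)."""
    def __init__(self, num_classes: int, hidden: int = 128, n_freq: int = 6):
        super().__init__()
        self.n_freq = n_freq
        in_dim = 3 + 3 * 2 * n_freq    # xyz + sin/cos positional encoding
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + num_classes),
        )

    def forward(self, xyz: torch.Tensor):
        freqs = 2.0 ** torch.arange(self.n_freq, device=xyz.device)
        ang = xyz[..., None] * freqs   # (..., 3, n_freq)
        enc = torch.cat([xyz, ang.sin().flatten(-2), ang.cos().flatten(-2)], -1)
        out = self.mlp(enc)
        return out[..., :1], out[..., 1:]   # occupancy, semantic logits
```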
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- RMMDet: Road-Side Multitype and Multigroup Sensor Detection System for Autonomous Driving [3.8917150802484994]
RMMDet is a road-side multitype and multigroup sensor detection system for autonomous driving.
We use a ROS-based virtual environment to simulate real-world conditions.
We build local datasets and a real sand-table test field, and conduct various experiments.
arXiv Detail & Related papers (2023-03-09T12:13:39Z)
- Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network [70.53093934205057]
3D object detection task from lidar or camera sensors is essential for autonomous driving.
We propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models.
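One simple instance of such semantic passing is a feature-mimicking distillation loss from a semantically richer teacher branch to the lidar-only student. The sketch below shows that generic form; SPNet's actual losses may differ.

```python
import torch
import torch.nn.functional as F

def semantic_passing_loss(student_feat: torch.Tensor,
                          teacher_feat: torch.Tensor) -> torch.Tensor:
    """student_feat, teacher_feat: spatially aligned (B, C, ...) feature maps.
    The teacher is frozen, so gradients flow only into the student."""
    return F.mse_loss(student_feat, teacher_feat.detach())
```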
arXiv Detail & Related papers (2022-07-12T12:35:34Z)
- 3D Semantic Scene Perception using Distributed Smart Edge Sensors [29.998917158604694]
We present a system for 3D semantic scene perception consisting of a network of distributed smart edge sensors.
The sensor nodes are based on an embedded CNN inference accelerator and RGB-D and thermal cameras.
The proposed perception system provides a complete scene view containing semantically annotated 3D geometry and estimates 3D poses of multiple persons in real time.
arXiv Detail & Related papers (2022-05-03T12:46:26Z)
- Real-Time Multi-Modal Semantic Fusion on Unmanned Aerial Vehicles [28.504921333436837]
We propose a UAV system for real-time semantic inference and fusion of multiple sensor modalities.
Semantic segmentation of LiDAR scans and RGB images, as well as object detection on RGB and thermal images, run online onboard the UAV computer.
We evaluate the integrated system in real-world experiments in an urban environment.
arXiv Detail & Related papers (2021-08-14T20:16:08Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos)
The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
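To make the teacher-student setup concrete, below is standard temperature-scaled soft-label distillation; SAKDN's semantics-aware adaptive losses are more involved than this baseline.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T: float = 4.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1).detach()
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)
```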
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
- Towards Autonomous Driving: a Multi-Modal 360$^{\circ}$ Perception Proposal [87.11988786121447]
This paper presents a framework for 3D object detection and tracking for autonomous vehicles.
The solution, based on a novel sensor fusion configuration, provides accurate and reliable road environment detection.
A variety of tests of the system, deployed in an autonomous vehicle, have successfully assessed the suitability of the proposed perception stack.
arXiv Detail & Related papers (2020-08-21T20:36:21Z)
- siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection [65.03384167873564]
A siamese network is integrated into the pipeline of a well-known 3D object detector approach.
The learned associations are exploited to enhance the 3D box regression of the objects.
The experimental evaluation on the nuScenes dataset shows that the proposed method outperforms traditional NMS approaches.
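As a rough illustration of suppression driven by siamese embeddings rather than IoU alone, the sketch below drops detections whose embeddings are near-duplicates of an already kept, higher-scoring detection. The cosine-similarity rule and threshold are assumptions for illustration.

```python
import numpy as np

def embedding_nms(scores, embeddings, sim_thresh=0.9):
    """scores: (N,); embeddings: (N, D), L2-normalized.
    Returns indices of retained detections."""
    keep = []
    for i in np.argsort(-scores):                 # highest score first
        is_dup = any(float(embeddings[i] @ embeddings[j]) > sim_thresh
                     for j in keep)
        if not is_dup:
            keep.append(int(i))
    return keep
```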
arXiv Detail & Related papers (2020-02-19T15:32:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.