Real-Time Multi-Modal Semantic Fusion on Unmanned Aerial Vehicles
- URL: http://arxiv.org/abs/2108.06608v1
- Date: Sat, 14 Aug 2021 20:16:08 GMT
- Title: Real-Time Multi-Modal Semantic Fusion on Unmanned Aerial Vehicles
- Authors: Simon Bultmann, Jan Quenzel and Sven Behnke
- Abstract summary: We propose a UAV system for real-time semantic inference and fusion of multiple sensor modalities.
Semantic segmentation of LiDAR scans and RGB images, as well as object detection on RGB and thermal images, run online onboard the UAV computer.
We evaluate the integrated system in real-world experiments in an urban environment.
- Score: 28.504921333436837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unmanned aerial vehicles (UAVs) equipped with multiple complementary sensors
have tremendous potential for fast autonomous or remote-controlled semantic
scene analysis, e.g., for disaster examination. In this work, we propose a UAV
system for real-time semantic inference and fusion of multiple sensor
modalities. Semantic segmentation of LiDAR scans and RGB images, as well as
object detection on RGB and thermal images, run online onboard the UAV computer
using lightweight CNN architectures and embedded inference accelerators. We
follow a late fusion approach where semantic information from multiple
modalities augments 3D point clouds and image segmentation masks while also
generating an allocentric semantic map. Our system provides augmented semantic
images and point clouds with $\approx\,$9$\,$Hz. We evaluate the integrated
system in real-world experiments in an urban environment.
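The late fusion described above (augmenting 3D point clouds with per-pixel semantic labels) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the pinhole projection, function name, and the convention of labeling unprojectable points with -1 are all assumptions for the example.

```python
import numpy as np

def fuse_semantics(points_cam, seg_mask, K):
    """Late-fusion sketch: attach per-pixel semantic class IDs to LiDAR points.

    points_cam : (N, 3) points already transformed into the camera frame.
    seg_mask   : (H, W) integer class IDs from the image segmentation CNN.
    K          : (3, 3) pinhole camera intrinsic matrix.
    Returns an (N, 4) array [x, y, z, class_id]; points behind the camera
    or outside the image get class_id -1.
    """
    H, W = seg_mask.shape
    labels = np.full(len(points_cam), -1, dtype=np.int32)

    z = points_cam[:, 2]
    valid = z > 0                        # keep only points in front of the camera
    uvw = (K @ points_cam[valid].T).T    # perspective projection
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)

    inside = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    idx = np.flatnonzero(valid)[inside]
    labels[idx] = seg_mask[uv[inside, 1], uv[inside, 0]]  # sample mask at (v, u)

    return np.hstack([points_cam, labels[:, None].astype(float)])
```

In a full system this per-scan fusion would run per camera and feed the aggregated, semantically labeled points into the allocentric map.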
Related papers
- Deep Multimodal Fusion for Semantic Segmentation of Remote Sensing Earth Observation Data [0.08192907805418582]
This paper proposes a late fusion deep learning model (LF-DLM) for semantic segmentation.
One branch integrates detailed textures from aerial imagery using a UNetFormer with a Multi-Axis Vision Transformer (MaxViT) backbone.
The other branch captures complex spatio-temporal dynamics from the Sentinel-2 satellite image time series using a U-Net with a Temporal Attention Encoder (U-TAE).
arXiv Detail & Related papers (2024-10-01T07:50:37Z)
- RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network [34.45694077040797]
We present a radar-camera fusion 3D object detection framework called RCBEVDet.
RadarBEVNet encodes sparse radar points into a dense bird's-eye-view feature.
Our method achieves state-of-the-art radar-camera fusion results in 3D object detection, BEV semantic segmentation, and 3D multi-object tracking tasks.
arXiv Detail & Related papers (2024-09-08T05:14:27Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Real-Time Multi-Modal Semantic Fusion on Unmanned Aerial Vehicles with Label Propagation for Cross-Domain Adaptation [28.78192888704324]
We propose a UAV system for real-time semantic inference and fusion of multiple sensor modalities.
Semantic segmentation of LiDAR scans and RGB images, as well as object detection on RGB and thermal images, run online onboard the UAV computer.
We evaluate the integrated system in real-world experiments in an urban environment and at a disaster test site.
arXiv Detail & Related papers (2022-10-18T10:32:11Z)
- Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network [70.53093934205057]
3D object detection task from lidar or camera sensors is essential for autonomous driving.
We propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models.
arXiv Detail & Related papers (2022-07-12T12:35:34Z)
- 3D Semantic Scene Perception using Distributed Smart Edge Sensors [29.998917158604694]
We present a system for 3D semantic scene perception consisting of a network of distributed smart edge sensors.
The sensor nodes are based on an embedded CNN inference accelerator and RGB-D and thermal cameras.
The proposed perception system provides a complete scene view containing semantically annotated 3D geometry and estimates 3D poses of multiple persons in real time.
arXiv Detail & Related papers (2022-05-03T12:46:26Z)
- CFTrack: Center-based Radar and Camera Fusion for 3D Multi-Object Tracking [9.62721286522053]
We propose an end-to-end network for joint object detection and tracking based on radar and camera sensor fusion.
Our proposed method uses a center-based radar-camera fusion algorithm for object detection and utilizes a greedy algorithm for object association.
We evaluate our method on the challenging nuScenes dataset, where it achieves 20.0 AMOTA and outperforms all vision-based 3D tracking methods in the benchmark.
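The greedy association step mentioned above can be sketched generically: repeatedly match the lowest-cost (track, detection) pair and remove both from consideration. This is an illustrative sketch of greedy matching in general, not the CFTrack implementation; the function name, cost threshold, and use of a plain cost matrix are assumptions.

```python
import numpy as np

def greedy_associate(cost, max_cost=1e4):
    """Greedy data association sketch.

    cost : (T, D) matrix of association costs, e.g. distances between
           predicted track centers and detected object centers.
    Returns a list of (track_idx, det_idx) matches; pairs whose cost
    reaches max_cost are considered inadmissible.
    """
    cost = cost.astype(float).copy()
    matches = []
    if cost.size == 0:
        return matches
    while True:
        t, d = np.unravel_index(np.argmin(cost), cost.shape)
        if cost[t, d] >= max_cost:
            break                 # no admissible pair left
        matches.append((int(t), int(d)))
        cost[t, :] = max_cost     # remove the matched track ...
        cost[:, d] = max_cost     # ... and detection from further matching
    return matches
```

Unlike Hungarian matching, this greedy scheme is not globally optimal, but it is O(T·D) per match and simple enough for real-time tracking loops.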
arXiv Detail & Related papers (2021-07-11T23:56:53Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- Volumetric Propagation Network: Stereo-LiDAR Fusion for Long-Range Depth Estimation [81.08111209632501]
We propose a geometry-aware stereo-LiDAR fusion network for long-range depth estimation.
We exploit sparse and accurate point clouds as a cue for guiding correspondences of stereo images in a unified 3D volume space.
Our network achieves state-of-the-art performance on the KITTI and Virtual KITTI datasets.
arXiv Detail & Related papers (2021-03-24T03:24:46Z)
- Towards Autonomous Driving: a Multi-Modal 360$^{\circ}$ Perception Proposal [87.11988786121447]
This paper presents a framework for 3D object detection and tracking for autonomous vehicles.
The solution, based on a novel sensor fusion configuration, provides accurate and reliable road environment detection.
A variety of tests of the system, deployed in an autonomous vehicle, have successfully assessed the suitability of the proposed perception stack.
arXiv Detail & Related papers (2020-08-21T20:36:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.