Achelous: A Fast Unified Water-surface Panoptic Perception Framework
based on Fusion of Monocular Camera and 4D mmWave Radar
- URL: http://arxiv.org/abs/2307.07102v1
- Date: Fri, 14 Jul 2023 00:24:30 GMT
- Title: Achelous: A Fast Unified Water-surface Panoptic Perception Framework
based on Fusion of Monocular Camera and 4D mmWave Radar
- Authors: Runwei Guan, Shanliang Yao, Xiaohui Zhu, Ka Lok Man, Eng Gee Lim,
Jeremy Smith, Yong Yue, Yutao Yue
- Abstract summary: Current multi-task perception models have large parameter counts, slow inference, and poor scalability.
We propose Achelous, a low-cost and fast unified panoptic perception framework for water-surface perception based on the fusion of a monocular camera and 4D mmWave radar.
Achelous can simultaneously perform five tasks: detection and segmentation of visual targets, drivable-area segmentation, waterline segmentation, and radar point cloud segmentation.
- Score: 7.225125838672763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current perception models for different tasks usually exist in modular forms
on Unmanned Surface Vehicles (USVs), which infer extremely slowly in parallel
on edge devices, causing the asynchrony between perception results and USV
position, and leading to erroneous decisions in autonomous navigation. Compared
with Unmanned Ground Vehicles (UGVs), the robust perception of USVs develops
relatively slowly. Moreover, most current multi-task perception models are huge
in parameters, slow in inference and not scalable. Motivated by this, we propose
Achelous, a low-cost and fast unified panoptic perception framework for
water-surface perception based on the fusion of a monocular camera and 4D
mmWave radar. Achelous can simultaneously perform five tasks: detection and
segmentation of visual targets, drivable-area segmentation, waterline
segmentation, and radar point cloud segmentation. Moreover, models in the Achelous
family, each with fewer than about 5 million parameters, achieve about 18 FPS on an
NVIDIA Jetson AGX Xavier, 11 FPS faster than HybridNets, and exceed YOLOX-Tiny
and Segformer-B0 on our collected dataset by about 5 mAP$_{\text{50-95}}$ and 0.7
mIoU, especially under adverse weather, dark environments and
camera failure. To our knowledge, Achelous is the first comprehensive panoptic
perception framework combining vision-level and point-cloud-level tasks for
water-surface perception. To promote the development of the intelligent
transportation community, we release our code at
\url{https://github.com/GuanRunwei/Achelous}.
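As a rough illustration of the architecture described above, the sketch below wires an image branch and a radar point-cloud branch into five task heads in PyTorch. All module names, feature sizes, and the fusion scheme are assumptions made for illustration only; the actual Achelous implementation lives in the linked repository.

```python
# Minimal sketch of a dual-branch, multi-head vision-radar model in the spirit of
# the abstract. Module names, channel sizes, and the fusion scheme are assumptions,
# not the Achelous code.
import torch
import torch.nn as nn


class VisionRadarMultiTaskNet(nn.Module):
    """Fuses monocular-image features with 4D-radar point features and feeds the
    shared representation to five task-specific heads."""

    def __init__(self, num_det_classes: int = 8, num_pc_classes: int = 5):
        super().__init__()
        # Lightweight image encoder (stand-in for an efficient CNN/ViT backbone).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Per-point radar encoder (x, y, z, doppler, RCS -> 64-d features).
        self.radar_encoder = nn.Sequential(
            nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(),
        )
        # Dense heads operate on the fused image-plane features.
        self.det_head = nn.Conv2d(128, num_det_classes + 4, 1)   # class logits + box
        self.seg_head = nn.Conv2d(128, num_det_classes, 1)       # target segmentation
        self.drivable_head = nn.Conv2d(128, 2, 1)                # drivable area
        self.waterline_head = nn.Conv2d(128, 2, 1)               # waterline
        # Point head labels each radar point.
        self.point_head = nn.Linear(64 + 64, num_pc_classes)

    def forward(self, image: torch.Tensor, radar_points: torch.Tensor):
        # image: (B, 3, H, W); radar_points: (B, N, 5)
        img_feat = self.image_encoder(image)                      # (B, 64, H/4, W/4)
        pt_feat = self.radar_encoder(radar_points)                # (B, N, 64)
        # Pool radar features to a global context vector and broadcast to the image plane.
        radar_ctx = pt_feat.max(dim=1).values                     # (B, 64)
        radar_map = radar_ctx[:, :, None, None].expand(-1, -1, *img_feat.shape[2:])
        fused = torch.cat([img_feat, radar_map], dim=1)           # (B, 128, H/4, W/4)
        # Pool the fused map back to a global vector for the point branch.
        img_ctx = fused.mean(dim=(2, 3))[:, :64]                  # (B, 64)
        pt_in = torch.cat([pt_feat, img_ctx[:, None, :].expand(-1, pt_feat.size(1), -1)], dim=-1)
        return {
            "detection": self.det_head(fused),
            "target_seg": self.seg_head(fused),
            "drivable_seg": self.drivable_head(fused),
            "waterline_seg": self.waterline_head(fused),
            "radar_point_seg": self.point_head(pt_in),
        }


if __name__ == "__main__":
    model = VisionRadarMultiTaskNet()
    outs = model(torch.randn(1, 3, 320, 320), torch.randn(1, 512, 5))
    print({k: tuple(v.shape) for k, v in outs.items()})
```

The relevant point of the sketch is the shared backbone with cheap task-specific heads: amortizing one forward pass over five tasks is what keeps the parameter count and inference cost low enough for an edge device, which is the abstract's central claim.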
Related papers
- Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar [62.51065633674272]
We introduce Radar Fields - a neural scene reconstruction method designed for active radar imagers.
Our approach unites an explicit, physics-informed sensor model with an implicit neural geometry and reflectance model to directly synthesize raw radar measurements.
We validate the effectiveness of the method across diverse outdoor scenarios, including urban scenes with dense vehicles and infrastructure.
arXiv Detail & Related papers (2024-05-07T20:44:48Z)
- Achelous++: Power-Oriented Water-Surface Panoptic Perception Framework on Edge Devices based on Vision-Radar Fusion and Pruning of Heterogeneous Modalities [11.793123307886196]
We propose a framework named Achelous++ that facilitates the development and evaluation of multi-task water-surface panoptic perception models.
Achelous++ can simultaneously execute five perception tasks with high speed and low power consumption, including object detection, object semantic segmentation, drivable-area segmentation, waterline segmentation, and radar point cloud semantic segmentation.
Our framework achieves state-of-the-art performance on the WaterScenes benchmark, excelling in both accuracy and power efficiency compared to other single-task and multi-task models.
arXiv Detail & Related papers (2023-12-14T12:10:12Z) - ASY-VRNet: Waterway Panoptic Driving Perception Model based on Asymmetric Fair Fusion of Vision and 4D mmWave Radar [7.2865477881451755]
Asymmetric Fair Fusion (AFF) modules are designed to efficiently interact with independent features from the visual and radar modalities.
The ASY-VRNet model processes image and radar features based on irregular super-pixel point sets.
Compared to other lightweight models, ASY-VRNet achieves state-of-the-art performance in object detection, semantic segmentation, and drivable-area segmentation.
arXiv Detail & Related papers (2023-08-20T14:53:27Z)
- Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion [74.84019379368807]
We propose a novel method named EchoFusion to skip the existing radar signal processing pipeline.
Specifically, we first generate the Bird's Eye View (BEV) queries and then take corresponding spectrum features from radar to fuse with other sensors.
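As a rough picture of what fusing BEV queries with radar spectrum features can look like, the sketch below cross-attends learnable BEV queries to a flattened range-azimuth spectrum. The shapes, channel meanings, and the module name BEVQueryFusion are assumptions for illustration, not the EchoFusion implementation.

```python
# Hedged sketch of query-based BEV fusion: learnable BEV queries cross-attend to
# flattened radar spectrum features. Shapes and names are assumptions.
import torch
import torch.nn as nn


class BEVQueryFusion(nn.Module):
    def __init__(self, bev_h: int = 64, bev_w: int = 64, dim: int = 128):
        super().__init__()
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, dim) * 0.02)
        self.radar_proj = nn.Linear(2, dim)   # e.g. (magnitude, phase) per range-azimuth bin
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.bev_h, self.bev_w = bev_h, bev_w

    def forward(self, radar_spectrum: torch.Tensor) -> torch.Tensor:
        # radar_spectrum: (B, R, A, 2) range-azimuth bins with two channels.
        b = radar_spectrum.size(0)
        kv = self.radar_proj(radar_spectrum.flatten(1, 2))        # (B, R*A, dim)
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)       # (B, H*W, dim)
        bev, _ = self.cross_attn(q, kv, kv)                       # (B, H*W, dim)
        return bev.transpose(1, 2).reshape(b, -1, self.bev_h, self.bev_w)


# Example: fuse a 128x64 range-azimuth spectrum into a 64x64 BEV feature map.
bev_feat = BEVQueryFusion()(torch.randn(2, 128, 64, 2))
print(bev_feat.shape)  # torch.Size([2, 128, 64, 64])
```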
arXiv Detail & Related papers (2023-07-31T09:53:50Z)
- WaterScenes: A Multi-Task 4D Radar-Camera Fusion Dataset and Benchmarks for Autonomous Driving on Water Surfaces [12.755813310009179]
WaterScenes is the first multi-task 4D radar-camera fusion dataset for autonomous driving on water surfaces.
Our Unmanned Surface Vehicle (USV) offers an all-weather solution for discerning object-related information.
arXiv Detail & Related papers (2023-07-13T01:05:12Z)
- Semantic Segmentation of Radar Detections using Convolutions on Point Clouds [59.45414406974091]
We introduce a deep-learning based method that applies convolutions to radar detection point clouds.
We adapt this algorithm to radar-specific properties through distance-dependent clustering and pre-processing of input point clouds.
Our network outperforms state-of-the-art approaches that are based on PointNet++ on the task of semantic segmentation of radar point clouds.
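One way to realize distance-dependent clustering is to let the clustering radius grow with range, since radar returns thin out with distance. The sketch below does this with DBSCAN on a range-normalized distance matrix; base_eps and eps_per_meter are illustrative values, not the paper's settings.

```python
# Hedged sketch of distance-dependent clustering for sparse radar point clouds:
# the effective DBSCAN radius grows with range. Parameter values are assumptions.
import numpy as np
from sklearn.cluster import DBSCAN


def range_aware_clusters(points_xyz: np.ndarray, base_eps: float = 1.0,
                         eps_per_meter: float = 0.05, min_samples: int = 3) -> np.ndarray:
    """Label each radar detection with a cluster id (-1 = noise).

    Pairwise distances are divided by the allowed radius at the pair's mean
    range, so a fixed DBSCAN threshold of 1.0 behaves like a range-dependent eps.
    """
    ranges = np.linalg.norm(points_xyz, axis=1)                                    # (N,)
    allowed = base_eps + eps_per_meter * 0.5 * (ranges[:, None] + ranges[None, :])
    pairwise = np.linalg.norm(points_xyz[:, None, :] - points_xyz[None, :, :], axis=-1)
    normalized = pairwise / allowed
    return DBSCAN(eps=1.0, min_samples=min_samples, metric="precomputed").fit_predict(normalized)


# Example: two compact targets at different ranges plus scattered clutter.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal([10, 0, 0], 0.3, (20, 3)),
                 rng.normal([60, 5, 0], 2.0, (20, 3)),
                 rng.uniform(-80, 80, (10, 3))])
print(range_aware_clusters(pts))
```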
arXiv Detail & Related papers (2023-05-22T07:09:35Z)
- EV-Catcher: High-Speed Object Catching Using Low-latency Event-based Neural Networks [107.62975594230687]
We demonstrate an application where event cameras excel: accurately estimating the impact location of fast-moving objects.
We introduce a lightweight event representation called Binary Event History Image (BEHI) to encode event data at low latency.
We show that the system is capable of achieving a success rate of 81% in catching balls targeted at different locations, with a velocity of up to 13 m/s even on compute-constrained embedded platforms.
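A binary event history image can be built by marking every pixel that fired at least once inside a short time window, as in the sketch below. The event layout (x, y, timestamp, polarity) and the 10 ms window are assumptions, not the EV-Catcher encoding.

```python
# Hedged sketch of a binary event-history image: every pixel that received at
# least one event inside the time window is set to 1. Field order and window
# length are assumptions.
import numpy as np


def binary_event_history_image(events: np.ndarray, height: int, width: int,
                               t_now: float, window_s: float = 0.01) -> np.ndarray:
    """events: (N, 4) array of (x, y, timestamp_s, polarity)."""
    recent = events[events[:, 2] >= t_now - window_s]
    behi = np.zeros((height, width), dtype=np.uint8)
    xs = recent[:, 0].astype(int)
    ys = recent[:, 1].astype(int)
    behi[ys, xs] = 1   # polarity is discarded; only "an event happened here" is kept
    return behi


# Example: three events, two of them inside the last 10 ms.
ev = np.array([[5, 7, 0.991, 1], [5, 8, 0.999, -1], [60, 40, 0.5, 1]])
print(binary_event_history_image(ev, height=48, width=64, t_now=1.0).sum())  # 2
```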
arXiv Detail & Related papers (2023-04-14T15:23:28Z)
- Gaussian Radar Transformer for Semantic Segmentation in Noisy Radar Data [33.457104508061015]
Scene understanding is crucial for autonomous robots in dynamic environments for making future state predictions, avoiding collisions, and path planning.
Camera and LiDAR perception have made tremendous progress in recent years, but face limitations under adverse weather conditions.
To leverage the full potential of multi-modal sensor suites, radar sensors are essential for safety-critical tasks and are already installed in most new vehicles today.
arXiv Detail & Related papers (2022-12-07T15:05:03Z)
- Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
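The range-view (RV) idea is to spherically project the point cloud onto an H x W grid indexed by azimuth and elevation, after which ordinary 2D convolutions apply. The sketch below shows such a projection with an assumed field of view and resolution; it is not the FCOS-LiDAR preprocessing.

```python
# Hedged sketch of the range-view representation: spherically project a LiDAR
# point cloud into an H x W range image. FOV and resolution are assumptions.
import numpy as np


def to_range_image(points_xyz: np.ndarray, h: int = 64, w: int = 512,
                   v_fov=(-25.0, 3.0)) -> np.ndarray:
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    r = np.linalg.norm(points_xyz, axis=1)
    yaw = np.arctan2(y, x)                               # horizontal angle in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0))
    col = ((yaw + np.pi) / (2 * np.pi) * w).astype(int).clip(0, w - 1)
    v_lo, v_hi = np.radians(v_fov)
    row = ((v_hi - pitch) / (v_hi - v_lo) * h).astype(int).clip(0, h - 1)
    img = np.zeros((h, w), dtype=np.float32)
    # Keep the closest return per pixel: write far points first so near ones overwrite.
    order = np.argsort(-r)
    img[row[order], col[order]] = r[order]
    return img


print(to_range_image(np.random.rand(2000, 3) * 40 - 20).shape)  # (64, 512)
```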
arXiv Detail & Related papers (2022-05-27T05:42:16Z)
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [105.96557764248846]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework.
It unifies multi-modal features in the shared bird's-eye view representation space.
It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
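A minimal picture of fusion in a shared BEV space: once camera and LiDAR features have been lifted onto the same BEV grid, they can be concatenated, fused with a small convolutional block, and shared by several task heads. Channel sizes and head definitions below are assumptions, not the BEVFusion code.

```python
# Hedged sketch of multi-task fusion in a shared BEV space. Channel sizes and
# head shapes are assumptions.
import torch
import torch.nn as nn


class BEVFuser(nn.Module):
    def __init__(self, cam_ch: int = 80, lidar_ch: int = 128, out_ch: int = 128,
                 num_det_classes: int = 10, num_map_classes: int = 6):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_ch + lidar_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(),
        )
        self.det_head = nn.Conv2d(out_ch, num_det_classes, 1)   # per-cell class logits
        self.map_head = nn.Conv2d(out_ch, num_map_classes, 1)   # BEV map segmentation

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor):
        fused = self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))
        return self.det_head(fused), self.map_head(fused)


det, seg = BEVFuser()(torch.randn(1, 80, 180, 180), torch.randn(1, 128, 180, 180))
print(det.shape, seg.shape)  # (1, 10, 180, 180) (1, 6, 180, 180)
```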
arXiv Detail & Related papers (2022-05-26T17:59:35Z)
- Cross-modal Learning of Graph Representations using Radar Point Cloud for Long-Range Gesture Recognition [6.9545038359818445]
We propose a novel architecture for a long-range (1m - 2m) gesture recognition solution.
We use a point cloud-based cross-learning approach from camera point cloud to 60-GHz FMCW radar point cloud.
In the experimental results section, we demonstrate our model's overall accuracy of 98.4% for five gestures and its generalization capability.
arXiv Detail & Related papers (2022-03-31T14:34:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.