Achelous: A Fast Unified Water-surface Panoptic Perception Framework
based on Fusion of Monocular Camera and 4D mmWave Radar
- URL: http://arxiv.org/abs/2307.07102v1
- Date: Fri, 14 Jul 2023 00:24:30 GMT
- Title: Achelous: A Fast Unified Water-surface Panoptic Perception Framework
based on Fusion of Monocular Camera and 4D mmWave Radar
- Authors: Runwei Guan, Shanliang Yao, Xiaohui Zhu, Ka Lok Man, Eng Gee Lim,
Jeremy Smith, Yong Yue, Yutao Yue
- Abstract summary: Current multi-task perception models have large parameter counts, slow inference, and poor scalability.
We propose Achelous, a low-cost and fast unified panoptic perception framework for water-surface perception based on the fusion of a monocular camera and 4D mmWave radar.
Achelous can simultaneously perform five tasks: detection and segmentation of visual targets, drivable-area segmentation, waterline segmentation, and radar point cloud segmentation.
- Score: 7.225125838672763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current perception models for different tasks usually exist in modular forms
on Unmanned Surface Vehicles (USVs), which infer extremely slowly in parallel
on edge devices, causing the asynchrony between perception results and USV
position, and leading to erroneous decisions in autonomous navigation. Compared
with Unmanned Ground Vehicles (UGVs), the robust perception of USVs develops
relatively slowly. Moreover, most current multi-task perception models are huge
in parameters, slow in inference and not scalable. Motivated by this, we propose
Achelous, a low-cost and fast unified panoptic perception framework for
water-surface perception based on the fusion of a monocular camera and 4D
mmWave radar. Achelous can simultaneously perform five tasks: detection and
segmentation of visual targets, drivable-area segmentation, waterline
segmentation, and radar point cloud segmentation. Moreover, models in the Achelous
family, each with fewer than about 5 million parameters, achieve about 18 FPS on an
NVIDIA Jetson AGX Xavier, 11 FPS faster than HybridNets, and exceed YOLOX-Tiny
and Segformer-B0 on our collected dataset by about 5 mAP$_{\text{50-95}}$ and 0.7
mIoU, especially under adverse weather, dark environments and
camera failure. To our knowledge, Achelous is the first comprehensive panoptic
perception framework combining vision-level and point-cloud-level tasks for
water-surface perception. To promote the development of the intelligent
transportation community, we release our code at
\url{https://github.com/GuanRunwei/Achelous}.
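As a rough illustration of the architecture described above, the sketch below wires an image branch and a radar point-cloud branch into five task heads in PyTorch. All module names, feature sizes, and the fusion scheme are assumptions made for illustration only; the actual Achelous implementation lives in the linked repository.

```python
# Minimal sketch of a dual-branch, multi-head vision-radar model in the spirit of
# the abstract. Module names, channel sizes, and the fusion scheme are assumptions,
# not the Achelous code.
import torch
import torch.nn as nn


class VisionRadarMultiTaskNet(nn.Module):
    """Fuses monocular-image features with 4D-radar point features and feeds the
    shared representation to five task-specific heads."""

    def __init__(self, num_det_classes: int = 8, num_pc_classes: int = 5):
        super().__init__()
        # Lightweight image encoder (stand-in for an efficient CNN/ViT backbone).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Per-point radar encoder (x, y, z, doppler, RCS -> 64-d features).
        self.radar_encoder = nn.Sequential(
            nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(),
        )
        # Dense heads operate on the fused image-plane features.
        self.det_head = nn.Conv2d(128, num_det_classes + 4, 1)   # class logits + box
        self.seg_head = nn.Conv2d(128, num_det_classes, 1)       # target segmentation
        self.drivable_head = nn.Conv2d(128, 2, 1)                # drivable area
        self.waterline_head = nn.Conv2d(128, 2, 1)               # waterline
        # Point head labels each radar point.
        self.point_head = nn.Linear(64 + 64, num_pc_classes)

    def forward(self, image: torch.Tensor, radar_points: torch.Tensor):
        # image: (B, 3, H, W); radar_points: (B, N, 5)
        img_feat = self.image_encoder(image)                      # (B, 64, H/4, W/4)
        pt_feat = self.radar_encoder(radar_points)                # (B, N, 64)
        # Pool radar features to a global context vector and broadcast to the image plane.
        radar_ctx = pt_feat.max(dim=1).values                     # (B, 64)
        radar_map = radar_ctx[:, :, None, None].expand(-1, -1, *img_feat.shape[2:])
        fused = torch.cat([img_feat, radar_map], dim=1)           # (B, 128, H/4, W/4)
        # Pool the fused map back to a global vector for the point branch.
        img_ctx = fused.mean(dim=(2, 3))[:, :64]                  # (B, 64)
        pt_in = torch.cat([pt_feat, img_ctx[:, None, :].expand(-1, pt_feat.size(1), -1)], dim=-1)
        return {
            "detection": self.det_head(fused),
            "target_seg": self.seg_head(fused),
            "drivable_seg": self.drivable_head(fused),
            "waterline_seg": self.waterline_head(fused),
            "radar_point_seg": self.point_head(pt_in),
        }


if __name__ == "__main__":
    model = VisionRadarMultiTaskNet()
    outs = model(torch.randn(1, 3, 320, 320), torch.randn(1, 512, 5))
    print({k: tuple(v.shape) for k, v in outs.items()})
```

The relevant point of the sketch is the shared backbone with cheap task-specific heads: amortizing one forward pass over five tasks is what keeps the parameter count and inference cost low enough for an edge device, which is the abstract's central claim.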
Related papers
- Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar [62.51065633674272]
We introduce Radar Fields - a neural scene reconstruction method designed for active radar imagers.
Our approach unites an explicit, physics-informed sensor model with an implicit neural geometry and reflectance model to directly synthesize raw radar measurements.
We validate the effectiveness of the method across diverse outdoor scenarios, including urban scenes with dense vehicles and infrastructure.
arXiv Detail & Related papers (2024-05-07T20:44:48Z)
- Achelous++: Power-Oriented Water-Surface Panoptic Perception Framework on Edge Devices based on Vision-Radar Fusion and Pruning of Heterogeneous Modalities [11.793123307886196]
We propose a framework named Achelous++ that facilitates the development and evaluation of multi-task water-surface panoptic perception models.
Achelous++ can simultaneously execute five perception tasks with high speed and low power consumption, including object detection, object semantic segmentation, drivable-area segmentation, waterline segmentation, and radar point cloud semantic segmentation.
Our framework achieves state-of-the-art performance on the WaterScenes benchmark, excelling in both accuracy and power efficiency compared to other single-task and multi-task models.
arXiv Detail & Related papers (2023-12-14T12:10:12Z) - ASY-VRNet: Waterway Panoptic Driving Perception Model based on Asymmetric Fair Fusion of Vision and 4D mmWave Radar [7.2865477881451755]
Asymmetric Fair Fusion (AFF) modules are designed to efficiently interact with independent features from the visual and radar modalities.
The ASY-VRNet model processes image and radar features based on irregular super-pixel point sets.
Compared to other lightweight models, ASY-VRNet achieves state-of-the-art performance in object detection, semantic segmentation, and drivable-area segmentation.
arXiv Detail & Related papers (2023-08-20T14:53:27Z)
- Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion [74.84019379368807]
We propose a novel method named EchoFusion to skip the existing radar signal processing pipeline.
Specifically, we first generate the Bird's Eye View (BEV) queries and then take corresponding spectrum features from radar to fuse with other sensors.
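As a rough picture of what fusing BEV queries with radar spectrum features can look like, the sketch below cross-attends learnable BEV queries to a flattened range-azimuth spectrum. The shapes, channel meanings, and the module name BEVQueryFusion are assumptions for illustration, not the EchoFusion implementation.

```python
# Hedged sketch of query-based BEV fusion: learnable BEV queries cross-attend to
# flattened radar spectrum features. Shapes and names are assumptions.
import torch
import torch.nn as nn


class BEVQueryFusion(nn.Module):
    def __init__(self, bev_h: int = 64, bev_w: int = 64, dim: int = 128):
        super().__init__()
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, dim) * 0.02)
        self.radar_proj = nn.Linear(2, dim)   # e.g. (magnitude, phase) per range-azimuth bin
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.bev_h, self.bev_w = bev_h, bev_w

    def forward(self, radar_spectrum: torch.Tensor) -> torch.Tensor:
        # radar_spectrum: (B, R, A, 2) range-azimuth bins with two channels.
        b = radar_spectrum.size(0)
        kv = self.radar_proj(radar_spectrum.flatten(1, 2))        # (B, R*A, dim)
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)       # (B, H*W, dim)
        bev, _ = self.cross_attn(q, kv, kv)                       # (B, H*W, dim)
        return bev.transpose(1, 2).reshape(b, -1, self.bev_h, self.bev_w)


# Example: fuse a 128x64 range-azimuth spectrum into a 64x64 BEV feature map.
bev_feat = BEVQueryFusion()(torch.randn(2, 128, 64, 2))
print(bev_feat.shape)  # torch.Size([2, 128, 64, 64])
```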
arXiv Detail & Related papers (2023-07-31T09:53:50Z)
- WaterScenes: A Multi-Task 4D Radar-Camera Fusion Dataset and Benchmarks for Autonomous Driving on Water Surfaces [12.755813310009179]
WaterScenes is the first multi-task 4D radar-camera fusion dataset for autonomous driving on water surfaces.
Our Unmanned Surface Vehicle (USV) offers an all-weather solution for discerning object-related information.
arXiv Detail & Related papers (2023-07-13T01:05:12Z)
- Semantic Segmentation of Radar Detections using Convolutions on Point Clouds [59.45414406974091]
We introduce a deep-learning based method that applies convolutions to radar detection point clouds.
We adapt this algorithm to radar-specific properties through distance-dependent clustering and pre-processing of input point clouds.
Our network outperforms state-of-the-art approaches that are based on PointNet++ on the task of semantic segmentation of radar point clouds.
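One way to realize distance-dependent clustering is to let the clustering radius grow with range, since radar returns thin out with distance. The sketch below does this with DBSCAN on a range-normalized distance matrix; base_eps and eps_per_meter are illustrative values, not the paper's settings.

```python
# Hedged sketch of distance-dependent clustering for sparse radar point clouds:
# the effective DBSCAN radius grows with range. Parameter values are assumptions.
import numpy as np
from sklearn.cluster import DBSCAN


def range_aware_clusters(points_xyz: np.ndarray, base_eps: float = 1.0,
                         eps_per_meter: float = 0.05, min_samples: int = 3) -> np.ndarray:
    """Label each radar detection with a cluster id (-1 = noise).

    Pairwise distances are divided by the allowed radius at the pair's mean
    range, so a fixed DBSCAN threshold of 1.0 behaves like a range-dependent eps.
    """
    ranges = np.linalg.norm(points_xyz, axis=1)                                    # (N,)
    allowed = base_eps + eps_per_meter * 0.5 * (ranges[:, None] + ranges[None, :])
    pairwise = np.linalg.norm(points_xyz[:, None, :] - points_xyz[None, :, :], axis=-1)
    normalized = pairwise / allowed
    return DBSCAN(eps=1.0, min_samples=min_samples, metric="precomputed").fit_predict(normalized)


# Example: two compact targets at different ranges plus scattered clutter.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal([10, 0, 0], 0.3, (20, 3)),
                 rng.normal([60, 5, 0], 2.0, (20, 3)),
                 rng.uniform(-80, 80, (10, 3))])
print(range_aware_clusters(pts))
```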
arXiv Detail & Related papers (2023-05-22T07:09:35Z)
- EV-Catcher: High-Speed Object Catching Using Low-latency Event-based Neural Networks [107.62975594230687]
We demonstrate an application where event cameras excel: accurately estimating the impact location of fast-moving objects.
We introduce a lightweight event representation called Binary Event History Image (BEHI) to encode event data at low latency.
We show that the system is capable of achieving a success rate of 81% in catching balls targeted at different locations, with a velocity of up to 13 m/s even on compute-constrained embedded platforms.
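A binary event history image can be built by marking every pixel that fired at least once inside a short time window, as in the sketch below. The event layout (x, y, timestamp, polarity) and the 10 ms window are assumptions, not the EV-Catcher encoding.

```python
# Hedged sketch of a binary event-history image: every pixel that received at
# least one event inside the time window is set to 1. Field order and window
# length are assumptions.
import numpy as np


def binary_event_history_image(events: np.ndarray, height: int, width: int,
                               t_now: float, window_s: float = 0.01) -> np.ndarray:
    """events: (N, 4) array of (x, y, timestamp_s, polarity)."""
    recent = events[events[:, 2] >= t_now - window_s]
    behi = np.zeros((height, width), dtype=np.uint8)
    xs = recent[:, 0].astype(int)
    ys = recent[:, 1].astype(int)
    behi[ys, xs] = 1   # polarity is discarded; only "an event happened here" is kept
    return behi


# Example: three events, two of them inside the last 10 ms.
ev = np.array([[5, 7, 0.991, 1], [5, 8, 0.999, -1], [60, 40, 0.5, 1]])
print(binary_event_history_image(ev, height=48, width=64, t_now=1.0).sum())  # 2
```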
arXiv Detail & Related papers (2023-04-14T15:23:28Z)
- Gaussian Radar Transformer for Semantic Segmentation in Noisy Radar Data [33.457104508061015]
Scene understanding is crucial for autonomous robots in dynamic environments for making future state predictions, avoiding collisions, and path planning.
Camera and LiDAR perception have made tremendous progress in recent years, but face limitations under adverse weather conditions.
To leverage the full potential of multi-modal sensor suites, radar sensors are essential for safety-critical tasks and are already installed in most new vehicles today.
arXiv Detail & Related papers (2022-12-07T15:05:03Z)
- Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
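The range-view (RV) idea is to spherically project the point cloud onto an H x W grid indexed by azimuth and elevation, after which ordinary 2D convolutions apply. The sketch below shows such a projection with an assumed field of view and resolution; it is not the FCOS-LiDAR preprocessing.

```python
# Hedged sketch of the range-view representation: spherically project a LiDAR
# point cloud into an H x W range image. FOV and resolution are assumptions.
import numpy as np


def to_range_image(points_xyz: np.ndarray, h: int = 64, w: int = 512,
                   v_fov=(-25.0, 3.0)) -> np.ndarray:
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    r = np.linalg.norm(points_xyz, axis=1)
    yaw = np.arctan2(y, x)                               # horizontal angle in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0))
    col = ((yaw + np.pi) / (2 * np.pi) * w).astype(int).clip(0, w - 1)
    v_lo, v_hi = np.radians(v_fov)
    row = ((v_hi - pitch) / (v_hi - v_lo) * h).astype(int).clip(0, h - 1)
    img = np.zeros((h, w), dtype=np.float32)
    # Keep the closest return per pixel: write far points first so near ones overwrite.
    order = np.argsort(-r)
    img[row[order], col[order]] = r[order]
    return img


print(to_range_image(np.random.rand(2000, 3) * 40 - 20).shape)  # (64, 512)
```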
arXiv Detail & Related papers (2022-05-27T05:42:16Z)
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [105.96557764248846]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework.
It unifies multi-modal features in the shared bird's-eye view representation space.
It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
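A minimal picture of fusion in a shared BEV space: once camera and LiDAR features have been lifted onto the same BEV grid, they can be concatenated, fused with a small convolutional block, and shared by several task heads. Channel sizes and head definitions below are assumptions, not the BEVFusion code.

```python
# Hedged sketch of multi-task fusion in a shared BEV space. Channel sizes and
# head shapes are assumptions.
import torch
import torch.nn as nn


class BEVFuser(nn.Module):
    def __init__(self, cam_ch: int = 80, lidar_ch: int = 128, out_ch: int = 128,
                 num_det_classes: int = 10, num_map_classes: int = 6):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_ch + lidar_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(),
        )
        self.det_head = nn.Conv2d(out_ch, num_det_classes, 1)   # per-cell class logits
        self.map_head = nn.Conv2d(out_ch, num_map_classes, 1)   # BEV map segmentation

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor):
        fused = self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))
        return self.det_head(fused), self.map_head(fused)


det, seg = BEVFuser()(torch.randn(1, 80, 180, 180), torch.randn(1, 128, 180, 180))
print(det.shape, seg.shape)  # (1, 10, 180, 180) (1, 6, 180, 180)
```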
arXiv Detail & Related papers (2022-05-26T17:59:35Z)
- Cross-modal Learning of Graph Representations using Radar Point Cloud for Long-Range Gesture Recognition [6.9545038359818445]
We propose a novel architecture for a long-range (1m - 2m) gesture recognition solution.
We use a point cloud-based cross-learning approach from camera point cloud to 60-GHz FMCW radar point cloud.
In the experimental results section, we demonstrate our model's overall accuracy of 98.4% for five gestures and its generalization capability.
arXiv Detail & Related papers (2022-03-31T14:34:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.