Achelous++: Power-Oriented Water-Surface Panoptic Perception Framework
on Edge Devices based on Vision-Radar Fusion and Pruning of Heterogeneous
Modalities
- URL: http://arxiv.org/abs/2312.08851v1
- Date: Thu, 14 Dec 2023 12:10:12 GMT
- Title: Achelous++: Power-Oriented Water-Surface Panoptic Perception Framework
on Edge Devices based on Vision-Radar Fusion and Pruning of Heterogeneous
Modalities
- Authors: Runwei Guan, Haocheng Zhao, Shanliang Yao, Ka Lok Man, Xiaohui Zhu,
Limin Yu, Yong Yue, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue
- Abstract summary: We propose a framework named Achelous++ that facilitates the development and evaluation of multi-task water-surface panoptic perception models.
Achelous++ can simultaneously execute five perception tasks with high speed and low power consumption, including object detection, object semantic segmentation, drivable-area segmentation, waterline segmentation, and radar point cloud semantic segmentation.
Our framework achieves state-of-the-art performance on the WaterScenes benchmark, surpassing other single-task and multi-task models in both accuracy and power efficiency.
- Score: 11.793123307886196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust urban water-surface perception serves as the foundation for
intelligent monitoring of aquatic environments and the autonomous navigation
and operation of unmanned vessels, especially in the context of waterway
safety. However, current multi-sensor fusion and multi-task learning models
consume substantial power and rely heavily on high-power GPUs for inference.
This contributes to increased carbon emissions, a concern that
runs counter to the prevailing emphasis on environmental preservation and the
pursuit of sustainable, low-carbon urban environments. In light of these
concerns, this paper concentrates on low-power, lightweight, multi-task
panoptic perception through the fusion of visual and 4D radar data, which is
seen as a promising low-cost perception method. We propose a framework named
Achelous++ that facilitates the development and comprehensive evaluation of
multi-task water-surface panoptic perception models. Achelous++ can
simultaneously execute five perception tasks with high speed and low power
consumption, including object detection, object semantic segmentation,
drivable-area segmentation, waterline segmentation, and radar point cloud
semantic segmentation. Furthermore, to meet the demand for developers to
customize models for real-time inference on low-performance devices, a novel
multi-modal pruning strategy known as Heterogeneous-Aware SynFlow (HA-SynFlow)
is proposed. In addition, Achelous++ supports random pruning at initialization
with different layer-wise sparsity distributions, such as Uniform and
Erdős-Rényi-Kernel (ERK). Overall, our Achelous++ framework achieves
state-of-the-art performance on the WaterScenes benchmark, surpassing other
single-task and multi-task models in both accuracy and power efficiency. We
release and maintain
the code at https://github.com/GuanRunwei/Achelous.
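The abstract names five concurrent tasks but gives no structural detail, so a purely illustrative two-branch skeleton may help show what "simultaneously execute five perception tasks" can look like: an image encoder, a radar-point encoder, a naive fusion step, and one head per task. Every module, shape, and name below is an assumption for illustration; this is not the Achelous++ architecture (see the repository above for the real one).

```python
import torch
import torch.nn as nn

class TwoBranchPanopticSketch(nn.Module):
    """Illustrative only: shared image encoder + radar point branch + five
    task heads mirroring the five tasks named in the abstract. All layer
    choices here are invented placeholders, not the authors' design."""

    def __init__(self, num_classes=8):
        super().__init__()
        self.img_enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        # Per-point MLP over hypothetical 4D-radar features (x, y, z, velocity).
        self.radar_enc = nn.Sequential(
            nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 32))
        self.det_head = nn.Conv2d(32, num_classes + 4, 1)  # class + box offsets
        self.seg_head = nn.Conv2d(32, num_classes, 1)      # object semantics
        self.drivable_head = nn.Conv2d(32, 2, 1)           # drivable area
        self.waterline_head = nn.Conv2d(32, 2, 1)          # waterline mask
        self.pc_head = nn.Linear(32, num_classes)          # per-point labels

    def forward(self, image, radar_points):
        f_img = self.img_enc(image)             # (B, 32, H/4, W/4)
        f_pts = self.radar_enc(radar_points)    # (B, N, 32)
        # Naive late fusion: broadcast the pooled radar feature over the map.
        f_fused = f_img + f_pts.mean(dim=1)[:, :, None, None]
        return {
            "detection":     self.det_head(f_fused),
            "semantic_seg":  self.seg_head(f_fused),
            "drivable_seg":  self.drivable_head(f_fused),
            "waterline_seg": self.waterline_head(f_fused),
            "radar_pc_seg":  self.pc_head(f_pts),
        }

outputs = TwoBranchPanopticSketch()(torch.rand(1, 3, 64, 64),
                                    torch.rand(1, 100, 4))
```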
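HA-SynFlow is only named here, but its base algorithm, SynFlow (Tanaka et al., 2020), is well documented: each weight is scored by |θ · ∂R/∂θ|, where R is the summed output of an all-ones input pushed through the network with absolute-valued parameters, making the saliency data-free. The sketch below implements just that vanilla score; how Achelous++ makes it "heterogeneous-aware" across the vision and radar modalities is not described on this page, so no per-modality step is attempted.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def _linearize(model):
    """Replace every parameter with its absolute value; return the signs
    so the original weights can be restored afterwards."""
    signs = {n: torch.sign(p) for n, p in model.named_parameters()}
    for p in model.parameters():
        p.abs_()
    return signs

@torch.no_grad()
def _restore(model, signs):
    for n, p in model.named_parameters():
        p.mul_(signs[n])

def synflow_scores(model, input_shape):
    """Data-free SynFlow saliency (Tanaka et al., 2020):
    score(theta) = |theta * dR/dtheta| with R = sum(model(ones))."""
    signs = _linearize(model)
    model.zero_grad()
    r = model(torch.ones(1, *input_shape)).sum()
    r.backward()
    scores = {n: (p * p.grad).abs().detach()
              for n, p in model.named_parameters()}
    _restore(model, signs)
    return scores

# Toy usage on a hypothetical single-branch net; weights with the lowest
# scores would be pruned first, usually over several iterative rounds.
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                    nn.Flatten(), nn.Linear(8 * 14 * 14, 5))
scores = synflow_scores(net, (3, 16, 16))
```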
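For the ERK layer-wise sparsity used in random pruning at initialization, the standard Erdős-Rényi-Kernel rule (Evci et al., 2020) keeps each layer at a density proportional to sum(shape)/prod(shape), globally rescaled to hit a target density, so large layers end up sparser than small ones. Below is a minimal sketch under that standard formulation, with invented layer names; the exact clamping and budget handling in Achelous++ may differ.

```python
import numpy as np

def erk_densities(layer_shapes, target_density):
    """Per-layer keep-densities under ERK scaling: density_l is proportional
    to sum(shape_l) / prod(shape_l), rescaled so the global density over all
    weights equals target_density; saturated layers are clamped at 1.0."""
    sizes = {n: int(np.prod(s)) for n, s in layer_shapes.items()}
    raw = {n: sum(s) / sizes[n] for n, s in layer_shapes.items()}
    total = sum(sizes.values())

    dense = set()  # layers whose density hits the 1.0 ceiling
    while True:
        # Budget left for the still-sparse layers, and the scale eps solving
        # eps * sum(size_l * raw_l) = budget over those layers.
        budget = target_density * total - sum(sizes[n] for n in dense)
        denom = sum(sizes[n] * raw[n] for n in raw if n not in dense)
        eps = budget / denom
        overflow = [n for n in raw if n not in dense and eps * raw[n] > 1.0]
        if not overflow:
            break
        dense.update(overflow)

    return {n: 1.0 if n in dense else eps * raw[n] for n in raw}

# Hypothetical conv/linear shapes for a small two-branch model:
shapes = {
    "img_conv":    (32, 3, 3, 3),    # (out_ch, in_ch, kh, kw)
    "radar_conv":  (32, 4, 3, 3),
    "fusion_conv": (64, 64, 1, 1),
    "head_linear": (10, 64),
}
print(erk_densities(shapes, target_density=0.3))
```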
Related papers
- Towards an Autonomous Surface Vehicle Prototype for Artificial Intelligence Applications of Water Quality Monitoring [68.41400824104953]
This paper presents a vehicle prototype that addresses the use of Artificial Intelligence algorithms and enhanced sensing techniques for water quality monitoring.
The vehicle is fully equipped with high-quality sensors to measure water quality parameters and water depth.
Using a stereo camera, it can also detect and locate macro-plastics in real environments.
arXiv Detail & Related papers (2024-10-08T10:35:32Z)
- ASY-VRNet: Waterway Panoptic Driving Perception Model based on Asymmetric Fair Fusion of Vision and 4D mmWave Radar [7.2865477881451755]
It introduces Asymmetric Fair Fusion (AFF) modules designed to efficiently interact with independent features from the visual and radar modalities.
The ASY-VRNet model processes image and radar features as irregular super-pixel point sets.
Compared to other lightweight models, ASY-VRNet achieves state-of-the-art performance in object detection, semantic segmentation, and drivable-area segmentation.
arXiv Detail & Related papers (2023-08-20T14:53:27Z)
- Vision-Based Autonomous Navigation for Unmanned Surface Vessel in Extreme Marine Conditions [2.8983738640808645]
This paper presents an autonomous vision-based navigation framework for tracking target objects in extreme marine conditions.
The proposed framework has been thoroughly tested in simulation under extremely reduced visibility due to sandstorms and fog.
The results are compared with state-of-the-art de-hazing methods across the benchmarked MBZIRC simulation dataset.
arXiv Detail & Related papers (2023-08-08T14:25:13Z)
- Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar [7.225125838672763]
Current multi-task perception models have huge parameter counts, slow inference, and poor scalability.
We propose Achelous, a low-cost and fast unified panoptic perception framework for water-surface perception based on the fusion of a monocular camera and 4D mmWave radar.
Achelous can simultaneously perform five tasks: detection and segmentation of visual targets, drivable-area segmentation, waterline segmentation, and radar point cloud segmentation.
arXiv Detail & Related papers (2023-07-14T00:24:30Z)
- WaterScenes: A Multi-Task 4D Radar-Camera Fusion Dataset and Benchmarks for Autonomous Driving on Water Surfaces [12.755813310009179]
WaterScenes is the first multi-task 4D radar-camera fusion dataset for autonomous driving on water surfaces.
Our Unmanned Surface Vehicle (USV) provides all-weather solutions for perceiving object-related information.
arXiv Detail & Related papers (2023-07-13T01:05:12Z)
- Semantic-aware Texture-Structure Feature Collaboration for Underwater Image Enhancement [58.075720488942125]
Underwater image enhancement has attracted growing attention as a key technology in marine engineering and aquatic robotics.
We develop an efficient and compact enhancement network in collaboration with a high-level semantic-aware pretrained model.
We also apply the proposed algorithm to the underwater salient object detection task to reveal the favorable semantic-aware ability for high-level vision tasks.
arXiv Detail & Related papers (2022-11-19T07:50:34Z)
- SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation [152.60469768559878]
SHIFT is the largest multi-task synthetic dataset for autonomous driving.
It presents discrete and continuous shifts in cloudiness, rain and fog intensity, time of day, and vehicle and pedestrian density.
Our dataset and benchmark toolkit are publicly available at www.vis.xyz/shift.
arXiv Detail & Related papers (2022-06-16T17:59:52Z)
- Towards bio-inspired unsupervised representation learning for indoor aerial navigation [4.26712082692017]
This research presents a biologically inspired deep-learning algorithm for simultaneous localization and mapping (SLAM) and its application in a drone navigation system.
We propose an unsupervised representation learning method that yields low-dimensional latent state descriptors, mitigates sensitivity to perceptual aliasing, and runs on power-efficient embedded hardware.
The designed algorithm is evaluated on a dataset collected in an indoor warehouse environment, and initial results show the feasibility for robust indoor aerial navigation.
arXiv Detail & Related papers (2021-06-17T08:42:38Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim to learn pedestrian representations based on object center and scale rather than direct bounding-box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
- PAN: Towards Fast Action Recognition via Learning Persistence of Appearance [60.75488333935592]
Most state-of-the-art methods heavily rely on dense optical flow as motion representation.
In this paper, we shed light on fast action recognition by lifting the reliance on optical flow.
We design a novel motion cue called Persistence of Appearance (PA).
In contrast to optical flow, our PA focuses more on distilling the motion information at boundaries.
arXiv Detail & Related papers (2020-08-08T07:09:54Z)