Achelous++: Power-Oriented Water-Surface Panoptic Perception Framework
on Edge Devices based on Vision-Radar Fusion and Pruning of Heterogeneous
Modalities
- URL: http://arxiv.org/abs/2312.08851v1
- Date: Thu, 14 Dec 2023 12:10:12 GMT
- Title: Achelous++: Power-Oriented Water-Surface Panoptic Perception Framework
on Edge Devices based on Vision-Radar Fusion and Pruning of Heterogeneous
Modalities
- Authors: Runwei Guan, Haocheng Zhao, Shanliang Yao, Ka Lok Man, Xiaohui Zhu,
Limin Yu, Yong Yue, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue
- Abstract summary: We propose a framework named Achelous++ that facilitates the development and evaluation of multi-task water-surface panoptic perception models.
Achelous++ can simultaneously execute five perception tasks at high speed and with low power consumption: object detection, object semantic segmentation, drivable-area segmentation, waterline segmentation, and radar point cloud semantic segmentation.
Our framework achieves state-of-the-art performance on the WaterScenes benchmark, excelling in both accuracy and power efficiency compared to other single-task and multi-task models.
- Score: 11.793123307886196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust urban water-surface perception serves as the foundation for
intelligent monitoring of aquatic environments and the autonomous navigation
and operation of unmanned vessels, especially in the context of waterway
safety. It is worth noting that current multi-sensor fusion and multi-task
learning models consume substantial power and heavily rely on high-power GPUs
for inference. This contributes to increased carbon emissions, a concern that
runs counter to the prevailing emphasis on environmental preservation and the
pursuit of sustainable, low-carbon urban environments. In light of these
concerns, this paper concentrates on low-power, lightweight, multi-task
panoptic perception through the fusion of visual and 4D radar data, which is
seen as a promising low-cost perception method. We propose a framework named
Achelous++ that facilitates the development and comprehensive evaluation of
multi-task water-surface panoptic perception models. Achelous++ can
simultaneously execute five perception tasks at high speed and with low power
consumption: object detection, object semantic segmentation, drivable-area
segmentation, waterline segmentation, and radar point cloud semantic
segmentation. Furthermore, to meet developers' demand to customize models for
real-time inference on low-performance devices, a novel multi-modal pruning
strategy known as Heterogeneous-Aware SynFlow (HA-SynFlow) is proposed. In
addition, Achelous++ supports random pruning at initialization with different
layer-wise sparsity schemes, such as Uniform and Erdos-Renyi-Kernel (ERK); a
brief sketch of both pruning ideas follows the abstract. Overall, our
Achelous++ framework achieves state-of-the-art performance
on the WaterScenes benchmark, excelling in both accuracy and power efficiency
compared to other single-task and multi-task models. We release and maintain
the code at https://github.com/GuanRunwei/Achelous.
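The abstract names two pruning-at-initialization ingredients: SynFlow-style saliency scoring, which HA-SynFlow builds on, and Erdos-Renyi-Kernel (ERK) layer-wise sparsity allocation. As a rough illustration only, here is a minimal PyTorch sketch of the standard versions of both; the heterogeneous-aware weighting that distinguishes HA-SynFlow is not described in the abstract and is not reproduced, and the function names, input shape, and simplified ERK clamping below are assumptions, not the released Achelous code.

```python
import torch
import torch.nn as nn


def synflow_scores(model: nn.Module, input_shape=(1, 3, 320, 320)):
    """Standard SynFlow saliency: feed an all-ones input through the network
    with every weight replaced by its absolute value, then score each
    parameter as |dR/dw * w|, where R is the summed output."""
    signs = {n: torch.sign(p.data) for n, p in model.named_parameters()}
    for p in model.parameters():      # linearize the network: w <- |w|
        p.data.abs_()
    model.eval()                      # keep BatchNorm layers in inference mode
    model.zero_grad()
    r = model(torch.ones(input_shape)).sum()
    r.backward()
    scores = {n: (p.grad * p.data).abs()
              for n, p in model.named_parameters() if p.grad is not None}
    for n, p in model.named_parameters():   # restore the original signs
        p.data.mul_(signs[n])
    return scores


def erk_layer_sparsities(model: nn.Module, global_sparsity: float):
    """Erdos-Renyi-Kernel allocation: a layer's density is proportional to
    sum(shape) / numel (e.g. (n_out + n_in + kh + kw) / (n_out*n_in*kh*kw)
    for a conv weight), rescaled so the kept-parameter budget matches
    1 - global_sparsity. Simplification: layers whose density clamps at 1
    are not re-normalized, unlike full ERK implementations."""
    prunable = [(n, p) for n, p in model.named_parameters() if p.dim() > 1]
    raw = {n: sum(p.shape) / p.numel() for n, p in prunable}
    budget = (1.0 - global_sparsity) * sum(p.numel() for _, p in prunable)
    eps = budget / sum(raw[n] * p.numel() for n, p in prunable)
    return {n: 1.0 - min(1.0, eps * raw[n]) for n, p in prunable}


if __name__ == "__main__":
    net = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 5))
    print({n: round(s, 3) for n, s in erk_layer_sparsities(net, 0.9).items()})
    print({n: v.shape for n, v in synflow_scores(net).items()})
```

The toy run shows the characteristic ERK pattern: wide, parameter-heavy layers receive higher sparsity than thin ones. A faithful implementation would iterate the allocation to redistribute the budget of any layer whose density clamps at 1, and HA-SynFlow would additionally adjust scores across the vision and radar branches.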
Related papers
- AI-Enhanced Automatic Design of Efficient Underwater Gliders [60.45821679800442]
Building an automated design framework is challenging due to the complexities of representing glider shapes and the high computational costs associated with modeling complex solid-fluid interactions.
We introduce an AI-enhanced automated computational framework designed to overcome these limitations by enabling the creation of underwater robots with non-trivial hull shapes.
Our approach involves an algorithm that co-optimizes both shape and control signals, utilizing a reduced-order geometry representation and a differentiable neural-network-based fluid surrogate model.
arXiv Detail & Related papers (2025-04-30T23:55:44Z)
- Learning Underwater Active Perception in Simulation [51.205673783866146]
Turbidity can jeopardise the whole mission as it may prevent correct visual documentation of the inspected structures.
Previous works have introduced methods to adapt to turbidity and backscattering.
We propose a simple yet efficient approach to enable high-quality image acquisition of assets in a broad range of water conditions.
arXiv Detail & Related papers (2025-04-23T06:48:38Z)
- WS-DETR: Robust Water Surface Object Detection through Vision-Radar Fusion with Detection Transformer [4.768265044725289]
Water surface object detection faces challenges from blurred edges and diverse object scales.
Existing approaches suffer from cross-modal feature conflicts, which negatively affect model robustness.
We propose a robust vision-radar fusion model WS-DETR, which achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2025-04-10T04:16:46Z)
- Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images [0.9883261192383611]
In this paper, we leverage monocular cameras on aerial robots to predict depth and semantic maps in unstructured environments.
We propose a joint deep-learning architecture that can perform the two tasks accurately and rapidly.
arXiv Detail & Related papers (2025-03-23T08:25:07Z)
- Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments [57.59857784298534]
We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images.
This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes.
arXiv Detail & Related papers (2025-03-06T05:13:19Z)
- Towards an Autonomous Surface Vehicle Prototype for Artificial Intelligence Applications of Water Quality Monitoring [68.41400824104953]
This paper presents a vehicle prototype that addresses the use of Artificial Intelligence algorithms and enhanced sensing techniques for water quality monitoring.
The vehicle is fully equipped with high-quality sensors to measure water quality parameters and water depth.
By means of a stereo camera, it can also detect and locate macro-plastics in real environments.
arXiv Detail & Related papers (2024-10-08T10:35:32Z)
- BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents [56.33989853438012]
We propose BEVWorld, a framework that transforms multimodal sensor inputs into a unified and compact Bird's Eye View latent space for holistic environment modeling.
The proposed world model consists of two main components: a multi-modal tokenizer and a latent BEV sequence diffusion model.
arXiv Detail & Related papers (2024-07-08T07:26:08Z)
- ASY-VRNet: Waterway Panoptic Driving Perception Model based on Asymmetric Fair Fusion of Vision and 4D mmWave Radar [7.2865477881451755]
Asymmetric Fair Fusion (AFF) modules are designed to efficiently interact with independent features from both visual and radar modalities.
The ASY-VRNet model processes image and radar features based on irregular super-pixel point sets.
Compared to other lightweight models, ASY-VRNet achieves state-of-the-art performance in object detection, semantic segmentation, and drivable-area segmentation.
arXiv Detail & Related papers (2023-08-20T14:53:27Z)
- Vision-Based Autonomous Navigation for Unmanned Surface Vessel in Extreme Marine Conditions [2.8983738640808645]
This paper presents an autonomous vision-based navigation framework for tracking target objects in extreme marine conditions.
The proposed framework has been thoroughly tested in simulation under extremely reduced visibility due to sandstorms and fog.
The results are compared with state-of-the-art de-hazing methods across the benchmarked MBZIRC simulation dataset.
arXiv Detail & Related papers (2023-08-08T14:25:13Z)
- Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar [7.225125838672763]
Current multi-task perception models have large parameter counts, are slow in inference, and do not scale well.
We propose Achelous, a low-cost and fast unified panoptic perception framework for water-surface perception based on the fusion of a monocular camera and 4D mmWave radar.
Achelous can simultaneously perform five tasks: detection and segmentation of visual targets, drivable-area segmentation, waterline segmentation, and radar point cloud segmentation.
arXiv Detail & Related papers (2023-07-14T00:24:30Z)
- WaterScenes: A Multi-Task 4D Radar-Camera Fusion Dataset and Benchmarks for Autonomous Driving on Water Surfaces [12.755813310009179]
WaterScenes is the first multi-task 4D radar-camera fusion dataset for autonomous driving on water surfaces.
Our Unmanned Surface Vehicle (USV) provides all-weather solutions for discerning object-related information.
arXiv Detail & Related papers (2023-07-13T01:05:12Z)
- Semantic-aware Texture-Structure Feature Collaboration for Underwater Image Enhancement [58.075720488942125]
Underwater image enhancement has attracted increasing attention as a significant technology in marine engineering and aquatic robotics.
We develop an efficient and compact enhancement network in collaboration with a high-level semantic-aware pretrained model.
We also apply the proposed algorithm to the underwater salient object detection task to reveal the favorable semantic-aware ability for high-level vision tasks.
arXiv Detail & Related papers (2022-11-19T07:50:34Z)
- SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation [152.60469768559878]
SHIFT is the largest multi-task synthetic dataset for autonomous driving.
It presents discrete and continuous shifts in cloudiness, rain and fog intensity, time of day, and vehicle and pedestrian density.
Our dataset and benchmark toolkit are publicly available at www.vis.xyz/shift.
arXiv Detail & Related papers (2022-06-16T17:59:52Z)
- Towards bio-inspired unsupervised representation learning for indoor aerial navigation [4.26712082692017]
This research presents a biologically inspired deep-learning algorithm for simultaneous localization and mapping (SLAM) and its application in a drone navigation system.
We propose an unsupervised representation learning method that yields low-dimensional latent state descriptors, mitigates sensitivity to perceptual aliasing, and runs on power-efficient embedded hardware.
The designed algorithm is evaluated on a dataset collected in an indoor warehouse environment, and initial results show its feasibility for robust indoor aerial navigation.
arXiv Detail & Related papers (2021-06-17T08:42:38Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
- PAN: Towards Fast Action Recognition via Learning Persistence of Appearance [60.75488333935592]
Most state-of-the-art methods heavily rely on dense optical flow as motion representation.
In this paper, we shed light on fast action recognition by lifting the reliance on optical flow.
We design a novel motion cue called Persistence of Appearance (PA).
In contrast to optical flow, our PA focuses more on distilling the motion information at boundaries.
arXiv Detail & Related papers (2020-08-08T07:09:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.