You Sense Only Once Beneath: Ultra-Light Real-Time Underwater Object Detection
- URL: http://arxiv.org/abs/2504.15694v1
- Date: Tue, 22 Apr 2025 08:26:35 GMT
- Title: You Sense Only Once Beneath: Ultra-Light Real-Time Underwater Object Detection
- Authors: Jun Dong, Wenli Wu, Jintao Cheng, Xiaoyu Tang
- Abstract summary: We propose an Ultra-Light Real-Time Underwater Object Detection framework, You Sense Only Once Beneath (YSOOB). Specifically, we utilize a Multi-Spectrum Wavelet Encoder (MSWE) to perform frequency-domain encoding on the input image, minimizing the semantic loss caused by underwater optical color distortion. We also eliminate model redundancy through a simple yet effective channel compression and a reconstructed large kernel convolution (RLKC) to make the model lightweight.
- Score: 2.5249064981269296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the remarkable achievements in object detection, the model's accuracy and efficiency still require further improvement under challenging underwater conditions, such as low image quality and limited computational resources. To address this, we propose an Ultra-Light Real-Time Underwater Object Detection framework, You Sense Only Once Beneath (YSOOB). Specifically, we utilize a Multi-Spectrum Wavelet Encoder (MSWE) to perform frequency-domain encoding on the input image, minimizing the semantic loss caused by underwater optical color distortion. Furthermore, we revisit the unique characteristics of even-sized and transposed convolutions, allowing the model to dynamically select and enhance key information during the resampling process, thereby improving its generalization ability. Finally, we eliminate model redundancy through a simple yet effective channel compression and a reconstructed large kernel convolution (RLKC) to make the model lightweight. The result is a high-performance underwater object detector, YSOOB, with only 1.2 million parameters. Extensive experimental results demonstrate that, with the fewest parameters, YSOOB achieves mAP50 of 83.1% and 82.9% on the URPC2020 and DUO datasets, respectively, comparable to current SOTA detectors. Its inference speed reaches 781.3 FPS on a T4 GPU (TensorRT FP16) and 57.8 FPS on the edge computing device Jetson Xavier NX (TensorRT FP16), surpassing YOLOv12-N by 28.1% and 22.5%, respectively.
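The paper's MSWE is not reproduced here, but the general idea of frequency-domain encoding with wavelets can be illustrated with a single-level 2D Haar transform, which splits an image into one low-frequency and three high-frequency sub-bands that a detector backbone could consume as extra channels. This is a minimal sketch under that assumption; `haar_dwt2d` and all sizes are our illustrative choices, not the authors':

```python
# A minimal sketch, NOT the authors' MSWE: a single-level 2D Haar
# transform that splits an image into four frequency sub-bands.
import torch

def haar_dwt2d(x: torch.Tensor) -> torch.Tensor:
    """x: (B, C, H, W) with even H, W -> (B, 4*C, H/2, W/2)."""
    a = x[:, :, 0::2, 0::2]   # top-left pixel of each 2x2 block
    b = x[:, :, 0::2, 1::2]   # top-right
    c = x[:, :, 1::2, 0::2]   # bottom-left
    d = x[:, :, 1::2, 1::2]   # bottom-right
    ll = (a + b + c + d) / 2  # low-low: coarse content
    lh = (a - b + c - d) / 2  # horizontal detail
    hl = (a + b - c - d) / 2  # vertical detail
    hh = (a - b - c + d) / 2  # diagonal detail
    return torch.cat([ll, lh, hl, hh], dim=1)

frame = torch.randn(1, 3, 640, 640)  # dummy underwater frame
print(haar_dwt2d(frame).shape)       # torch.Size([1, 12, 320, 320])
```

Separating sub-bands like this lets later layers weight color-degraded low-frequency content apart from edge detail; how YSOOB actually selects and fuses its spectra is described in the paper itself.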
Related papers
- Sebica: Lightweight Spatial and Efficient Bidirectional Channel Attention Super Resolution Network [0.0]
Single Image Super-Resolution (SISR) is a vital technique for improving the visual quality of low-resolution images.
We present Sebica, a lightweight network that incorporates spatial and efficient bidirectional channel attention mechanisms.
Sebica significantly reduces computational costs while maintaining high reconstruction quality.
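The snippet above does not spell out Sebica's bidirectional variant; purely as a reference point, efficient channel attention in the spirit of ECA-Net reweights channels with a cheap 1D convolution instead of a full MLP. The class name and kernel size below are our illustrative choices:

```python
# A minimal efficient-channel-attention sketch (ECA-style), not
# Sebica's exact module: O(C) parameters via a 1D conv over channels.
import torch
import torch.nn as nn

class EfficientChannelAttention(nn.Module):
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                    # global average pool -> (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # local cross-channel mixing
        return x * torch.sigmoid(y)[:, :, None, None]

x = torch.randn(2, 64, 32, 32)
print(EfficientChannelAttention()(x).shape)  # torch.Size([2, 64, 32, 32])
```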
arXiv Detail & Related papers (2024-10-27T18:27:07Z)
- Low-power Ship Detection in Satellite Images Using Neuromorphic Hardware [1.4330085996657045]
On-board data processing can identify ships and reduce the amount of data sent to the ground.
Most images captured on board contain only bodies of water or land, with the Airbus Ship Detection dataset showing only 22.1% of images containing ships.
We designed a low-power, two-stage system to optimize performance instead of relying on a single complex model.
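As a hypothetical sketch of that two-stage idea (both models below are untrained placeholders, not the paper's networks): a tiny classifier first asks whether a tile contains any ship at all, and the heavier detector runs only on tiles that pass the gate, which pays off when only ~22% of images contain ships:

```python
# A toy two-stage gating pipeline: cheap stage-1 gate, expensive
# stage-2 detector only on flagged tiles. Placeholders throughout.
import torch
import torch.nn as nn

gate = nn.Sequential(                       # tiny stage-1 "any ship?" classifier
    nn.Conv2d(3, 8, 3, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1),
)
detector = lambda t: "boxes for this tile"  # stand-in for the stage-2 detector

def process(tiles: list[torch.Tensor], threshold: float = 0.5):
    results = []
    for tile in tiles:
        p = torch.sigmoid(gate(tile.unsqueeze(0))).item()
        if p >= threshold:                  # skip water-only tiles entirely
            results.append(detector(tile))
    return results

tiles = [torch.randn(3, 256, 256) for _ in range(4)]
print(process(tiles))
```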
arXiv Detail & Related papers (2024-06-17T08:36:12Z)
- Tiny-VBF: Resource-Efficient Vision Transformer based Lightweight Beamformer for Ultrasound Single-Angle Plane Wave Imaging [4.15681035147785]
In this work, we propose a novel vision-transformer-based tiny beamformer (Tiny-VBF).
The output of Tiny-VBF provides fast envelope detection while requiring a very low frame rate.
We propose an accelerator architecture and implement our Tiny-VBF model on a Zynq UltraScale+ MPSoC ZCU104 FPGA.
arXiv Detail & Related papers (2023-11-20T10:47:52Z)
- Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with other 17 state-of-the-art methods.
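The snippet does not define the DSA module, so the following is only a guess at the general pattern it likely belongs to: attending over channels costs O(C^2) rather than the O((HW)^2) of spatial self-attention, which keeps global context affordable in a lightweight network. The class below is our illustration, not GCANet's code:

```python
# A guessed lightweight-global-attention pattern, not GCANet's DSA:
# self-attention over channels instead of spatial positions.
import torch
import torch.nn as nn

class ChannelSelfAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Conv2d(dim, dim * 3, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).flatten(2).chunk(3, dim=1)  # each (B, C, HW)
        attn = torch.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)
        return (attn @ v).view(b, c, h, w)                # (B, C, C) @ (B, C, HW)

print(ChannelSelfAttention(32)(torch.randn(1, 32, 16, 16)).shape)
```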
arXiv Detail & Related papers (2023-09-22T06:19:11Z)
- Learning Heavily-Degraded Prior for Underwater Object Detection [59.5084433933765]
This paper seeks transferable prior knowledge from detector-friendly images.
It is based on the statistical observation that the heavily degraded regions of detector-friendly underwater images (DFUI) and ordinary underwater images exhibit evident feature distribution gaps.
Our method, despite its higher speed and fewer parameters, still outperforms transformer-based detectors.
arXiv Detail & Related papers (2023-08-24T12:32:46Z)
- Ultra-low Power Deep Learning-based Monocular Relative Localization Onboard Nano-quadrotors [64.68349896377629]
This work presents a novel autonomous end-to-end system that addresses the monocular relative localization, through deep neural networks (DNNs), of two peer nano-drones.
To cope with the ultra-constrained nano-drone platform, we propose a vertically-integrated framework, including dataset augmentation, quantization, and system optimizations.
Experimental results show that our DNN can precisely localize a 10 cm target nano-drone from low-resolution monochrome images alone, at distances of up to 2 m.
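Quantization is one of the optimizations the summary lists; the paper's actual pipeline (targeting a nano-drone MCU) is not shown here, but a minimal post-training example with PyTorch's built-in dynamic quantization conveys the idea. The toy model is ours:

```python
# A minimal post-training quantization sketch (toy model, not the
# paper's DNN): Linear-layer weights are stored as int8.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 4))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 64)
print(quantized(x).shape)  # torch.Size([1, 4]); ~4x smaller weights
```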
arXiv Detail & Related papers (2023-03-03T14:14:08Z)
- EdgeYOLO: An Edge-Real-Time Object Detector [69.41688769991482]
This paper proposes an efficient, low-complexity and anchor-free object detector based on the state-of-the-art YOLO framework.
We develop an enhanced data augmentation method to effectively suppress overfitting during training, and design a hybrid random loss function to improve the detection accuracy of small objects.
Our baseline model reaches 50.6% AP50:95 and 69.8% AP50 on the MS COCO 2017 dataset and 26.4% AP50:95 and 44.8% AP50 on the VisDrone 2019-DET dataset, and it meets real-time requirements (FPS >= 30) on an Nvidia edge-computing device.
arXiv Detail & Related papers (2023-02-15T06:05:14Z)
- SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
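SALISA's actual resampling warps the whole frame non-uniformly; as a much cruder stand-in that conveys the intuition, one can re-center a fixed crop on the saliency peak so that a small object occupies more of the detector's input pixels. Function name and sizes below are illustrative only:

```python
# A toy stand-in for saliency-guided input sampling, NOT SALISA's
# resampler: crop around the saliency peak before running detection.
import torch

def saliency_crop(frame: torch.Tensor, saliency: torch.Tensor,
                  crop: int = 320) -> torch.Tensor:
    """frame: (3, H, W); saliency: (H, W); returns a (3, crop, crop) view."""
    h, w = saliency.shape
    idx = saliency.flatten().argmax()
    cy, cx = int(idx // w), int(idx % w)
    y0 = max(0, min(cy - crop // 2, h - crop))  # clamp crop inside the frame
    x0 = max(0, min(cx - crop // 2, w - crop))
    return frame[:, y0:y0 + crop, x0:x0 + crop]

frame = torch.rand(3, 720, 1280)
saliency = torch.rand(720, 1280)  # e.g. derived from the previous frame
print(saliency_crop(frame, saliency).shape)  # torch.Size([3, 320, 320])
```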
arXiv Detail & Related papers (2022-04-05T17:59:51Z)
- Operationalizing Convolutional Neural Network Architectures for Prohibited Object Detection in X-Ray Imagery [15.694880385913534]
We explore the viability of two recent end-to-end object detection CNN architectures, Cascade R-CNN and FreeAnchor, for prohibited item detection.
With fewer parameters and less training time, FreeAnchor achieves the highest detection inference speed of 13 fps (3.9 ms per image).
The CNN models display substantial resilience to the lossy compression, resulting in only a 1.1% decrease in mAP at the JPEG compression level of 50.
arXiv Detail & Related papers (2021-10-10T21:20:04Z)
- Small Object Detection Based on Modified FSSD and Model Compression [7.387639662781843]
This paper proposes a small object detection algorithm based on FSSD.
In order to reduce the computational cost and storage space, pruning is carried out to achieve model compression.
The mean average precision (mAP) of the algorithm reaches 80.4% on PASCAL VOC, and the speed is 59.5 FPS on a GTX 1080 Ti.
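The pruning recipe itself is not given in this summary; a minimal magnitude-pruning sketch using PyTorch's built-in utilities looks like this (toy model, and the 50% sparsity level is chosen arbitrarily):

```python
# A minimal magnitude-pruning sketch, not the paper's exact recipe:
# zero out the smallest 50% of conv weights, then make it permanent.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))

for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
total = sum(m.weight.numel() for m in convs)
zeros = sum((m.weight == 0).sum().item() for m in convs)
print(f"conv weight sparsity: {zeros / total:.1%}")  # ~50.0%
```

Unstructured zeros alone shrink storage only after sparse encoding; structured (channel) pruning is what usually yields real speedups, but the summary does not say which variant the paper uses.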
arXiv Detail & Related papers (2021-08-24T03:20:32Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have large numbers of parameters and heavy computational costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner, with the following innovations.
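FastFlowNet's specific innovations are not listed in this snippet; the generic coarse-to-fine skeleton it builds on looks roughly like this, where `predict` stands in for a learned flow network and the warp is the standard `grid_sample` backward warp:

```python
# A generic coarse-to-fine flow skeleton (not FastFlowNet itself):
# estimate coarse flow, upsample, warp image 2 toward image 1, and
# let each finer level predict only a residual correction.
import torch
import torch.nn.functional as F

def warp(img: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp img by flow. img: (B,C,H,W); flow: (B,2,H,W) in pixels."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).float().to(img.device)  # (2, H, W)
    coords = base[None] + flow                           # sample locations
    gx = 2 * coords[:, 0] / (w - 1) - 1                  # normalize to [-1, 1]
    gy = 2 * coords[:, 1] / (h - 1) - 1
    return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

def coarse_to_fine(img1, img2, predict, levels=3):
    b, _, h, w = img1.shape
    flow = torch.zeros(b, 2, h // 2**levels, w // 2**levels)
    for lvl in reversed(range(levels)):
        flow = 2 * F.interpolate(flow, scale_factor=2, mode="bilinear",
                                 align_corners=False)    # flow doubles with resolution
        i1 = F.interpolate(img1, size=flow.shape[2:], mode="bilinear",
                           align_corners=False)
        i2 = F.interpolate(img2, size=flow.shape[2:], mode="bilinear",
                           align_corners=False)
        flow = flow + predict(i1, warp(i2, flow))        # residual refinement
    return flow

# stand-in "network": always predicts zero residual flow
predict = lambda a, b: torch.zeros(a.size(0), 2, a.size(2), a.size(3))
print(coarse_to_fine(torch.rand(1, 3, 64, 64),
                     torch.rand(1, 3, 64, 64), predict).shape)
```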
arXiv Detail & Related papers (2021-03-08T03:09:37Z)