Self-Supervised Enhancement of Forward-Looking Sonar Images: Bridging Cross-Modal Degradation Gaps through Feature Space Transformation and Multi-Frame Fusion
- URL: http://arxiv.org/abs/2504.10974v2
- Date: Wed, 16 Apr 2025 15:58:55 GMT
- Title: Self-Supervised Enhancement of Forward-Looking Sonar Images: Bridging Cross-Modal Degradation Gaps through Feature Space Transformation and Multi-Frame Fusion
- Authors: Zhisheng Zhang, Peng Zhang, Fengxiang Wang, Liangli Ma, Fuchun Sun
- Abstract summary: Enhancing forward-looking sonar images is critical for accurate underwater target detection. We propose a feature-space transformation that maps sonar images from the pixel domain to a robust feature domain. Our method significantly outperforms existing approaches, effectively suppressing noise, preserving detailed edges, and substantially improving brightness.
- Score: 17.384482405769567
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Enhancing forward-looking sonar images is critical for accurate underwater target detection. Current deep learning methods mainly rely on supervised training with simulated data, but the difficulty in obtaining high-quality real-world paired data limits their practical use and generalization. Although self-supervised approaches from remote sensing partially alleviate data shortages, they neglect the cross-modal degradation gap between sonar and remote sensing images. Directly transferring pretrained weights often leads to overly smooth sonar images, detail loss, and insufficient brightness. To address this, we propose a feature-space transformation that maps sonar images from the pixel domain to a robust feature domain, effectively bridging the degradation gap. Additionally, our self-supervised multi-frame fusion strategy leverages complementary inter-frame information to naturally remove speckle noise and enhance target-region brightness. Experiments on three self-collected real-world forward-looking sonar datasets show that our method significantly outperforms existing approaches, effectively suppressing noise, preserving detailed edges, and substantially improving brightness, demonstrating strong potential for underwater target detection applications.
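The abstract's claim that multiple frames carry complementary information for speckle removal can be illustrated with a small sketch. It is not the paper's learned self-supervised fusion: it assumes the frames are already aligned, models speckle as multiplicative exponential noise with mean 1 (a common simplification), and fuses by a plain per-pixel mean. The names `simulate_frames`, `fuse_mean`, and `rmse` are hypothetical helpers introduced only for this illustration.

```python
import random
import statistics


def simulate_frames(clean, n_frames, rng):
    """Return n_frames noisy copies of `clean` with multiplicative speckle.

    Assumption: speckle is modeled as exponential noise with mean 1.
    """
    return [[p * rng.expovariate(1.0) for p in clean] for _ in range(n_frames)]


def fuse_mean(frames):
    """Fuse aligned frames by averaging each pixel across frames."""
    return [statistics.fmean(pixel_stack) for pixel_stack in zip(*frames)]


def rmse(a, b):
    """Root-mean-square error between two equally sized pixel lists."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


rng = random.Random(0)
# Toy 1-D "image": flat background followed by a brighter target region.
clean = [0.2] * 500 + [0.8] * 500
frames = simulate_frames(clean, n_frames=16, rng=rng)
fused = fuse_mean(frames)

single_err = rmse(frames[0], clean)
fused_err = rmse(fused, clean)
print(f"single-frame RMSE: {single_err:.3f}")
print(f"16-frame fused RMSE: {fused_err:.3f}")
```

Because the noise in each frame is drawn independently, the error of the averaged image shrinks roughly with the square root of the number of frames. The sketch covers only noise suppression; the brightness enhancement and feature-space transformation described in the abstract are not modeled here.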
Related papers
- AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis [57.249817395828174]
We propose a scalable framework combining pseudo-synthetic renderings from 3D city-wide meshes with real, ground-level crowd-sourced images.
The pseudo-synthetic data simulates a wide range of aerial viewpoints, while the real, crowd-sourced images help improve visual fidelity for ground-level images.
Using this hybrid dataset, we fine-tune several state-of-the-art algorithms and achieve significant improvements on real-world, zero-shot aerial-ground tasks.
arXiv Detail & Related papers (2025-04-17T17:57:05Z)
- ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring [61.82010103478833]
We develop a context-based local blur detection module that incorporates additional contextual information to improve the identification of blurry regions.
Considering that modern smartphones are equipped with cameras capable of providing short-exposure images, we develop a blur-aware guided image restoration method.
We formulate the above components into a simple yet effective network, named ExpRDiff.
arXiv Detail & Related papers (2024-12-12T11:42:39Z)
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
- WTCL-Dehaze: Rethinking Real-world Image Dehazing via Wavelet Transform and Contrastive Learning [17.129068060454255]
Single image dehazing is essential for applications such as autonomous driving and surveillance.
We propose an enhanced semi-supervised dehazing network that integrates Contrastive Loss and Discrete Wavelet Transform.
Our proposed algorithm achieves superior performance and improved robustness compared to state-of-the-art single image dehazing methods.
arXiv Detail & Related papers (2024-10-07T05:36:11Z)
- Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration [64.84134880709625]
We show that it is possible to perform domain adaptation via the noise space using diffusion models.
In particular, by leveraging the unique property of how auxiliary conditional inputs influence the multi-step denoising process, we derive a meaningful diffusion loss.
We present crucial strategies such as channel-shuffling layer and residual-swapping contrastive learning in the diffusion model.
arXiv Detail & Related papers (2024-06-26T17:40:30Z)
- Inhomogeneous illumination image enhancement under extremely low visibility condition [3.534798835599242]
Imaging through dense fog presents unique challenges: visual information essential for applications such as object detection and recognition is obscured, which hinders conventional image processing methods.
We introduce a novel method that adaptively filters background illumination based on Structural Differential and Integral Filtering (SDIF) to enhance only vital signal information.
Our findings demonstrate that the proposed method significantly enhances signal clarity under extremely low visibility conditions and outperforms existing techniques, offering substantial improvements for deep fog imaging applications.
arXiv Detail & Related papers (2024-04-26T16:09:42Z)
- Improving Lens Flare Removal with General Purpose Pipeline and Multiple Light Sources Recovery [69.71080926778413]
Flare artifacts can degrade image visual quality and downstream computer vision tasks.
Current methods do not consider automatic exposure and tone mapping in the image signal processing (ISP) pipeline.
We propose a solution that improves lens flare removal by revisiting the ISP and designing a more reliable light source recovery strategy.
arXiv Detail & Related papers (2023-08-31T04:58:17Z)
- Learning Heavily-Degraded Prior for Underwater Object Detection [59.5084433933765]
This paper seeks transferable prior knowledge from detector-friendly images.
It is based on the statistical observation that the heavily degraded regions of detector-friendly images (DFUI) and underwater images have evident feature distribution gaps.
Our method runs faster and uses fewer parameters, yet still performs better than transformer-based detectors.
arXiv Detail & Related papers (2023-08-24T12:32:46Z)
- Weakly Supervised Face and Whole Body Recognition in Turbulent Environments [2.2263723609685773]
We propose a new weakly supervised framework that generates domain representations, aligning turbulent and pristine images into a common subspace.
We also introduce a new tilt map estimator that predicts geometric distortions observed in turbulent images.
Our method does not require synthesizing turbulence-free images or ground-truth paired images, and requires significantly fewer annotated samples.
arXiv Detail & Related papers (2023-08-22T19:58:02Z)
- Multi-Frequency-Aware Patch Adversarial Learning for Neural Point Cloud Rendering [7.522462414919854]
We present a neural point cloud rendering pipeline through a novel multi-frequency-aware patch adversarial learning framework.
The proposed approach aims to improve rendering realism by minimizing the spectrum discrepancy between real and synthesized images.
Our method produces state-of-the-art results for neural point cloud rendering by a significant margin.
arXiv Detail & Related papers (2022-10-07T16:54:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.