Self-Supervised Compression and Artifact Correction for Streaming Underwater Imaging Sonar
- URL: http://arxiv.org/abs/2511.13922v1
- Date: Mon, 17 Nov 2025 21:19:15 GMT
- Title: Self-Supervised Compression and Artifact Correction for Streaming Underwater Imaging Sonar
- Authors: Rongsheng Qian, Chi Xu, Xiaoqiang Ma, Hao Fang, Yili Jin, William I. Atlas, Jiangchuan Liu
- Abstract summary: Real-time imaging sonar has become an important tool for underwater monitoring in environments where optical sensing is unreliable. We present SCOPE, a self-supervised framework that jointly performs compression and artifact correction without clean-noise pairs or synthetic assumptions. SCOPE has been deployed for months in three Pacific Northwest rivers to support real-time salmon enumeration and environmental monitoring in the wild.
- Score: 14.023965177100239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time imaging sonar has become an important tool for underwater monitoring in environments where optical sensing is unreliable. Its broader use is constrained by two coupled challenges: highly limited uplink bandwidth and severe sonar-specific artifacts (speckle, motion blur, reverberation, acoustic shadows) that affect up to 98% of frames. We present SCOPE, a self-supervised framework that jointly performs compression and artifact correction without clean-noise pairs or synthetic assumptions. SCOPE combines (i) Adaptive Codebook Compression (ACC), which learns frequency-encoded latent representations tailored to sonar, with (ii) Frequency-Aware Multiscale Segmentation (FAMS), which decomposes frames into low-frequency structure and sparse high-frequency dynamics while suppressing rapidly fluctuating artifacts. A hedging training strategy further guides frequency-aware learning using low-pass proxy pairs generated without labels. Evaluated on months of in-situ ARIS sonar data, SCOPE achieves a structural similarity index (SSIM) of 0.77, representing a 40% improvement over prior self-supervised denoising baselines, at bitrates down to <= 0.0118 bpp. It reduces uplink bandwidth by more than 80% while improving downstream detection. The system runs in real time, with 3.1 ms encoding on an embedded GPU and 97 ms full multi-layer decoding on the server end. SCOPE has been deployed for months in three Pacific Northwest rivers to support real-time salmon enumeration and environmental monitoring in the wild. Results demonstrate that learning frequency-structured latents enables practical, low-bitrate sonar streaming with preserved signal details under real-world deployment conditions.
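The low-/high-frequency split and the bitrate figure in the abstract can be made concrete with a small sketch. It is not SCOPE's ACC or FAMS implementation: it substitutes a plain box (mean) filter for whatever low-pass operator the authors use, and the names `box_low_pass`, `split_frequencies`, and `bits_per_frame`, as well as the frame sizes, are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one grayscale frame as rows of pixel intensities


def box_low_pass(frame: Frame, radius: int = 1) -> Frame:
    """Mean filter over a (2*radius+1)^2 window, clipped at the borders.

    Assumption: a box filter stands in for the (unspecified) low-pass
    operator used to build proxy targets.
    """
    height, width = len(frame), len(frame[0])
    low = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += frame[yy][xx]
                    count += 1
            low[y][x] = total / count
    return low


def split_frequencies(frame: Frame, radius: int = 1) -> Tuple[Frame, Frame]:
    """Return (low, high) such that low + high reconstructs the frame."""
    low = box_low_pass(frame, radius)
    high = [
        [pixel - smooth for pixel, smooth in zip(frame_row, low_row)]
        for frame_row, low_row in zip(frame, low)
    ]
    return low, high


def bits_per_frame(bpp: float, width: int, height: int) -> float:
    """Bit budget for one frame at a given bits-per-pixel rate."""
    return bpp * width * height


if __name__ == "__main__":
    # Hypothetical 3x3 frame with one bright pixel (e.g. a speckle spike).
    frame = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
    low, high = split_frequencies(frame)
    print("low-pass centre:", low[1][1], "high-pass centre:", high[1][1])
    # Hypothetical 1000x1000 frame at the 0.0118 bpp quoted in the abstract.
    print("bits per frame:", bits_per_frame(0.0118, 1000, 1000))
```

In this toy split the isolated bright pixel ends up almost entirely in the high-frequency residual, which is the kind of rapidly fluctuating content the abstract says is suppressed, while the smoothed frame keeps the slowly varying structure.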
Related papers
- Latent-Mark: An Audio Watermark Robust to Neural Resynthesis [62.09761127079914]
Latent-Mark is the first zero-bit audio watermarking framework designed to survive semantic compression. Our key insight is that robustness to the encode-decode process requires embedding the watermark within the invariant latent space. Our work inspires future research into universal watermarking frameworks capable of maintaining integrity across increasingly complex and diverse generative distortions.
arXiv Detail & Related papers (2026-03-05T15:51:09Z) - Denoising and Baseline Correction of Low-Scan FTIR Spectra: A Benchmark of Deep Learning Models Against Traditional Signal Processing [0.0]
We propose a physics-informed cascade Unet that separates denoising and baseline correction tasks. This architecture forces the network to separate random noise from chemical signals using an embedded SNIP layer. We benchmarked this approach against a standard single Unet and a traditional Savitzky-Golay/SNIP workflow.
arXiv Detail & Related papers (2026-01-28T15:19:02Z) - Efficient On-Board Processing of Oblique UAV Video for Rapid Flood Extent Mapping [7.460695517551536]
Temporal Token Reuse (TTR) is an adaptive inference framework capable of accelerating video segmentation on embedded devices. We show that TTR achieves a 30% reduction in inference latency with negligible degradation in segmentation accuracy ( 0.5% mIoU). These findings confirm that TTR effectively shifts the operational frontier, enabling high-fidelity, real-time oblique video understanding for time-critical remote sensing missions.
arXiv Detail & Related papers (2026-01-16T13:41:56Z) - FLaTEC: Frequency-Disentangled Latent Triplanes for Efficient Compression of LiDAR Point Clouds [52.997038111673966]
FLaTEC is a frequency-aware compression model that enables the compression of a full scan with high compression ratios. We convert voxelized embeddings into triplane representations to reduce sparsity, computational cost, and storage requirements. Our method achieves state-of-the-art rate-distortion performance and outperforms the standard codecs by 78% and 94% in BD-rate on both datasets.
arXiv Detail & Related papers (2025-11-25T08:37:49Z) - Towards Frequency-Adaptive Learning for SAR Despeckling [10.764049665817629]
We propose a frequency-adaptive heterogeneous despeckling model based on a divide-and-conquer architecture. Inspired by their differing noise characteristics, we design specialized sub-networks for different frequency components. For high-frequency sub-bands rich in edges and textures, we introduce an enhanced U-Net with deformable convolutions for noise suppression and enhanced features.
arXiv Detail & Related papers (2025-11-08T07:08:22Z) - Real-time Noise Detection and Classification in Single-Channel EEG: A Lightweight Machine Learning Approach for EMG, White Noise, and EOG Artifacts [0.0]
We propose a hybrid spectral-temporal framework for real-time detection and classification of ocular (EOG), muscular (EMG), and white noise artifacts in single-channel EEG. With 30-second training times (97% faster than CNNs) and robust performance across SNR levels, this framework bridges the gap between clinical applicability and computational efficiency. This work also challenges the ubiquitous dependence on model depth for EEG artifact detection by demonstrating that domain-informed feature fusion surpasses complex architecture in noisy scenarios.
arXiv Detail & Related papers (2025-09-30T10:32:38Z) - Lightweight Physics-Informed Zero-Shot Ultrasound Plane Wave Denoising [1.912429179274357]
Ultrasound Coherent Plane Wave Compounding (CPWC) enhances image contrast by combining echoes from multiple steered transmissions. We propose a zero-shot denoising framework tailored for low-angle CPWC acquisitions.
arXiv Detail & Related papers (2025-06-26T17:28:32Z) - SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams [70.9610707466343]
Bio-inspired spike cameras emit asynchronous events at microsecond-level resolution, providing an alternative sensing modality. Existing methods lack specialized stereo algorithms and benchmarks tailored to the spike data. We propose SpikeStereoNet, a brain-inspired framework and the first to estimate stereo depth directly from raw spike streams.
arXiv Detail & Related papers (2025-05-26T04:14:34Z) - FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [92.4205087439928]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose the Self-supervised Transfer (PST) and the Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models, effectively mitigating data scarcity. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches. This combined approach enables FUSE to construct a universal image-event that only requires lightweight decoder adaptation for target datasets.
arXiv Detail & Related papers (2025-03-25T15:04:53Z) - Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame interpolation is an important low-level computer vision task that can increase frame rate for a more fluent visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI can reduce computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-07T06:41:15Z) - Thunder: Thumbnail based Fast Lightweight Image Denoising Network [92.9631117239565]
A Thumbnail based Denoising Network dubbed Thunder is proposed.
arXiv Detail & Related papers (2022-05-24T06:38:46Z) - Exploring Inter-frequency Guidance of Image for Lightweight Gaussian Denoising [1.52292571922932]
We propose a novel network architecture denoted as IGNet, in order to refine the frequency bands from low to high in a progressive manner.
With this design, more inter-frequency priors and information are utilized, so the model size can be reduced while still preserving competitive results.
arXiv Detail & Related papers (2021-12-22T10:35:53Z) - Conditioning Trick for Training Stable GANs [70.15099665710336]
We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training.
We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
arXiv Detail & Related papers (2020-10-12T16:50:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.