Multi-Representation Attention Framework for Underwater Bioacoustic Denoising and Recognition
- URL: http://arxiv.org/abs/2510.26838v1
- Date: Wed, 29 Oct 2025 22:49:15 GMT
- Title: Multi-Representation Attention Framework for Underwater Bioacoustic Denoising and Recognition
- Authors: Amine Razig, Youssef Soulaymani, Loubna Benabbou, Pierre Cauchy,
- Abstract summary: We introduce a multi-step, attention-guided framework that first segments spectrograms to generate soft masks of biologically relevant energy. Image and mask embeddings are integrated via mid-level fusion, enabling the model to focus on salient spectrogram regions. Using real-world recordings from the Saguenay St. Lawrence Marine Park Research Station in Canada, we demonstrate that segmentation-driven attention and mid-level fusion improve signal discrimination.
- Score: 0.924965746838578
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated monitoring of marine mammals in the St. Lawrence Estuary faces extreme challenges: calls span low-frequency moans to ultrasonic clicks, often overlap, and are embedded in variable anthropogenic and environmental noise. We introduce a multi-step, attention-guided framework that first segments spectrograms to generate soft masks of biologically relevant energy and then fuses these masks with the raw inputs for multi-band, denoised classification. Image and mask embeddings are integrated via mid-level fusion, enabling the model to focus on salient spectrogram regions while preserving global context. Using real-world recordings from the Saguenay St. Lawrence Marine Park Research Station in Canada, we demonstrate that segmentation-driven attention and mid-level fusion improve signal discrimination, reduce false positive detections, and produce reliable representations for operational marine mammal monitoring across diverse environmental conditions and signal-to-noise ratios. Beyond in-distribution evaluation, we further assess the generalization of Mask-Guided Classification (MGC) under distributional shifts by testing on spectrograms generated with alternative acoustic transformations. While high-capacity baseline models lose accuracy in this Out-of-distribution (OOD) setting, MGC maintains stable performance, with even simple fusion mechanisms (gated, concat) achieving comparable results across distributions. This robustness highlights the capacity of MGC to learn transferable representations rather than overfitting to a specific transformation, thereby reinforcing its suitability for large-scale, real-world biodiversity monitoring. We show that in all experimental settings, the MGC framework consistently outperforms baseline architectures, yielding substantial gains in accuracy on both in-distribution and OOD data.
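The abstract names two simple mid-level fusion mechanisms (gated, concat) for combining image and mask embeddings. The following is a minimal sketch of what such fusion could look like, not the authors' implementation: the function names `concat_fusion` and `gated_fusion`, the per-dimension sigmoid gate, and the random toy weights are all assumptions made for illustration, and the abstract does not specify the actual architecture.

```python
import math
import random


def sigmoid(x):
    """Logistic function used to squash the gate into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def concat_fusion(image_emb, mask_emb):
    """Concatenate the two embeddings into one longer feature vector."""
    return list(image_emb) + list(mask_emb)


def gated_fusion(image_emb, mask_emb, weights, bias):
    """Blend two same-length embeddings with a per-dimension gate.

    Assumed (hypothetical) formulation:
        gate_i  = sigmoid(w_i . [image_emb; mask_emb] + b_i)
        fused_i = gate_i * image_emb_i + (1 - gate_i) * mask_emb_i
    """
    if len(image_emb) != len(mask_emb):
        raise ValueError("embeddings must have the same length")
    joint = concat_fusion(image_emb, mask_emb)
    fused = []
    for i, (img_i, msk_i) in enumerate(zip(image_emb, mask_emb)):
        pre_activation = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        gate = sigmoid(pre_activation)
        fused.append(gate * img_i + (1.0 - gate) * msk_i)
    return fused


# Toy embeddings and untrained random gate weights (illustrative only).
rng = random.Random(0)
dim = 4
image_emb = [rng.uniform(-1, 1) for _ in range(dim)]
mask_emb = [rng.uniform(-1, 1) for _ in range(dim)]
weights = [[rng.gauss(0, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

concat_out = concat_fusion(image_emb, mask_emb)
gated_out = gated_fusion(image_emb, mask_emb, weights, bias)
print(len(concat_out), len(gated_out))  # prints: 8 4
```

In this sketch, concatenation doubles the feature dimension and leaves the weighting of each stream to the downstream classifier, while the gate keeps the dimension fixed and mixes the two streams per feature. In a real model the gate weights would be learned jointly with the encoders.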
Related papers
- Quality-Aware Robust Multi-View Clustering for Heterogeneous Observation Noise [12.720216418233795]
We propose a novel framework termed Quality-Aware Robust Multi-View Clustering (QARMVC). QARMVC employs an information bottleneck mechanism to extract intrinsic semantics for view reconstruction. In experiments on five benchmark datasets, QARMVC consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2026-02-26T03:16:44Z)
- Out-of-Distribution Radar Detection with Complex VAEs: Theory, Whitening, and ANMF Fusion [5.205040944294552]
A complex-valued Variational AutoEncoder (CVAE) is trained exclusively on clutter-plus-noise to perform out-of-distribution detection. We benchmark performance against classical and adaptive detectors. Results demonstrate that statistical normalization combined with complex-valued generative modeling substantively improves detection in realistic sea-clutter conditions.
arXiv Detail & Related papers (2026-01-26T16:51:19Z)
- Harmonizing the Deep: A Unified Information Pipeline for Robust Marine Biodiversity Assessment Across Heterogeneous Domains [0.769971486557519]
This work establishes the foundational detection layer for a multi-year invasive species monitoring initiative targeting Arctic and Atlantic marine ecosystems. We develop a Unified Information Pipeline that standardises heterogeneous datasets into a comparable information flow. We find that structural factors, such as scene composition, object density, and contextual redundancy, explain cross-domain performance loss.
arXiv Detail & Related papers (2026-01-20T13:51:55Z)
- SKANet: A Cognitive Dual-Stream Framework with Adaptive Modality Fusion for Robust Compound GNSS Interference Classification [47.20483076887704]
Global Navigation Satellite Systems (GNSS) face growing threats from sophisticated jamming interference. We propose a cognitive deep learning framework built upon a dual-stream architecture that integrates Time-Frequency Images (TFIs) and Power Spectral Density (PSD). We show that SKANet achieves an overall accuracy of 96.99%, exhibiting superior robustness for compound jamming classification.
arXiv Detail & Related papers (2026-01-19T07:42:45Z)
- Diffusion-Guided Mask-Consistent Paired Mixing for Endoscopic Image Segmentation [57.37991748282666]
We propose a paired, diffusion-guided paradigm that fuses the strengths of sample mixing and diffusion synthesis. For each real image, a synthetic counterpart is generated under the same mask and the pair is used as a controllable input for Mask-Consistent Paired Mixing (MCPMix). This produces a continuous family of intermediate samples that smoothly bridges synthetic and real appearances under shared geometry.
arXiv Detail & Related papers (2025-11-05T06:14:19Z)
- WaveMAE: Wavelet decomposition Masked Auto-Encoder for Remote Sensing [5.65492058135409]
WaveMAE is a masked autoencoding framework tailored for multispectral satellite imagery. To ensure fairness in evaluation, all methods are pretrained on the same dataset (fMoW-S2). WaveMAE achieves consistent improvements over prior state-of-the-art approaches.
arXiv Detail & Related papers (2025-10-26T14:45:30Z)
- Ecologically Valid Benchmarking and Adaptive Attention: Scalable Marine Bioacoustic Monitoring [2.558238597112103]
GetNetUPAM is a nested cross-validation framework to model stability under realistic variability. Data are partitioned into distinct site-year segments, preserving recording and ensuring each validation fold reflects a unique environmental subset. ARPA-N achieves a 14.4% gain in average precision over DenseNet baselines and a log2-scale order-of-magnitude drop in variability across all metrics.
arXiv Detail & Related papers (2025-09-04T22:03:05Z)
- Combating Noisy Labels via Dynamic Connection Masking [31.78040205653134]
We propose a Dynamic Connection Masking (DCM) mechanism for both Multi-Layer Perceptron Networks (MLPs) and Kolmogorov-Arnold Networks (KANs). Our approach can be seamlessly integrated into various noise-robust training methods to build more robust deep networks.
arXiv Detail & Related papers (2025-08-13T10:51:46Z)
- Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management. Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions. We observe that frequency-domain feature modeling, particularly in the wavelet domain, amplifies fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z)
- SARD: Segmentation-Aware Anomaly Synthesis via Region-Constrained Diffusion with Discriminative Mask Guidance [4.65786322515141]
We propose SARD (Segmentation-Aware anomaly synthesis via Region-constrained Diffusion with discriminative mask Guidance), a novel diffusion-based framework specifically designed for anomaly generation. SARD surpasses existing methods in segmentation accuracy and visual quality, setting a new state-of-the-art for pixel-level anomaly synthesis.
arXiv Detail & Related papers (2025-08-05T06:43:01Z)
- Generate Aligned Anomaly: Region-Guided Few-Shot Anomaly Image-Mask Pair Synthesis for Industrial Inspection [53.137651284042434]
Anomaly inspection plays a vital role in industrial manufacturing, but the scarcity of anomaly samples limits the effectiveness of existing methods. We propose Generate Aligned Anomaly (GAA), a region-guided, few-shot anomaly image-mask pair generation framework. GAA generates realistic, diverse, and semantically aligned anomalies using only a small number of samples.
arXiv Detail & Related papers (2025-07-13T12:56:59Z)
- Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing [92.61216319417208]
We propose a novel frequency domain-based diffusion model for fully exploiting the beneficial knowledge in unpaired clear data. Inspired by the strong generative ability shown by Diffusion Models (DMs), we tackle the dehazing task from the perspective of frequency domain reconstruction.
arXiv Detail & Related papers (2025-07-02T01:22:46Z)
- TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation [65.74990259650984]
We introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery. Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism. TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench.
arXiv Detail & Related papers (2025-06-06T17:59:50Z)
- VRS-UIE: Value-Driven Reordering Scanning for Underwater Image Enhancement [104.78586859995333]
State Space Models (SSMs) have emerged as a promising backbone for vision tasks due to their linear complexity and global receptive field. The predominance of large-portion, homogeneous but useless oceanic backgrounds can dilute the feature representation responses of sparse yet valuable targets. We propose a novel Value-Driven Reordering Scanning framework for Underwater Image Enhancement (UIE). Our framework sets a new state-of-the-art, delivering superior enhancement performance (surpassing WMamba by 0.89 dB on average) by effectively suppressing water bias and preserving structural and color fidelity.
arXiv Detail & Related papers (2025-05-02T12:21:44Z)
- Calibrating Undisciplined Over-Smoothing in Transformer for Weakly Supervised Semantic Segmentation [51.14107156747967]
Weakly supervised semantic segmentation (WSSS) has attracted considerable attention because it requires fewer annotations than fully supervised approaches. We propose an Adaptive Re-Activation Mechanism (AReAM) to control deep-level attention to undisciplined over-smoothing. AReAM substantially improves segmentation performance compared with existing WSSS methods, reducing noise while sharpening focus on relevant semantic regions.
arXiv Detail & Related papers (2023-05-04T19:11:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.