Decodable but not structured: linear probing enables Underwater Acoustic Target Recognition with pretrained audio embeddings
- URL: http://arxiv.org/abs/2601.08358v1
- Date: Tue, 13 Jan 2026 09:15:31 GMT
- Title: Decodable but not structured: linear probing enables Underwater Acoustic Target Recognition with pretrained audio embeddings
- Authors: Hilde I. Hummel, Sandjai Bhulai, Rob D. van der Mei, Burooj Ghani,
- Abstract summary: Anthropogenic noise from ships contributes significantly to underwater sound pollution, posing risks to marine ecosystems. Passive Acoustic Monitoring (PAM) systems are widely deployed for monitoring, generating years of underwater recordings across diverse soundscapes. Recent advances in automatic Underwater Acoustic Target Recognition (UATR) have largely relied on supervised learning, which is constrained by the scarcity of labeled data. In this work, we conduct the first empirical comparative study of transfer learning for UATR, evaluating multiple pretrained audio models originating from diverse audio domains.
- Score: 1.25052154879199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Increasing levels of anthropogenic noise from ships contribute significantly to underwater sound pollution, posing risks to marine ecosystems. This makes monitoring crucial to understand and quantify the impact of ship-radiated noise. Passive Acoustic Monitoring (PAM) systems are widely deployed for this purpose, generating years of underwater recordings across diverse soundscapes. Manual analysis of such large-scale data is impractical, motivating the need for automated approaches based on machine learning. Recent advances in automatic Underwater Acoustic Target Recognition (UATR) have largely relied on supervised learning, which is constrained by the scarcity of labeled data. Transfer Learning (TL) offers a promising alternative to mitigate this limitation. In this work, we conduct the first empirical comparative study of transfer learning for UATR, evaluating multiple pretrained audio models originating from diverse audio domains. The pretrained model weights are frozen, and the resulting embeddings are analyzed through classification, clustering, and similarity-based evaluations. The analysis shows that the geometrical structure of the embedding space is largely dominated by recording-specific characteristics. However, a simple linear probe can effectively suppress this recording-specific information and isolate ship-type features from these embeddings. As a result, linear probing enables effective automatic UATR using pretrained audio models at low computational cost, significantly reducing the need for large amounts of high-quality labeled ship recordings.
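The linear-probing idea in the abstract can be illustrated with a minimal sketch. It assumes synthetic two-dimensional vectors in place of real pretrained embeddings: one dimension carries a large recording-specific offset, the other a smaller ship-type signal. The function names (`make_toy_embeddings`, `train_linear_probe`, `centroid_gap`), the toy data, and the hyperparameters are all hypothetical and are not taken from the paper; the probe here is plain logistic regression trained on fixed features.

```python
import math
import random


def make_toy_embeddings(n_per_group=50, seed=0):
    """Synthetic stand-in for frozen embeddings (NOT real model output).

    Dimension 0 carries a large recording-specific offset and dimension 1 a
    smaller ship-type signal, so the recording dominates the geometry.
    """
    rng = random.Random(seed)
    X, ship, rec = [], [], []
    for recording in (0, 1):
        for ship_type in (0, 1):
            for _ in range(n_per_group):
                X.append([
                    5.0 * (2 * recording - 1) + rng.gauss(0, 1.0),
                    1.0 * (2 * ship_type - 1) + rng.gauss(0, 0.3),
                ])
                ship.append(ship_type)
                rec.append(recording)
    return X, ship, rec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(X, y, lr=0.1, epochs=200):
    """Fit logistic regression on fixed features by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def accuracy(w, b, X, y):
    """Fraction of samples whose thresholded probe output matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hits += int((z > 0) == (yi == 1))
    return hits / len(X)


def centroid_gap(X, labels):
    """Euclidean distance between the centroids of label groups 0 and 1."""
    cents = []
    for group in (0, 1):
        rows = [xi for xi, li in zip(X, labels) if li == group]
        cents.append([sum(col) / len(rows) for col in zip(*rows)])
    return math.dist(cents[0], cents[1])


if __name__ == "__main__":
    X, ship, rec = make_toy_embeddings()
    w, b = train_linear_probe(X, ship)
    print("probe accuracy on toy data:", accuracy(w, b, X, ship))
    print("recording centroid gap:", centroid_gap(X, rec))
    print("ship-type centroid gap:", centroid_gap(X, ship))
```

In this toy setting the recording centroids lie much farther apart than the ship-type centroids, yet the probe can still separate ship types by weighting the small-variance dimension. The accuracy is measured on the training data itself, so it only shows that the label is linearly decodable, not how well a probe would generalize to real recordings.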
Related papers
- Learning Robust Spatial Representations from Binaural Audio through Feature Distillation [64.36563387033921]
We investigate the use of a pretraining stage based on feature distillation to learn a robust spatial representation of speech without the need for data labels. Our experiments demonstrate that the pretrained models show improved performance in noisy and reverberant environments.
arXiv Detail & Related papers (2025-08-28T15:43:15Z) - Automated data curation for self-supervised learning in underwater acoustic analysis [0.6990493129893112]
The sustainability of the ocean ecosystem is threatened by increased levels of sound pollution. Passive acoustic monitoring (PAM) systems collect a large amount of underwater sound recordings. Although machine learning offers a potential solution, most underwater acoustic recordings are unlabeled.
arXiv Detail & Related papers (2025-05-26T14:50:04Z) - AquaSignal: An Integrated Framework for Robust Underwater Acoustic Analysis [0.0]
AquaSignal is a modular and scalable pipeline for preprocessing, denoising, classification, and novelty detection of underwater acoustic signals. The system is evaluated on a combined dataset from the Deepship and Ocean Networks Canada (ONC) benchmarks.
arXiv Detail & Related papers (2025-05-20T12:35:43Z) - The Computation of Generalized Embeddings for Underwater Acoustic Target Recognition using Contrastive Learning [0.7145837421668514]
Sound pollution in marine environments poses an increased threat to ocean health. By monitoring this noise, the sources responsible for this pollution can be mapped. This generates a large amount of data records, capturing a mix of sound sources such as ship activities and marine mammal vocalizations.
arXiv Detail & Related papers (2025-05-19T09:37:46Z) - Neural Edge Histogram Descriptors for Underwater Acoustic Target Recognition [42.23422932643755]
This work adapts the neural edge histogram descriptors (NEHD) method, originally developed for image classification, to classify passive sonar signals. We conduct a comprehensive evaluation of statistical and structural texture features, demonstrating that their combination achieves competitive performance with large pre-trained models. The proposed NEHD-based approach offers a lightweight and efficient solution for underwater target recognition, significantly reducing computational costs while maintaining accuracy.
arXiv Detail & Related papers (2025-03-17T22:57:05Z) - Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Cross-domain Sound Recognition for Efficient Underwater Data Analysis [4.373836150479923]
This paper presents a novel deep learning approach for analyzing massive underwater acoustic data by leveraging a model trained on a broad spectrum of non-underwater (aerial) sounds.
We use PCA and UMAP visualization to cluster the data in a two-dimensional space and listen to points within these clusters to understand their defining characteristics.
In the second part, we train a neural network model using both the selected underwater data and the non-underwater dataset.
arXiv Detail & Related papers (2023-09-07T02:26:32Z) - Histogram Layer Time Delay Neural Networks for Passive Sonar Classification [58.720142291102135]
A novel method combines a time delay neural network and histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification.
The proposed method outperforms the baseline model, demonstrating the utility in incorporating statistical contexts for passive sonar target recognition.
arXiv Detail & Related papers (2023-07-25T19:47:26Z) - Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z) - Capturing scattered discriminative information using a deep architecture in acoustic scene classification [49.86640645460706]
In this study, we investigate various methods to capture discriminative information and simultaneously mitigate the overfitting problem.
We adopt a max feature map method to replace conventional non-linear activations in a deep neural network.
Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power.
arXiv Detail & Related papers (2020-07-09T08:32:06Z) - Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.