WhaleNet: a Novel Deep Learning Architecture for Marine Mammals Vocalizations on Watkins Marine Mammal Sound Database
- URL: http://arxiv.org/abs/2402.17775v2
- Date: Wed, 26 Jun 2024 14:34:13 GMT
- Title: WhaleNet: a Novel Deep Learning Architecture for Marine Mammals Vocalizations on Watkins Marine Mammal Sound Database
- Authors: Alessandro Licciardi, Davide Carbone
- Abstract summary: We introduce WhaleNet (Wavelet Highly Adaptive Learning Ensemble Network), a sophisticated deep ensemble architecture for the classification of marine mammal vocalizations.
We achieve an improvement in classification accuracy of 8-10% over existing architectures, corresponding to a classification accuracy of 97.61%.
- Score: 49.1574468325115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Marine mammal communication is a complex field, hindered by the diversity of vocalizations and environmental factors. The Watkins Marine Mammal Sound Database (WMMD) constitutes a comprehensive labeled dataset employed in machine learning applications. Nevertheless, the methodologies for data preparation, preprocessing, and classification documented in the literature exhibit considerable variability and are typically not applied to the dataset in its entirety. This study initially undertakes a concise review of the state-of-the-art benchmarks pertaining to the dataset, with a particular focus on clarifying data preparation and preprocessing techniques. Subsequently, we explore the utilization of the Wavelet Scattering Transform (WST) and Mel spectrogram as preprocessing mechanisms for feature extraction. In this paper, we introduce WhaleNet (Wavelet Highly Adaptive Learning Ensemble Network), a sophisticated deep ensemble architecture for the classification of marine mammal vocalizations, leveraging both WST and Mel spectrogram for enhanced feature discrimination. By integrating the insights derived from WST and Mel representations, we achieved an improvement in classification accuracy of 8-10% over existing architectures, corresponding to a classification accuracy of 97.61%.
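The abstract states that WhaleNet combines a WST-based and a Mel-spectrogram-based representation in an ensemble, but it does not say how the two are merged. As a rough illustration only, the sketch below shows late fusion, one common way to combine two classifier branches: each branch's class scores are turned into probabilities and then averaged. The function names, the weighting parameter, and the example logits are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(wst_logits, mel_logits, wst_weight=0.5):
    """Weighted average of the class probabilities from two branches.

    wst_weight is a hypothetical mixing parameter: 1.0 trusts only the
    WST branch, 0.0 trusts only the Mel branch.
    """
    p_wst = softmax(wst_logits)
    p_mel = softmax(mel_logits)
    return [wst_weight * a + (1.0 - wst_weight) * b
            for a, b in zip(p_wst, p_mel)]


def predict(wst_logits, mel_logits, wst_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    probs = fuse(wst_logits, mel_logits, wst_weight)
    return max(range(len(probs)), key=probs.__getitem__)


# Made-up scores for three classes, standing in for the outputs of a
# WST-based classifier and a Mel-based classifier on the same recording.
wst_logits = [2.0, 1.0, 0.1]
mel_logits = [0.5, 2.5, 0.3]

fused = fuse(wst_logits, mel_logits)
label = predict(wst_logits, mel_logits)
```

With equal weights the two branches disagree here and the more confident Mel branch decides the label; shifting `wst_weight` toward 1.0 hands the decision to the WST branch instead.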
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Advanced Framework for Animal Sound Classification With Features Optimization [35.2832738406242]
We propose an automated classification framework applicable to general animal sound classification.
Our approach consistently outperforms baseline methods by over 25% in precision, recall, and accuracy.
arXiv Detail & Related papers (2024-07-03T18:33:47Z)
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics [2.1019401515721583]
We present the animal2vec framework, a fully interpretable transformer model and self-supervised training scheme tailored for sparse and unbalanced bioacoustic data.
We openly publish MeerKAT: Meerkat Kalahari Audio Transcripts, a large-scale dataset containing audio collected via biologgers on free-ranging meerkats with a length of over 1068h.
We report new state-of-the-art results on both datasets and evaluate the few-shot capabilities of animal2vec when labeled training data is scarce.
arXiv Detail & Related papers (2024-06-03T12:11:01Z)
- All Thresholds Barred: Direct Estimation of Call Density in Bioacoustic Data [1.7916003204531015]
We propose a validation scheme for estimating call density in a body of data.
We use these distributions to predict site-level densities, which may be subject to distribution shifts.
arXiv Detail & Related papers (2024-02-23T14:52:44Z)
- Improving Primate Sounds Classification using Binary Presorting for Deep Learning [6.044912425856236]
In this work, we introduce a generalized approach that first relabels subsegments of MEL spectrogram representations.
For both the binary pre-sorting and the classification, we make use of convolutional neural networks (CNN) and various data-augmentation techniques.
We showcase the results of this approach on the challenging ComparE 2021 dataset, with the task of classifying between different primate species sounds.
arXiv Detail & Related papers (2023-06-28T09:35:09Z)
- Image Labels Are All You Need for Coarse Seagrass Segmentation [3.253176232272777]
Seagrass meadows serve as critical carbon sinks, but estimating the amount of carbon they store requires knowledge of the seagrass species present.
Previous approaches for seagrass detection and classification have required supervision from patch-level labels.
We introduce SeaFeats, an architecture that uses unsupervised contrastive pre-training and feature similarity, and SeaCLIP, a model that showcases the effectiveness of large language models as a supervisory signal.
arXiv Detail & Related papers (2023-03-02T05:10:57Z)
- Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
- Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier [68.38233199030908]
Long-tail recognition tackles the naturally non-uniform distribution of data in real-world scenarios.
While modern classifiers perform well on populated classes, their performance degrades significantly on tail classes.
Deep-RTC is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions.
arXiv Detail & Related papers (2020-07-20T05:57:42Z)
- Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)