Parsing Birdsong with Deep Audio Embeddings
- URL: http://arxiv.org/abs/2108.09203v1
- Date: Fri, 20 Aug 2021 14:45:44 GMT
- Title: Parsing Birdsong with Deep Audio Embeddings
- Authors: Irina Tolkova, Brian Chu, Marcel Hedman, Stefan Kahl, Holger Klinck
- Abstract summary: We present a semi-supervised approach to identify characteristic calls and environmental noise.
We utilize several methods to learn a latent representation of audio samples, including a convolutional autoencoder and two pre-trained networks.
- Score: 0.5599792629509227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monitoring of bird populations has played a vital role in conservation
efforts and in understanding biodiversity loss. The automation of this process
has been facilitated by both sensing technologies, such as passive acoustic
monitoring, and accompanying analytical tools, such as deep learning. However,
machine learning models frequently have difficulty generalizing to examples not
encountered in the training data. In our work, we present a semi-supervised
approach to identify characteristic calls and environmental noise. We utilize
several methods to learn a latent representation of audio samples, including a
convolutional autoencoder and two pre-trained networks, and group the resulting
embeddings for a domain expert to identify cluster labels. We show that our
approach can improve classification precision and provide insight into the
latent structure of environmental acoustic datasets.
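The workflow described in the abstract (embed audio clips, group the embeddings, have an expert name each group) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which clustering algorithm is used, so plain k-means stands in for the grouping step, and the toy 2-D vectors stand in for embeddings that would in practice come from an autoencoder or a pre-trained network. The function names `kmeans` and `propagate_labels` and the cluster names are hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Naive k-means on lists of floats; returns (assignment, centroids)."""
    # Simplistic initialisation (an assumption): use the first k points.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assign each embedding to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


def propagate_labels(assignment, cluster_names):
    """Give every sample the name an expert assigned to its cluster."""
    return [cluster_names.get(a, "unlabeled") for a in assignment]


# Toy 2-D "embeddings": two well-separated groups standing in for real
# audio embeddings (hypothetical data, not from the paper).
embeddings = [
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.2],
    [10.0, 10.0], [10.2, 9.9], [9.9, 10.1],
]
assignment, centroids = kmeans(embeddings, k=2)

# Hypothetical expert decisions after listening to a few clips per cluster.
cluster_names = {assignment[0]: "background noise", assignment[3]: "bird call"}
labels = propagate_labels(assignment, cluster_names)
```

The point of the design is that the expert labels a handful of clusters rather than every clip; each clip then inherits the label of its cluster.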
Related papers
- Heterogeneous sound classification with the Broad Sound Taxonomy and Dataset [6.91815289914328]
This paper explores methodologies for automatically classifying heterogeneous sounds characterized by high intra-class variability.
We construct a dataset through manual annotation to ensure accuracy, diverse representation within each class and relevance in real-world scenarios.
Experimental results illustrate that audio embeddings encoding acoustic and semantic information achieve higher accuracy in the classification task.
arXiv Detail & Related papers (2024-10-01T18:09:02Z)
- Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics [2.6740633963478095]
We explore the effectiveness of transfer learning in large-scale bird sound classification.
Our experiments demonstrate that both fine-tuning and knowledge distillation yield strong performance.
We advocate for more comprehensive labeling practices within the animal sound community.
arXiv Detail & Related papers (2024-09-21T11:33:12Z)
- GenCo: An Auxiliary Generator from Contrastive Learning for Enhanced Few-Shot Learning in Remote Sensing [9.504503675097137]
We introduce a generator-based contrastive learning framework (GenCo) that pre-trains backbones and simultaneously explores variants of feature samples.
In fine-tuning, the auxiliary generator can be used to enrich limited labeled data samples in feature space.
We demonstrate the effectiveness of our method in improving few-shot learning performance on two key remote sensing datasets.
arXiv Detail & Related papers (2023-07-27T03:59:19Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR10/CIFAR100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Metric-based multimodal meta-learning for human movement identification via footstep recognition [3.300376360949452]
We describe a novel metric-based learning approach that introduces a multimodal framework.
We learn general-purpose representations from low multisensory data obtained from omnipresent sensing systems.
We employ a metric-based contrastive learning approach for multi-sensor data to mitigate the impact of data scarcity.
arXiv Detail & Related papers (2021-11-15T18:46:14Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
By exploiting the correspondence between geo-tagged audio recordings and remote sensing, this is done in a completely label-free manner.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- Recognizing bird species in diverse soundscapes under weak supervision [0.2148535041822524]
We present a robust classification approach for avian vocalization in complex and diverse soundscapes, achieving second place in the BirdCLEF 2021 challenge.
We illustrate how to make full use of pre-trained convolutional neural networks, by using an efficient modeling and training routine supplemented by novel augmentation methods.
arXiv Detail & Related papers (2021-07-16T06:54:38Z)
- Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that aligns the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes with those of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
- Laplacian Denoising Autoencoder [114.21219514831343]
We propose to learn data representations with a novel type of denoising autoencoder.
The noisy input data is generated by corrupting latent clean data in the gradient domain.
Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach.
arXiv Detail & Related papers (2020-03-30T16:52:39Z)