Parsing Birdsong with Deep Audio Embeddings
- URL: http://arxiv.org/abs/2108.09203v1
- Date: Fri, 20 Aug 2021 14:45:44 GMT
- Title: Parsing Birdsong with Deep Audio Embeddings
- Authors: Irina Tolkova, Brian Chu, Marcel Hedman, Stefan Kahl, Holger Klinck
- Abstract summary: We present a semi-supervised approach to identify characteristic calls and environmental noise.
We utilize several methods to learn a latent representation of audio samples, including a convolutional autoencoder and two pre-trained networks.
- Score: 0.5599792629509227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monitoring of bird populations has played a vital role in conservation
efforts and in understanding biodiversity loss. The automation of this process
has been facilitated by both sensing technologies, such as passive acoustic
monitoring, and accompanying analytical tools, such as deep learning. However,
machine learning models frequently have difficulty generalizing to examples not
encountered in the training data. In our work, we present a semi-supervised
approach to identify characteristic calls and environmental noise. We utilize
several methods to learn a latent representation of audio samples, including a
convolutional autoencoder and two pre-trained networks, and group the resulting
embeddings for a domain expert to identify cluster labels. We show that our
approach can improve classification precision and provide insight into the
latent structure of environmental acoustic datasets.
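The grouping step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name a clustering algorithm, so plain k-means with a deterministic farthest-point initialisation is swapped in here, the embeddings are synthetic stand-ins for autoencoder or pre-trained network outputs, and the function names and the labels "environmental noise" and "characteristic call" are hypothetical.

```python
import numpy as np


def assign(embeddings, centroids):
    """Index of the nearest centroid (squared Euclidean) for every sample."""
    dists = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def farthest_point_init(embeddings, k):
    """Deterministic seeding: start at sample 0, then repeatedly take the
    sample farthest from all centroids chosen so far."""
    chosen = [0]
    min_d = ((embeddings - embeddings[0]) ** 2).sum(axis=1)
    while len(chosen) < k:
        nxt = int(min_d.argmax())
        chosen.append(nxt)
        d_new = ((embeddings - embeddings[nxt]) ** 2).sum(axis=1)
        min_d = np.minimum(min_d, d_new)
    return embeddings[chosen].copy()


def kmeans(embeddings, k, n_iter=20):
    """Lloyd's k-means; returns (centroids, cluster index per sample)."""
    centroids = farthest_point_init(embeddings, k)
    for _ in range(n_iter):
        assignments = assign(embeddings, centroids)
        for j in range(k):
            members = embeddings[assignments == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(embeddings, centroids)


def representatives(embeddings, centroids, assignments, per_cluster=3):
    """Samples closest to each centroid, to be shown to a domain expert."""
    reps = {}
    for j, c in enumerate(centroids):
        idx = np.flatnonzero(assignments == j)
        d = ((embeddings[idx] - c) ** 2).sum(axis=1)
        reps[j] = idx[np.argsort(d)[:per_cluster]].tolist()
    return reps


def propagate(assignments, cluster_labels):
    """Give every sample the label the expert chose for its cluster."""
    return [cluster_labels.get(int(j), "unlabeled") for j in assignments]


# Synthetic stand-in for audio embeddings: two well-separated 8-D groups.
rng = np.random.default_rng(1)
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 8)),
    rng.normal(loc=5.0, scale=0.1, size=(20, 8)),
])

centroids, assignments = kmeans(embeddings, k=2)
reps = representatives(embeddings, centroids, assignments)

# Hypothetical expert decision after listening to the representatives.
expert_labels = {0: "environmental noise", 1: "characteristic call"}
labels = propagate(assignments, expert_labels)
print(reps)
print(labels[0], "|", labels[-1])
```

Only a few representatives per cluster need expert attention, and the chosen cluster label is then propagated to every member. In practice the number of clusters and the distance metric would have to be tuned to the embedding space.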
Related papers
- Self-Supervised Learning for Few-Shot Bird Sound Classification [10.395255631261458]
Self-supervised learning (SSL) in audio holds significant potential across various domains.
In this study, we demonstrate that SSL is capable of acquiring meaningful representations of bird sounds from audio recordings without the need for annotations.
arXiv Detail & Related papers (2023-12-25T22:33:45Z)
- GenCo: An Auxiliary Generator from Contrastive Learning for Enhanced Few-Shot Learning in Remote Sensing [9.504503675097137]
We introduce a generator-based contrastive learning framework (GenCo) that pre-trains backbones and simultaneously explores variants of feature samples.
In fine-tuning, the auxiliary generator can be used to enrich limited labeled data samples in feature space.
We demonstrate the effectiveness of our method in improving few-shot learning performance on two key remote sensing datasets.
arXiv Detail & Related papers (2023-07-27T03:59:19Z)
- Deep Active Learning in the Presence of Label Noise: A Survey [1.8945921149936182]
Deep active learning has emerged as a powerful tool for training deep learning models within a predefined labeling budget.
We discuss the current state of deep active learning in the presence of label noise, highlighting unique approaches, their strengths, and weaknesses.
We propose exploring contrastive learning methods to derive good image representations that can aid in selecting high-value samples for labeling.
arXiv Detail & Related papers (2023-02-22T00:27:39Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR10 and CIFAR100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Metric-based multimodal meta-learning for human movement identification via footstep recognition [3.300376360949452]
We describe a novel metric-based learning approach that introduces a multimodal framework.
We learn general-purpose representations from low multisensory data obtained from omnipresent sensing systems.
Our results employ a metric-based contrastive learning approach for multi-sensor data to mitigate the impact of data scarcity.
arXiv Detail & Related papers (2021-11-15T18:46:14Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [70.64030011999981]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
By exploiting the correspondence between geo-tagged audio recordings and remote sensing, this is done in a completely label-free manner.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- Recognizing bird species in diverse soundscapes under weak supervision [0.2148535041822524]
We present a robust classification approach for avian vocalization in complex and diverse soundscapes, achieving second place in the BirdCLEF 2021 challenge.
We illustrate how to make full use of pre-trained convolutional neural networks, by using an efficient modeling and training routine supplemented by novel augmentation methods.
arXiv Detail & Related papers (2021-07-16T06:54:38Z)
- Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
- Laplacian Denoising Autoencoder [114.21219514831343]
We propose to learn data representations with a novel type of denoising autoencoder.
The noisy input data is generated by corrupting latent clean data in the gradient domain.
Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach.
arXiv Detail & Related papers (2020-03-30T16:52:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.