A Sequential Self Teaching Approach for Improving Generalization in
Sound Event Recognition
- URL: http://arxiv.org/abs/2007.00144v1
- Date: Tue, 30 Jun 2020 22:53:43 GMT
- Title: A Sequential Self Teaching Approach for Improving Generalization in
Sound Event Recognition
- Authors: Anurag Kumar, Vamsi Krishna Ithapu
- Abstract summary: We propose a sequential self-teaching approach to learning sounds.
It is harder to learn sounds in adverse situations such as from weakly labeled and/or noisy labeled data.
Our proposal is a sequential stage-wise learning process that improves generalization capabilities of a given modeling system.
- Score: 11.559570255513217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An important problem in machine auditory perception is to recognize and
detect sound events. In this paper, we propose a sequential self-teaching
approach to learning sounds. Our main proposition is that it is harder to learn
sounds in adverse situations such as from weakly labeled and/or noisy labeled
data, and in these situations a single stage of learning is not sufficient. Our
proposal is a sequential stage-wise learning process that improves
generalization capabilities of a given modeling system. We justify this method
via technical results, and on AudioSet, the largest sound event dataset, our
sequential learning approach leads to up to a 9% improvement in performance. A
comprehensive evaluation also shows that the method leads to improved
transferability of knowledge from previously trained models, thereby leading to
improved generalization capabilities on transfer learning tasks.
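The stage-wise process described in the abstract can be illustrated with a minimal sketch: each stage trains a fresh model against a blend of the original (possibly noisy) labels and soft targets produced by the previous stage's model, which acts as the teacher for the next stage. The linear classifier, the mixing weight `alpha`, and all names below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_stage(X, targets, epochs=200, lr=0.5):
    """Train a linear softmax classifier against (possibly soft) targets."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(X.shape[1], targets.shape[1]))
    for _ in range(epochs):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - targets) / len(X)  # cross-entropy gradient
    return W

def sequential_self_teach(X, noisy_onehot, n_stages=3, alpha=0.5):
    """Sequential self-teaching: each stage learns from a blend of the
    original (noisy) labels and the soft predictions of the previous
    stage's model."""
    targets = noisy_onehot
    W = None
    for _ in range(n_stages):
        W = train_stage(X, targets)
        soft = softmax(X @ W)  # previous stage becomes the teacher
        targets = alpha * noisy_onehot + (1 - alpha) * soft
    return W
```

The soft targets smooth over label noise, so later stages fit a cleaner signal than the raw labels alone.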
Related papers
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Audio Contrastive based Fine-tuning [21.145936249583446]
We introduce Audio Contrastive-based Fine-tuning (AudioConFit) as an efficient approach characterised by robust generalisability.
Empirical experiments on a variety of audio classification tasks demonstrate the effectiveness and robustness of our approach.
arXiv Detail & Related papers (2023-09-21T08:59:13Z)
- Pretraining Representations for Bioacoustic Few-shot Detection using Supervised Contrastive Learning [10.395255631261458]
In bioacoustic applications, most tasks come with few labelled training data, because annotating long recordings is time consuming and costly.
We show that learning a rich feature extractor from scratch can be achieved by leveraging data augmentation using a supervised contrastive learning framework.
We obtain an F-score of 63.46% on the validation set and 42.7% on the test set, ranking second in the DCASE challenge.
arXiv Detail & Related papers (2023-09-02T09:38:55Z)
- Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio & Text Augmentations [7.817685358710508]
We propose a system to project recordings and textual descriptions into a shared audio-caption space.
Our results show that the used augmentations strategies reduce overfitting and improve retrieval performance.
We further show that pre-training the system on the AudioCaps dataset leads to additional improvements.
arXiv Detail & Related papers (2022-08-24T11:54:42Z)
- Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning [0.0]
We explore self-supervised learning for speaker verification by learning representations directly from raw audio.
Our approach is based on recent information learning frameworks and an intensive data pre-processing step.
arXiv Detail & Related papers (2022-07-12T13:01:55Z)
- Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z)
- UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training [72.004873454347]
Two methods are introduced for enhancing the unsupervised speaker information extraction.
Experiment results on SUPERB benchmark show that the proposed system achieves state-of-the-art performance.
We scale up the training dataset to 94 thousand hours of public audio data and achieve further performance improvement.
arXiv Detail & Related papers (2021-10-12T05:43:30Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
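Spectrogram augmentation of this kind is commonly done SpecAugment-style, by masking random frequency bands and time spans; the sketch below is an illustrative example of that family, not necessarily the exact policy used in the paper. Function and parameter names are assumptions.

```python
import numpy as np

def mask_spectrogram(spec, max_freq_width=8, max_time_width=20, n_masks=2, rng=None):
    """Zero out random frequency bands and time spans of a (freq, time)
    spectrogram (SpecAugment-style). Each mask has width >= 1."""
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_freq, n_time = out.shape
    for _ in range(n_masks):
        f = rng.integers(1, max_freq_width + 1)   # frequency-mask height
        f0 = rng.integers(0, n_freq - f + 1)
        out[f0:f0 + f, :] = 0.0
        t = rng.integers(1, max_time_width + 1)   # time-mask width
        t0 = rng.integers(0, n_time - t + 1)
        out[:, t0:t0 + t] = 0.0
    return out
```

Masking forces the classifier to rely on the unmasked regions, which reduces overfitting on small emotion datasets.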
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures [23.568610919253352]
This paper proposes a semi-supervised method for generating pseudo-labels from unsupervised data using a student-teacher scheme that balances self-training and cross-training.
The results of these methods on both the "validation" and "public evaluation" sets of the DESED database show significant improvement over state-of-the-art semi-supervised learning systems.
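Two common building blocks of such student-teacher schemes are an exponential-moving-average teacher and confidence-thresholded pseudo-labels; the sketch below shows those generic pieces only, not this paper's specific cross-training architecture. All names and the threshold are illustrative assumptions.

```python
import numpy as np

def ema_update(teacher_w, student_w, decay=0.99):
    """Teacher weights track the student via an exponential moving average."""
    return decay * teacher_w + (1 - decay) * student_w

def pseudo_labels(teacher_probs, threshold=0.9):
    """Keep only confident teacher predictions on unlabeled data as
    training targets for the student; uncertain examples are skipped."""
    conf = teacher_probs.max(axis=1)
    labels = teacher_probs.argmax(axis=1)
    mask = conf >= threshold
    return labels[mask], mask
```

The student trains on labeled data plus the retained pseudo-labels, while the slowly moving teacher keeps the targets stable.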
arXiv Detail & Related papers (2021-05-27T18:46:59Z)
- Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide open-source RecAdam, which integrates the proposed mechanisms into Adam to facilitate adoption by the NLP community.
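The recall-and-learn idea can be approximated by a quadratic penalty that pulls fine-tuned parameters back toward their pretrained values, with a sigmoid schedule that gradually shifts weight from recalling to learning. This is a simplification of the mechanism, not RecAdam's exact optimizer update; all names and schedule constants below are assumptions.

```python
import numpy as np

def recall_and_learn_loss(task_loss, theta, theta_pretrained, step, k=0.1, t0=100):
    """Blend the downstream objective with a 'recall' term that pulls the
    parameters toward their pretrained values. The sigmoid schedule moves
    weight from the recall term to the task loss as training proceeds."""
    lam = 1.0 / (1.0 + np.exp(-k * (step - t0)))  # -> 1 late in training
    recall = 0.5 * np.sum((theta - theta_pretrained) ** 2)
    return lam * task_loss + (1 - lam) * recall
```

Early in training the recall term dominates, so the model stays close to the pretrained weights and forgets less of the pretraining knowledge.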
arXiv Detail & Related papers (2020-04-27T08:59:57Z)
- Learning Not to Learn in the Presence of Noisy Labels [104.7655376309784]
We show that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption.
We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels.
arXiv Detail & Related papers (2020-02-16T09:12:27Z)
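One known formulation of the gambler's loss (from the "deep gamblers" line of work) gives the model an extra abstention output: probability routed there is discounted by a payoff factor, so abstaining is cheaper than committing to a wrong class on a noisy-label example. The sketch below follows that formulation; the payoff value and names are illustrative assumptions.

```python
import numpy as np

def gamblers_loss(probs, labels, payoff=2.5):
    """Gambler's loss over (n, m+1) probabilities: m class probabilities
    plus an abstention probability in the last column. Loss per example is
    -log(p_true + p_abstain / payoff); routing mass to abstention lets the
    model 'not learn' from data points it suspects are mislabeled."""
    p_true = probs[np.arange(len(labels)), labels]
    p_abstain = probs[:, -1]
    return -np.log(p_true + p_abstain / payoff).mean()
```

A smaller payoff makes abstention cheaper, so the model abstains more readily under heavy label corruption.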
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.