Spectrum Correction: Acoustic Scene Classification with Mismatched
Recording Devices
- URL: http://arxiv.org/abs/2105.11856v1
- Date: Tue, 25 May 2021 11:53:17 GMT
- Title: Spectrum Correction: Acoustic Scene Classification with Mismatched
Recording Devices
- Authors: Michał Kośmider
- Abstract summary: Machine learning algorithms, when trained on audio recordings from a limited set of devices, may not generalize well to samples recorded using other devices with different frequency responses.
In this work, a relatively straightforward method is introduced to address this problem.
Two variants of the approach are presented: the first requires aligned examples from multiple devices, while the second removes this requirement.
- Score: 9.404066316241051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning algorithms, when trained on audio recordings from a limited
set of devices, may not generalize well to samples recorded using other devices
with different frequency responses. In this work, a relatively straightforward
method is introduced to address this problem. Two variants of the approach are
presented: the first requires aligned examples from multiple devices, while the
second removes this requirement. This method works for both time- and
frequency-domain representations of audio recordings. Further, a relation to
standardization and Cepstral Mean Subtraction is analysed. The proposed
approach is effective even when only very few examples are provided. The
method was developed during the Detection and Classification of Acoustic Scenes
and Events (DCASE) 2019 challenge, where it won first place in the scenario with
mismatched recording devices, reaching an accuracy of 75%. Source code for the
experiments can be found online.
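As a concrete illustration, below is a minimal sketch of the aligned-examples variant in the frequency domain. It is reconstructed from the abstract alone, assuming magnitude spectra and a designated reference device; the function names are illustrative and are not taken from the paper's released code. A closing comment notes the link to Cepstral Mean Subtraction mentioned in the abstract.

```python
# Hedged sketch of spectrum correction (aligned-examples variant), NumPy only.
# Assumptions: magnitude spectra, one designated reference device.
import numpy as np


def estimate_correction(ref_spectra, dev_spectra, eps=1e-8):
    """Estimate per-bin correction coefficients for one mismatched device.

    ref_spectra, dev_spectra: arrays of shape (n_examples, n_bins) holding
    magnitude spectra of time-aligned recordings of the same scenes, captured
    by the reference device and by the mismatched device, respectively.
    """
    # Average magnitude per frequency bin over all aligned examples; the
    # ratio approximates the relative frequency response of the two devices.
    ref_mean = ref_spectra.mean(axis=0)
    dev_mean = dev_spectra.mean(axis=0)
    return ref_mean / (dev_mean + eps)


def apply_correction(spectra, coeffs):
    """Rescale each bin so the device's response matches the reference."""
    return spectra * coeffs


# Relation to Cepstral Mean Subtraction (CMS): in the log-spectral domain the
# multiplicative correction becomes an additive per-bin constant,
# log(S * c) = log(S) + log(c), so correcting log spectra amounts to removing
# a fixed mean-like offset, much as CMS subtracts the time-averaged cepstrum.
# For the unaligned variant mentioned in the abstract, the same ratio could
# plausibly be formed from per-device average spectra of non-parallel
# recordings; that detail is our assumption, not taken from the paper.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.uniform(0.5, 1.5, size=(16, 257))   # toy aligned spectra
    dev = ref * np.linspace(0.3, 1.2, 257)        # simulated device response
    coeffs = estimate_correction(ref, dev)
    corrected = apply_correction(dev, coeffs)
    print(np.allclose(corrected, ref, atol=1e-3))  # True: response removed
```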
Related papers
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to their ground-truth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- Towards Unconstrained Audio Splicing Detection and Localization with Neural Networks [6.570712059945705]
Convincing forgeries can be created by combining various speech samples from the same person.
Most existing detection algorithms for audio splicing use handcrafted features and make specific assumptions.
We propose a Transformer sequence-to-sequence (seq2seq) network for splicing detection and localization.
arXiv Detail & Related papers (2022-07-29T13:57:16Z)
- Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network [58.82343017711883]
This paper investigates how to learn directly from unpaired phone sequences and speech utterances.
GAN training is adopted in the first stage to find the mapping relationship between unpaired speech and phone sequence.
In the second stage, another HMM is introduced and trained on the generator's output, which boosts performance.
arXiv Detail & Related papers (2022-07-29T09:29:28Z)
- Learning to Adapt to Domain Shifts with Few-shot Samples in Anomalous Sound Detection [7.631596468553607]
Anomaly detection has many important applications, such as monitoring industrial equipment.
We propose a framework that adapts to new conditions with few-shot samples.
We evaluate our proposed method on a recently-released dataset of audio measurements from different machine types.
arXiv Detail & Related papers (2022-04-05T00:22:25Z)
- Task 1A DCASE 2021: Acoustic Scene Classification with mismatch-devices using squeeze-excitation technique and low-complexity constraint [4.4973334555746]
Acoustic scene classification (ASC) is one of the most popular problems in the field of machine listening.
The subtask presented in this report corresponds to an ASC problem that is constrained by the complexity of the model.
Specifically, a two-step system is proposed: a two-dimensional representation of the audio using the Gammatone filter bank, followed by a convolutional neural network.
arXiv Detail & Related papers (2021-07-30T14:24:45Z)
- Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis [59.623780036359655]
Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators.
This technique has numerous applications, such as restoring oral communication to people who can no longer speak due to illness or injury.
We propose a solution to this problem based on the theory of multi-view learning.
arXiv Detail & Related papers (2020-12-30T15:09:02Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that aligns the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to those of the source-domain training dataset (a minimal sketch of this band-wise alignment appears after this list).
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
- Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
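The band-wise statistics matching entry above describes its mechanism precisely enough for a small worked example. The sketch below, referenced from that entry, is a reconstruction under stated assumptions (log-mel-like features, statistics pooled over examples and frames), not the authors' implementation; all names are illustrative.

```python
# Hedged sketch of band-wise first- and second-order statistics matching:
# each frequency band of the target-domain features is standardized with
# target statistics, then re-scaled with source-domain statistics.
import numpy as np


def match_band_statistics(target_feats, source_mean, source_std, eps=1e-8):
    """Align per-band mean and standard deviation to the source domain.

    target_feats: array of shape (n_examples, n_bands, n_frames), e.g.
    log-mel spectrograms recorded by an unseen device (target domain).
    source_mean, source_std: per-band statistics of shape (n_bands,),
    computed once from the source-domain training set.
    """
    # Per-band statistics of the target domain, pooled over examples/frames.
    t_mean = target_feats.mean(axis=(0, 2))
    t_std = target_feats.std(axis=(0, 2))
    # Standardize with target stats, then restore source scale and offset.
    z = (target_feats - t_mean[None, :, None]) / (t_std[None, :, None] + eps)
    return z * source_std[None, :, None] + source_mean[None, :, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.normal(0.0, 1.0, size=(32, 40, 100))
    # Target domain: same content, but shifted and re-scaled per band,
    # mimicking a device with a different frequency response.
    target = source * 2.0 + 1.5
    adapted = match_band_statistics(
        target, source.mean(axis=(0, 2)), source.std(axis=(0, 2))
    )
    print(np.allclose(adapted, source, atol=1e-6))  # True: bands re-aligned
```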
This list is automatically generated from the titles and abstracts of the papers on this site.