Improving Post-Processing of Audio Event Detectors Using Reinforcement
Learning
- URL: http://arxiv.org/abs/2208.09201v1
- Date: Fri, 19 Aug 2022 08:00:26 GMT
- Title: Improving Post-Processing of Audio Event Detectors Using Reinforcement
Learning
- Authors: Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
- Abstract summary: We employ reinforcement learning to jointly discover the optimal parameters for various stages of a post-processing stack.
We find that we can improve the audio event-based macro F1-score by 4-5% compared to using the same post-processing stack with manually tuned parameters.
- Score: 5.758073912084364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We apply post-processing to the class probability distribution outputs of
audio event classification models and employ reinforcement learning to jointly
discover the optimal parameters for various stages of a post-processing stack,
such as the classification thresholds and the kernel sizes of median filtering
algorithms used to smooth out model predictions. To achieve this we define a
reinforcement learning environment where: 1) a state is the class probability
distribution provided by the model for a given audio sample, 2) an action is
the choice of a candidate optimal value for each parameter of the
post-processing stack, 3) the reward is based on the classification accuracy
metric we aim to optimize, which is the audio event-based macro F1-score in our
case. We apply our post-processing to the class probability distribution
outputs of two audio event classification models submitted to the DCASE Task4
2020 challenge. We find that by using reinforcement learning to discover the
optimal per-class parameters for the post-processing stack that is applied to
the outputs of audio event classification models, we can improve the audio
event-based macro F1-score (the main metric used in the DCASE challenge to
compare audio event classification accuracy) by 4-5% compared to using the same
post-processing stack with manually tuned parameters.
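The environment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`median_filter_1d`, `post_process`, `frame_macro_f1`, `reward`) are hypothetical, the stack is assumed to be a per-class threshold followed by a per-class median filter, and a frame-level macro F1 is used as a simplified stand-in for the event-based macro F1-score used in the paper. The state is the matrix of class probabilities, the action is one threshold and one kernel size per class, and the reward is the score of the post-processed output.

```python
import numpy as np

def median_filter_1d(x, kernel_size):
    """Median-filter a 1-D array; edges are padded by repeating the end values."""
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    padded = np.pad(x, half, mode="edge")
    # Row i holds the signal shifted by i frames, so each column is one window.
    shifted = np.stack([padded[i:i + len(x)] for i in range(kernel_size)])
    return np.median(shifted, axis=0)

def post_process(probs, thresholds, kernel_sizes):
    """Apply a per-class threshold, then a per-class median filter.

    probs has shape (frames, classes); the result is a 0/1 array of that shape.
    """
    probs = np.asarray(probs, dtype=float)
    decisions = np.zeros(probs.shape, dtype=int)
    for c in range(probs.shape[1]):
        binary = (probs[:, c] >= thresholds[c]).astype(int)
        decisions[:, c] = median_filter_1d(binary, kernel_sizes[c]).astype(int)
    return decisions

def frame_macro_f1(pred, truth):
    """Frame-level macro F1 (assumption: a simplified proxy for event-based F1)."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    scores = []
    for c in range(truth.shape[1]):
        tp = np.sum((pred[:, c] == 1) & (truth[:, c] == 1))
        fp = np.sum((pred[:, c] == 1) & (truth[:, c] == 0))
        fn = np.sum((pred[:, c] == 0) & (truth[:, c] == 1))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

def reward(probs, truth, action):
    """Score one action, i.e. one (thresholds, kernel_sizes) pair, on one state."""
    thresholds, kernel_sizes = action
    return frame_macro_f1(post_process(probs, thresholds, kernel_sizes), truth)

# Toy single-class example: an isolated spike at frame 2 and a real event at 5-7.
probs = np.array([[0.1], [0.1], [0.9], [0.1], [0.1],
                  [0.8], [0.9], [0.7], [0.2], [0.1]])
truth = np.array([[0], [0], [0], [0], [0], [1], [1], [1], [0], [0]])

no_smoothing = reward(probs, truth, ([0.5], [1]))
with_smoothing = reward(probs, truth, ([0.5], [3]))
print(no_smoothing, with_smoothing)
```

In this toy case the kernel of size 3 removes the isolated spike and so earns the higher reward. An agent in the setting the abstract describes would search over such actions to maximize the reward, rather than relying on manually chosen values.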
Related papers
- D4AM: A General Denoising Framework for Downstream Acoustic Models [45.04967351760919]
Speech enhancement (SE) can be used as a front-end strategy to aid automatic speech recognition (ASR) systems.
Existing training objectives of SE methods are not fully effective at integrating speech-text and noisy-clean paired data for training toward unseen ASR systems.
We propose a general denoising framework, D4AM, for various downstream acoustic models.
arXiv Detail & Related papers (2023-11-28T08:27:27Z)
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z)
- Segment-level Metric Learning for Few-shot Bioacoustic Event Detection [56.59107110017436]
We propose a segment-level few-shot learning framework that utilizes both the positive and negative events during model optimization.
Our system achieves an F-measure of 62.73 on the DCASE 2022 challenge task 5 (DCASE2022-T5) validation set, outperforming the baseline prototypical network (34.02) by a large margin.
arXiv Detail & Related papers (2022-07-15T22:41:30Z)
- Low-complexity deep learning frameworks for acoustic scene classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC).
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments on the DCASE 2022 Task 1 Development dataset fulfilled the low-complexity requirement and achieved a best classification accuracy of 60.1%.
arXiv Detail & Related papers (2022-06-13T11:41:39Z)
- Environmental sound analysis with mixup based multitask learning and cross-task fusion [0.12891210250935145]
Acoustic scene classification and acoustic event classification are two closely related tasks.
In this letter, a two-stage method is proposed for the above tasks.
The proposed method has confirmed the complementary characteristics of acoustic scene and acoustic event classifications.
arXiv Detail & Related papers (2021-03-30T05:11:53Z)
- PSLA: Improving Audio Event Classification with Pretraining, Sampling, Labeling, and Aggregation [19.09439093130855]
We present PSLA, a collection of training techniques that can noticeably boost the model accuracy.
We obtain a model that achieves a new state-of-the-art mean average precision (mAP) of 0.474 on AudioSet, outperforming the previous best system of 0.439.
arXiv Detail & Related papers (2021-02-02T01:00:38Z)
- Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
arXiv Detail & Related papers (2020-07-16T15:07:14Z)
- Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
arXiv Detail & Related papers (2020-06-25T07:36:47Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
- Active Learning for Sound Event Detection [18.750572243562576]
This paper proposes an active learning system for sound event detection (SED).
It aims at maximizing the accuracy of a learned SED model with limited annotation effort.
Remarkably, the required annotation effort can be greatly reduced on datasets where target sound events are rare.
arXiv Detail & Related papers (2020-02-12T14:46:55Z)