Environmental sound analysis with mixup based multitask learning and
cross-task fusion
- URL: http://arxiv.org/abs/2103.16079v1
- Date: Tue, 30 Mar 2021 05:11:53 GMT
- Title: Environmental sound analysis with mixup based multitask learning and
cross-task fusion
- Authors: Weiping Zheng, Dacan Jiang, Gansen Zhao
- Abstract summary: Acoustic scene classification and acoustic event classification are two closely related tasks.
In this letter, a two-stage method is proposed for the above tasks.
The proposed method has confirmed the complementary characteristics of acoustic scene and acoustic event classifications.
- Score: 0.12891210250935145
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Environmental sound analysis is currently attracting growing attention.
In this domain, acoustic scene classification and acoustic event classification
are two closely related tasks. In this letter, a two-stage method is proposed
for the above tasks. In the first stage, a mixup-based multi-task learning (MTL)
solution is proposed to handle both tasks in a single convolutional neural network.
The MTL model is trained on artificial multi-label samples, which are created by
mixing up samples from existing single-task datasets. The multi-task model obtained can
effectively recognize both the acoustic scenes and events. Compared with other
methods such as re-annotation or synthesis, the mixup-based MTL is low-cost,
flexible and effective. In the second stage, the MTL model is modified into a
single-task model which is fine-tuned using the original dataset corresponding
to the specific task. By controlling the frozen layers carefully, the
task-specific high level features are fused and the performance of the single
classification task is further improved. The proposed method has confirmed the
complementary characteristics of acoustic scene and acoustic event
classifications. Finally, enhanced by ensemble learning, the method achieves an
accuracy of 84.5 percent on the TUT Acoustic Scenes 2017 dataset and an accuracy of
77.5 percent on the ESC-50 dataset.
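The first-stage idea in the abstract, building artificial multi-label samples by mixing up two single-task datasets, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `mix_scene_event`, the Beta-distributed mixing weight, the `alpha` value, and the concatenated soft-label layout are all assumptions made for the example. The class counts (15 scene classes for TUT Acoustic Scenes 2017, 50 event classes for ESC-50) are the standard sizes of those datasets.

```python
import numpy as np

N_SCENE = 15  # TUT Acoustic Scenes 2017 has 15 scene classes
N_EVENT = 50  # ESC-50 has 50 event classes


def mix_scene_event(x_scene, y_scene, x_event, y_event,
                    n_scene=N_SCENE, n_event=N_EVENT, alpha=0.2, rng=None):
    """Mix one scene sample with one event sample (hypothetical helper).

    x_scene, x_event: feature arrays of identical shape (e.g. log-mel patches).
    y_scene, y_event: integer class indices within their own task.
    Returns the mixed features, a soft multi-label target over the
    concatenated label space [scene classes | event classes], and the
    mixing weight that was drawn.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: the mixing weight follows Beta(alpha, alpha), as in standard mixup.
    lam = float(rng.beta(alpha, alpha))

    # Convex combination of the two inputs.
    x_mixed = lam * x_scene + (1.0 - lam) * x_event

    # Assumption: the target carries weight lam on the scene label and
    # (1 - lam) on the event label, in one concatenated vector.
    y_mixed = np.zeros(n_scene + n_event, dtype=np.float32)
    y_mixed[y_scene] = lam
    y_mixed[n_scene + y_event] = 1.0 - lam
    return x_mixed, y_mixed, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene_patch = rng.standard_normal((64, 128)).astype(np.float32)
    event_patch = rng.standard_normal((64, 128)).astype(np.float32)
    x, y, lam = mix_scene_event(scene_patch, 3, event_patch, 10, rng=rng)
    print(x.shape, y.shape, round(lam, 3))
```

A single network trained on such targets sees both a scene label and an event label for every input, which is what lets one model serve both tasks without re-annotating either dataset.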
Related papers
- Interpretable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z)
- Sequence-to-sequence models in peer-to-peer learning: A practical application [0.0]
The paper explores the applicability of sequence-to-sequence (Seq2Seq) models based on LSTM units for the Automatic Speech Recognition (ASR) task within peer-to-peer learning environments.
The findings demonstrate the feasibility of employing Seq2Seq models in decentralized settings.
arXiv Detail & Related papers (2024-05-02T14:44:06Z)
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
- Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
- Segment-level Metric Learning for Few-shot Bioacoustic Event Detection [56.59107110017436]
We propose a segment-level few-shot learning framework that utilizes both the positive and negative events during model optimization.
Our system achieves an F-measure of 62.73 on the DCASE 2022 challenge task 5 (DCASE2022-T5) validation set, outperforming the baseline prototypical network (34.02) by a large margin.
arXiv Detail & Related papers (2022-07-15T22:41:30Z)
- Adaptive Few-Shot Learning Algorithm for Rare Sound Event Detection [24.385226516231004]
We propose a novel task-adaptive module which is easy to plug into any metric-based few-shot learning framework.
Our module improves the performance considerably on two datasets over baseline methods.
arXiv Detail & Related papers (2022-05-24T03:13:12Z)
- A Complementary Joint Training Approach Using Unpaired Speech and Text
for Low-Resource Automatic Speech Recognition [25.473191378558138]
We leverage unpaired data to train a general sequence-to-sequence model.
Inspired by the complementarity of the speech-PseudoLabel pair and the SynthesizedAudio-text pair, we propose a complementary joint training (CJT) method.
arXiv Detail & Related papers (2022-04-05T07:02:53Z)
- Neural Task Success Classifiers for Robotic Manipulation from Few Real
Demonstrations [1.7205106391379026]
This paper presents a novel classifier that learns to classify task completion only from a few demonstrations.
We compare different neural classifiers, e.g. fully connected-based, fully convolutional-based, sequence2sequence-based, and domain adaptation-based classification.
Our model, i.e. fully convolutional neural network with domain adaptation and timing features, achieves an average classification accuracy of 97.3% and 95.5% across tasks.
arXiv Detail & Related papers (2021-07-01T19:58:16Z)
- Device-Robust Acoustic Scene Classification Based on Two-Stage
Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
arXiv Detail & Related papers (2020-07-16T15:07:14Z)
- Low Resource Multi-Task Sequence Tagging -- Revisiting Dynamic
Conditional Random Fields [67.51177964010967]
We compare different models for low resource multi-task sequence tagging that leverage dependencies between label sequences for different tasks.
We find that explicit modeling of inter-dependencies between task predictions outperforms single-task as well as standard multi-task models.
arXiv Detail & Related papers (2020-05-01T07:11:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.