Relational Teacher Student Learning with Neural Label Embedding for
Device Adaptation in Acoustic Scene Classification
- URL: http://arxiv.org/abs/2008.00110v1
- Date: Fri, 31 Jul 2020 23:07:20 GMT
- Authors: Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee
- Abstract summary: We propose a domain adaptation framework to address the device mismatch issue in acoustic scene classification.
Taking into account the structural relationships between acoustic scene classes, our proposed framework captures such relationships, which are intrinsically device-independent.
In the training stage, transferable knowledge is condensed in a neural label embedding (NLE) from the source domain.
In the adaptation stage, a novel relational teacher student learning (RTSL) strategy is adopted to learn adapted target models without using paired source-target data.
- Score: 49.0621360050418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a domain adaptation framework to address the device
mismatch issue in acoustic scene classification, leveraging neural label
embedding (NLE) and relational teacher student learning (RTSL). Taking into
account the structural relationships between acoustic scene classes, our
proposed framework captures such relationships, which are intrinsically
device-independent. In the training stage, transferable knowledge is condensed
in NLE from the source domain. Next, in the adaptation stage, a novel RTSL
strategy is adopted to learn adapted target models without using paired
source-target data often required in conventional teacher student learning. The
proposed framework is evaluated on the DCASE 2018 Task1b data set. Experimental
results based on AlexNet-L deep classification models confirm the effectiveness
of our proposed approach for mismatch situations. NLE-alone adaptation compares
favourably with the conventional device adaptation and teacher student based
adaptation techniques. NLE with RTSL further improves the classification
accuracy.
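The abstract does not spell out how NLE or RTSL are computed. As a rough illustration only of the general idea of adapting with class-level soft labels instead of paired source-target recordings, the Python sketch below averages a source model's posteriors per class and scores a target model against those averages. The function names (`class_soft_labels`, `soft_label_loss`), the averaging step, and the cross-entropy loss are all assumptions made for this example, not the authors' formulation, and the relational term of RTSL is not modeled.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def class_soft_labels(source_probs, labels, num_classes):
    """Hypothetical stand-in for a label embedding: one soft label per class,
    obtained by averaging source-model posteriors over that class.
    Assumes every class has at least one sample in `labels`."""
    emb = np.zeros((num_classes, source_probs.shape[1]))
    for c in range(num_classes):
        emb[c] = source_probs[labels == c].mean(axis=0)
    return emb


def soft_label_loss(target_logits, labels, emb, eps=1e-12):
    """Cross-entropy between each sample's class-level soft label and the
    target model's posterior. Only target-domain labels are needed, so no
    paired source-target recordings are involved."""
    p = softmax(target_logits)
    return float(-(emb[labels] * np.log(p + eps)).sum(axis=1).mean())
```

In this sketch the soft labels are computed once from source-domain data and then reused for any labelled target-device batch, which is what removes the need for parallel recordings in this simplified setting.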
Related papers
- Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models [0.0]
This work proposes a three-phase technique to adjust a base model for a classification task.
We adapt the model's signal to the data distribution by performing further training with a Denoising Autoencoder (DAE).
In addition, we introduce a new data augmentation approach for Supervised Contrastive Learning to correct for unbalanced datasets.
arXiv Detail & Related papers (2024-05-23T11:08:35Z)
- Latent Alignment with Deep Set EEG Decoders [44.128689862889715]
We introduce the Latent Alignment method that won the Benchmarks for EEG Transfer Learning competition.
We present its formulation as a deep set applied on the set of trials from a given subject.
Our experimental results show that performing statistical distribution alignment at later stages in a deep learning model is beneficial to the classification accuracy.
arXiv Detail & Related papers (2023-11-29T12:40:45Z)
- IDA: Informed Domain Adaptive Semantic Segmentation [51.12107564372869]
We propose an Informed Domain Adaptation (IDA) model, a self-training framework that mixes the data based on class-level segmentation performance.
In our IDA model, the class-level performance is tracked by an expected confidence score (ECS) and we then use a dynamic schedule to determine the mixing ratio for data in different domains.
Our proposed method is able to outperform the state-of-the-art UDA-SS method by a margin of 1.1 mIoU in the adaptation of GTA-V to Cityscapes and of 0.9 mIoU in the adaptation of SYNTHIA to City
arXiv Detail & Related papers (2023-03-05T18:16:34Z)
- Contextual Squeeze-and-Excitation for Efficient Few-Shot Image Classification [57.36281142038042]
We present a new adaptive block called Contextual Squeeze-and-Excitation (CaSE) that adjusts a pretrained neural network on a new task to significantly improve performance.
We also present a new training protocol based on Coordinate-Descent called UpperCaSE that exploits meta-trained CaSE blocks and fine-tuning routines for efficient adaptation.
arXiv Detail & Related papers (2022-06-20T15:25:08Z)
- Deep Neural Decision Forest for Acoustic Scene Classification [45.886356124352226]
Acoustic scene classification (ASC) aims to classify an audio clip based on the characteristic of the recording environment.
We propose a novel approach for ASC using a deep neural decision forest (DNDF).
arXiv Detail & Related papers (2022-03-07T14:39:42Z)
- Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect.
There always exists severe domain shift between the pretraining and downstream ABSA datasets.
We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z)
- Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias [59.788358876316295]
We propose a pipeline solution to improve speaker verification on a small actual forensic field dataset.
By leveraging large-scale out-of-domain datasets, a knowledge distillation based objective function is proposed for teacher-student learning.
We show that the proposed objective function can efficiently improve the performance of teacher-student learning on short utterances.
arXiv Detail & Related papers (2020-09-21T00:58:40Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
- Statistical Context-Dependent Units Boundary Correction for Corpus-based Unit-Selection Text-to-Speech [1.4337588659482519]
We present an innovative technique for speaker adaptation in order to improve the accuracy of segmentation with application to unit-selection Text-To-Speech (TTS) systems.
Unlike conventional techniques for speaker adaptation, we aim to use only context dependent characteristics extrapolated with linguistic analysis techniques.
arXiv Detail & Related papers (2020-03-05T12:42:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.