L-Vector: Neural Label Embedding for Domain Adaptation
- URL: http://arxiv.org/abs/2004.13480v1
- Date: Sat, 25 Apr 2020 06:40:31 GMT
- Title: L-Vector: Neural Label Embedding for Domain Adaptation
- Authors: Zhong Meng, Hu Hu, Jinyu Li, Changliang Liu, Yan Huang, Yifan Gong,
Chin-Hui Lee
- Abstract summary: We propose a novel neural label embedding (NLE) scheme for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples.
NLE achieves up to 14.1% relative word error rate reduction over direct re-training with one-hot labels.
- Score: 62.112885747045766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel neural label embedding (NLE) scheme for the domain
adaptation of a deep neural network (DNN) acoustic model with unpaired data
samples from source and target domains. With the NLE method, we distill the
knowledge from a powerful source-domain DNN into a dictionary of label
embeddings, or l-vectors, one for each senone class. Each l-vector is a
representation of the senone-specific output distributions of the source-domain
DNN and is learned to minimize the average L2, Kullback-Leibler (KL) or
symmetric KL distance to the output vectors with the same label through simple
averaging or standard back-propagation. During adaptation, the l-vectors serve
as the soft targets to train the target-domain model with cross-entropy loss.
Free of the parallel data constraint of teacher-student learning, NLE is
especially suited to situations where paired target-domain data cannot be
simulated from the source-domain data. We adapt a 6400-hour multi-conditional
US English acoustic model to each of 9 accented English datasets (80 to 830
hours) and to kids' speech (80 hours). NLE achieves up to 14.1%
relative word error rate reduction over direct re-training with one-hot labels.
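As a rough illustration of the two stages described in the abstract, the simple-averaging variant might look like the sketch below; every function and variable name is my own, not from the paper.

```python
import numpy as np

def estimate_l_vectors(source_posteriors, senone_labels, num_senones):
    """Distillation stage: one l-vector per senone class, obtained by averaging
    the source-domain DNN's output vectors over all frames with that label
    (the L2 / simple-averaging variant; the KL and symmetric-KL variants are
    instead learned by back-propagation)."""
    dim = source_posteriors.shape[1]
    l_vectors = np.zeros((num_senones, dim))
    for s in range(num_senones):
        frames = source_posteriors[senone_labels == s]
        if len(frames) > 0:
            l_vectors[s] = frames.mean(axis=0)
    return l_vectors

def adaptation_loss(target_log_probs, senone_labels, l_vectors):
    """Adaptation stage: cross-entropy of the target-domain model's log-output
    against the l-vector looked up by each frame's senone label (soft target)."""
    soft_targets = l_vectors[senone_labels]          # (num_frames, num_senones)
    return -(soft_targets * target_log_probs).sum(axis=1).mean()
```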
Related papers
- Layer-wise Regularized Dropout for Neural Language Models [57.422407462430186]
Layer-wise Regularized Dropout (LR-Drop) is designed specifically for Transformer-based language models.
We show that LR-Drop achieves superior performance, including state-of-the-art results.
arXiv Detail & Related papers (2024-02-26T07:31:35Z)
- Detecting Novelties with Empty Classes [6.953730499849023]
We build upon anomaly detection to retrieve out-of-distribution (OoD) data as candidates for new classes.
We introduce two loss functions that 1) entice the DNN to assign OoD samples to the empty classes and 2) minimize the inner-class feature distances between them.
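A loose sketch of how those two losses could be combined, purely my reading of this summary (hypothetical names, not the authors' formulation); both terms act only on the retrieved OoD candidates:

```python
import torch
import torch.nn.functional as F

def empty_class_losses(ood_logits, ood_features, empty_centroids, num_known):
    """ood_logits cover known + empty classes; empty_centroids holds one
    running feature mean per empty class."""
    # 1) entice the DNN to assign each OoD sample to its best-matching empty class
    pseudo = ood_logits[:, num_known:].argmax(dim=1) + num_known
    loss_assign = F.cross_entropy(ood_logits, pseudo)
    # 2) minimize inner-class feature distances within each assigned empty class
    loss_compact = ((ood_features - empty_centroids[pseudo - num_known]) ** 2).sum(dim=1).mean()
    return loss_assign, loss_compact
```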
arXiv Detail & Related papers (2023-04-30T19:52:47Z)
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
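For the alignment step, a generic RBF-kernel MMD between a memory-bank batch of source-like features and a batch of target-specific features could look like the sketch below (the single fixed bandwidth is a simplification of mine):

```python
import torch

def mmd_rbf(source_like, target_specific, sigma=1.0):
    """Squared maximum mean discrepancy with an RBF kernel between two feature sets."""
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return (kernel(source_like, source_like).mean()
            + kernel(target_specific, target_specific).mean()
            - 2 * kernel(source_like, target_specific).mean())

# usage: loss = mmd_rbf(memory_bank_batch, target_specific_batch)
```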
arXiv Detail & Related papers (2022-11-12T09:21:49Z)
- Robust Target Training for Multi-Source Domain Adaptation [110.77704026569499]
We propose a novel Bi-level Optimization based Robust Target Training (BORT²) method for MSDA.
Our proposed method achieves state-of-the-art performance on three MSDA benchmarks, including the large-scale DomainNet dataset.
arXiv Detail & Related papers (2022-10-04T15:20:01Z)
- RNN Transducers for Nested Named Entity Recognition with constraints on alignment for long sequences [4.545971444299925]
We introduce a new model for NER tasks: a recurrent neural network transducer (RNN-T).
RNN-T models learn the alignment using a loss function that sums over all alignments.
In NER tasks, however, the alignment between words and target labels is available from annotations.
We demonstrate that our fixed-alignment model outperforms the standard RNN-T model.
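As I read it, fixing the alignment collapses the usual sum over all RNN-T alignments to the log-probability of the single annotated path; a minimal sketch (shapes and the path encoding are assumptions of mine):

```python
import torch

def fixed_alignment_transducer_loss(joint_log_probs, path):
    """joint_log_probs: (T, U + 1, V) log-probabilities from the joint network;
    path: sequence of (t, u, v) label/blank emissions along the known alignment.
    Only the annotated path contributes, instead of every monotonic alignment."""
    return -sum(joint_log_probs[t, u, v] for t, u, v in path)
```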
arXiv Detail & Related papers (2022-02-08T05:38:20Z)
- Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRF [30.982301053976023]
The data sparsity problem is a key challenge in Natural Language Understanding (NLU).
We propose to improve prototypical networks with vector projection distance and a triangular Conditional Random Field (CRF) for few-shot NLU.
Our approach can achieve a new state-of-the-art on two few-shot NLU benchmarks (Few-Joint and SNIPS) in Chinese and English without fine-tuning on target domains.
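A hedged sketch of a vector projection distance replacing the usual Euclidean or cosine distance in the prototypical network (the paper's exact form may add bias terms; names are mine):

```python
import torch

def vector_projection_scores(token_embeddings, label_prototypes):
    """Score each token against each label by projecting the token embedding
    onto the L2-normalized label prototype; the scores then feed the CRF layer."""
    protos = label_prototypes / label_prototypes.norm(dim=1, keepdim=True)
    return token_embeddings @ protos.t()             # (num_tokens, num_labels)
```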
arXiv Detail & Related papers (2021-12-09T15:46:15Z)
- Adaptive Nearest Neighbor Machine Translation [60.97183408140499]
kNN-MT combines pre-trained neural machine translation with token-level k-nearest-neighbor retrieval.
The traditional kNN algorithm retrieves the same number of nearest neighbors for each target token.
We propose Adaptive kNN-MT to dynamically determine the number of neighbors k for each target token.
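A compact sketch of the token-level interpolation behind kNN-MT; in the adaptive variant a light meta-network would predict k (and possibly the mixing weight) per target token from the retrieval distances, which are plain arguments here:

```python
import torch
import torch.nn.functional as F

def knn_mt_distribution(nmt_probs, nbr_dists, nbr_tokens, vocab_size,
                        k, lam, temperature=10.0):
    """Mix the NMT distribution with one built from the k retrieved neighbours.
    nbr_tokens is a long tensor of the neighbours' target-token ids."""
    d, t = nbr_dists[:k], nbr_tokens[:k]             # keep only k neighbours
    weights = F.softmax(-d / temperature, dim=0)     # closer neighbours weigh more
    knn_probs = torch.zeros(vocab_size).scatter_add_(0, t, weights)
    return lam * knn_probs + (1.0 - lam) * nmt_probs
```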
arXiv Detail & Related papers (2021-05-27T09:27:42Z)
- Evaluating Deep Neural Network Ensembles by Majority Voting cum Meta-Learning scheme [3.351714665243138]
We propose an ensemble of seven independent Deep Neural Networks (DNNs) to classify a new data instance.
One-seventh of the data is deleted and replenished by bootstrap sampling from the remaining samples.
All the algorithms in this paper have been tested on five benchmark datasets.
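A sketch of the resampling-plus-majority-voting scheme described above; `train_fn` is a placeholder for training one DNN and returning a fitted model with a `.predict()` method (my abstraction, not from the paper):

```python
import numpy as np

def bootstrap_ensemble_predict(X, y, x_new, train_fn, n_models=7, seed=0):
    """Train seven DNNs, each on data where a different one-seventh has been
    deleted and replenished by bootstrap sampling from the remaining samples,
    then majority-vote the predictions for the new instance."""
    rng = np.random.default_rng(seed)
    n = len(X)
    fold = n // n_models
    votes = []
    for i in range(n_models):
        drop = np.arange(i * fold, (i + 1) * fold)            # the deleted 1/7
        keep = np.setdiff1d(np.arange(n), drop)
        refill = rng.choice(keep, size=len(drop), replace=True)
        idx = np.concatenate([keep, refill])
        model = train_fn(X[idx], y[idx])
        votes.append(int(np.ravel(model.predict(x_new))[0]))
    return int(np.bincount(votes).argmax())                   # majority vote
```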
arXiv Detail & Related papers (2021-05-09T03:10:56Z)
- OVANet: One-vs-All Network for Universal Domain Adaptation [78.86047802107025]
Existing methods manually set a threshold to reject unknown samples based on validation or a pre-defined ratio of unknown samples.
We propose a method to learn the threshold using source samples and to adapt it to the target domain.
Our idea is that the minimum inter-class distance in the source domain should be a good threshold for deciding between known and unknown in the target domain.
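One hedged way to realize that idea, constructed from this summary rather than from OVANet's actual one-vs-all classifiers: take the minimum distance between source class centroids as the threshold and flag a target feature as unknown if it lies farther than that from every centroid.

```python
import torch

def source_centroids_and_threshold(source_feats, source_labels):
    """Minimum inter-class (centroid) distance in the source domain."""
    classes = source_labels.unique()
    centroids = torch.stack([source_feats[source_labels == c].mean(0) for c in classes])
    dists = torch.cdist(centroids, centroids)
    dists.fill_diagonal_(float("inf"))
    return centroids, dists.min()

def is_unknown(target_feat, centroids, threshold):
    """Unknown if the target feature is farther than the threshold from every centroid."""
    return torch.cdist(target_feat.unsqueeze(0), centroids).min() > threshold
```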
arXiv Detail & Related papers (2021-04-07T18:36:31Z)
- Domain Adaptation Using Class Similarity for Robust Speech Recognition [24.951852740214413]
This paper proposes a novel adaptation method for a deep neural network (DNN) acoustic model using class similarity.
Experiments showed that our approach outperforms fine-tuning using one-hot labels on both accent and noise adaptation tasks.
arXiv Detail & Related papers (2020-11-05T12:26:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.