Class Token and Knowledge Distillation for Multi-head Self-Attention
Speaker Verification Systems
- URL: http://arxiv.org/abs/2111.03842v1
- Date: Sat, 6 Nov 2021 09:47:05 GMT
- Title: Class Token and Knowledge Distillation for Multi-head Self-Attention
Speaker Verification Systems
- Authors: Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida
- Abstract summary: This paper explores three novel approaches to improve the performance of speaker verification systems based on deep neural networks (DNN).
First, we propose the use of a learnable vector called Class token to replace the average global pooling mechanism to extract the embeddings.
Second, we add a distilled representation token for training a teacher-student pair of networks using the Knowledge Distillation (KD) philosophy.
- Score: 20.55054374525828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores three novel approaches to improve the performance of
speaker verification (SV) systems based on deep neural networks (DNN) using
Multi-head Self-Attention (MSA) mechanisms and memory layers. Firstly, we
propose the use of a learnable vector called Class token to replace the average
global pooling mechanism to extract the embeddings. Unlike global average
pooling, our proposal takes into account the temporal structure of the input,
which is relevant for the text-dependent SV task. The class token is
concatenated to the input before the first MSA layer, and its state at the
output is used to predict the classes. To gain additional robustness, we
introduce two approaches. First, we have developed a Bayesian estimation of the
class token. Second, we have added a distilled representation token for
training a teacher-student pair of networks using the Knowledge Distillation
(KD) philosophy, which is combined with the class token. This distillation
token is trained to mimic the predictions from the teacher network, while the
class token replicates the true label. All the strategies have been tested on
the RSR2015-Part II and DeepMine-Part 1 databases for text-dependent SV,
providing competitive results compared to the same architecture using the
average pooling mechanism to extract average embeddings.
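
As a rough sketch of how the class and distillation tokens described in the abstract could be wired into an MSA encoder, consider the following PyTorch fragment. This is an illustration under assumptions, not the authors' implementation: the module name TokenMSAEncoder, the hyperparameters (dim, num_heads, T, alpha), and the plain transformer encoder stack stand in for the paper's architecture, and the Bayesian class-token estimation and the memory layers are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenMSAEncoder(nn.Module):
    """Prepends learnable class/distillation tokens to the frame-level
    features before the first Multi-head Self-Attention layer."""

    def __init__(self, dim=256, num_heads=4, num_layers=2, num_classes=100):
        super().__init__()
        self.class_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.distill_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.class_head = nn.Linear(dim, num_classes)    # follows true labels
        self.distill_head = nn.Linear(dim, num_classes)  # mimics the teacher

    def forward(self, frames):                  # frames: (batch, time, dim)
        b = frames.size(0)
        x = torch.cat([self.class_token.expand(b, -1, -1),
                       self.distill_token.expand(b, -1, -1),
                       frames], dim=1)
        out = self.encoder(x)
        # The tokens' output states replace global average pooling over time.
        return self.class_head(out[:, 0]), self.distill_head(out[:, 1])

def kd_loss(cls_logits, dst_logits, labels, teacher_logits, T=2.0, alpha=0.5):
    """The class token is trained on the true label, while the distillation
    token mimics the teacher's soft predictions (temperature-scaled KL)."""
    hard = F.cross_entropy(cls_logits, labels)
    soft = F.kl_div(F.log_softmax(dst_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return (1.0 - alpha) * hard + alpha * soft
```

At verification time, the output state of the class token (optionally combined with the distillation token's state) would serve as the utterance-level embedding for scoring, in place of a temporal average.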
Related papers
- Incubating Text Classifiers Following User Instruction with Nothing but LLM [37.92922713921964]
We propose a framework to generate text classification data given arbitrary class definitions (i.e., user instruction).
Our proposed Incubator is the first framework that can handle complicated and even mutually dependent classes.
arXiv Detail & Related papers (2024-04-16T19:53:35Z)
- Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks.
Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
- PromptKD: Unsupervised Prompt Distillation for Vision-Language Models [40.858721356497085]
We introduce an unsupervised domain prompt distillation framework, which aims to transfer the knowledge of a larger teacher model to a lightweight target model.
Our framework consists of two distinct stages. In the initial stage, we pre-train a large CLIP teacher model using domain (few-shot) labels.
In the subsequent stage, the stored class vectors are shared across teacher and student image encoders for calculating the predicted logits.
arXiv Detail & Related papers (2024-03-05T08:53:30Z)
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Training ELECTRA Augmented with Multi-word Selection [53.77046731238381]
We present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets.
arXiv Detail & Related papers (2021-05-31T23:19:00Z)
- UIUC_BioNLP at SemEval-2021 Task 11: A Cascade of Neural Models for Structuring Scholarly NLP Contributions [1.5942130010323128]
We propose a cascade of neural models that performs sentence classification, phrase recognition, and triple extraction.
A BERT-CRF model was used to recognize and characterize relevant phrases in contribution sentences.
Our system was officially ranked second in Phase 1 evaluation and first in both parts of Phase 2 evaluation.
arXiv Detail & Related papers (2021-05-12T05:24:35Z)
- An evidential classifier based on Dempster-Shafer theory and deep learning [6.230751621285322]
We propose a new classification system based on Dempster-Shafer (DS) theory and a convolutional neural network (CNN) architecture for set-valued classification.
Experiments on image recognition, signal processing, and semantic-relationship classification tasks demonstrate that the proposed combination of deep CNN, DS layer, and expected utility layer makes it possible to improve classification accuracy.
arXiv Detail & Related papers (2021-03-25T01:29:05Z)
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
- Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ U-sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv Detail & Related papers (2021-02-01T07:36:38Z)
- Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
- Digit Image Recognition Using an Ensemble of One-Versus-All Deep Network Classifiers [2.385916960125935]
We implement a novel technique for digit image recognition and test and evaluate it on that task.
Every network in the ensemble has been trained by an OVA training technique using the Stochastic Gradient Descent with Momentum Algorithm (SGDMA).
Our proposed technique outperforms the baseline on digit image recognition for all datasets.
arXiv Detail & Related papers (2020-06-28T15:37:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.