Ordered and Binary Speaker Embedding
- URL: http://arxiv.org/abs/2305.16043v1
- Date: Thu, 25 May 2023 13:21:00 GMT
- Title: Ordered and Binary Speaker Embedding
- Authors: Jiaying Wang, Xianglong Wang, Namin Wang, Lantian Li, Dong Wang
- Abstract summary: We propose an ordered binary embedding approach that sorts the dimensions of the embedding vector via a nested dropout and converts the sorted vectors to binary codes via Bernoulli sampling.
The resultant ordered binary codes offer some important merits such as hierarchical clustering, reduced memory usage, and fast retrieval.
- Score: 12.22202088781098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern speaker recognition systems represent utterances by embedding vectors. Conventional embedding vectors are dense and non-structural. In this paper, we propose an ordered binary embedding approach that sorts the dimensions of the embedding vector via a nested dropout and converts the sorted vectors to binary codes via Bernoulli sampling. The resultant ordered binary codes offer some important merits such as hierarchical clustering, reduced memory usage, and fast retrieval. These merits were empirically verified by comprehensive experiments on a speaker identification task with the VoxCeleb and CN-Celeb datasets.
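
As an illustration of the two ingredients the abstract names, the sketch below shows nested dropout followed by Bernoulli-sampled binarization in PyTorch. It is a minimal reconstruction from the abstract alone, not the authors' code; the geometric truncation distribution, the sigmoid squashing, and the straight-through estimator are common choices assumed here for concreteness.

```python
# Minimal sketch (assumptions noted above) of nested dropout + Bernoulli
# binarization for speaker embeddings. Not the authors' implementation.
import torch


def nested_dropout(x: torch.Tensor, p: float = 0.1) -> torch.Tensor:
    """Zero out all dimensions after a randomly sampled truncation index.

    Sampling the index from a geometric distribution (success prob. p)
    lets early dimensions survive far more often, forcing them to carry
    the coarsest, most important information -- this is what orders the
    dimensions.
    """
    d = x.size(-1)
    # One truncation index per example in the batch.
    k = torch.distributions.Geometric(probs=p).sample((x.size(0),)).long()
    k = k.clamp(max=d - 1)
    mask = (torch.arange(d, device=x.device).unsqueeze(0) <= k.unsqueeze(1)).float()
    return x * mask


def bernoulli_binarize(x: torch.Tensor) -> torch.Tensor:
    """Convert real-valued dimensions to {0, 1} codes by Bernoulli sampling.

    A straight-through estimator passes gradients to the underlying
    probabilities so the encoder stays trainable end to end.
    """
    probs = torch.sigmoid(x)
    codes = torch.bernoulli(probs)
    # Forward value is `codes`; backward gradient flows through `probs`.
    return codes + probs - probs.detach()


# Toy usage: a batch of 4 "speaker embeddings" with 16 dimensions.
emb = torch.randn(4, 16, requires_grad=True)
binary = bernoulli_binarize(nested_dropout(emb))
print(binary)
```

At inference time one would typically threshold the probabilities at 0.5 rather than sample, so that each utterance maps to a deterministic code.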
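The claimed merits also admit a simple illustration. Because the dimensions are ordered by importance, a short prefix of each binary code already acts as a coarse cluster identifier, and full codes can be compared with cheap XOR-and-popcount Hamming distances. The NumPy sketch below is again an assumption-laden illustration, not the paper's retrieval pipeline.

```python
# Illustrative sketch of the merits listed above. Packing 256-bit codes uses
# 32 bytes per speaker instead of 1 KB for a 256-dim float32 embedding
# (reduced memory); XOR + popcount gives fast Hamming retrieval; and since
# dimensions are ordered by importance, matching on a short code prefix acts
# as a coarse, hierarchical clustering stage.
import numpy as np

rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(10_000, 256), dtype=np.uint8)  # 10k speakers
packed = np.packbits(codes, axis=1)            # 256 bits -> 32 bytes per code

# Query: a stored code with a few low-importance (late) bits flipped,
# simulating a noisy re-recording of speaker 123.
query = codes[123].copy()
query[200:208] ^= 1
q = np.packbits(query)

# Coarse stage: keep candidates whose first 32 bits (the most important
# dimensions) lie within a small Hamming radius of the query prefix.
prefix_dist = np.unpackbits(packed[:, :4] ^ q[:4], axis=1).sum(axis=1)
candidates = np.flatnonzero(prefix_dist <= 4)

# Fine stage: full 256-bit Hamming distance, only on the survivors.
full_dist = np.unpackbits(packed[candidates] ^ q, axis=1).sum(axis=1)
print("nearest speakers:", candidates[np.argsort(full_dist)[:5]])
```

The prefix filter here is a crude stand-in for the hierarchical clustering that the dimension ordering enables: truncating an ordered code never discards its most informative bits.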
Related papers
- Sequence Shortening for Context-Aware Machine Translation [5.803309695504831]
We show that a special case of the multi-encoder architecture achieves higher accuracy on contrastive datasets.
We introduce two novel methods, Latent Grouping and Latent Selecting, in which the network learns to group tokens or select the tokens to be cached as context.
arXiv Detail & Related papers (2024-02-02T13:55:37Z)
- Emergence of Latent Binary Encoding in Deep Neural Network Classifiers [0.0]
We investigate the emergence of binary encoding within the latent space of deep-neural-network classifiers.
By analyzing several datasets of increasing complexity, we provide empirical evidence that the emergence of binary encoding dramatically enhances robustness.
arXiv Detail & Related papers (2023-10-12T11:16:57Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Hierarchical Sketch Induction for Paraphrase Generation [79.87892048285819]
We introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE), a method for learning decompositions of dense encodings.
We use HRQ-VAE to encode the syntactic form of an input sentence as a path through the hierarchy, allowing us to more easily predict syntactic sketches at test time.
arXiv Detail & Related papers (2022-03-07T15:28:36Z)
- Nearest neighbor search with compact codes: A decoder perspective [77.60612610421101]
We re-interpret popular methods such as binary hashing and product quantizers as auto-encoders.
We design backward-compatible decoders that improve the reconstruction of the vectors from the same codes.
arXiv Detail & Related papers (2021-12-17T15:22:28Z)
- Sparse Coding with Multi-Layer Decoders using Variance Regularization [19.8572592390623]
We propose a novel sparse coding protocol which prevents a collapse in the codes without the need to regularize the decoder.
Our method regularizes the codes directly so that each latent code component has variance greater than a fixed threshold.
We show that sparse autoencoders with multi-layer decoders trained using our variance regularization method produce higher quality reconstructions with sparser representations.
arXiv Detail & Related papers (2021-12-16T21:46:23Z)
- Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts power-set encoded labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
- byteSteady: Fast Classification Using Byte-Level n-Gram Embeddings [77.6701264226519]
We introduce byteSteady, a fast model for classification using byte-level n-gram embeddings.
A straightforward application of byteSteady is text classification.
We also apply byteSteady to one type of non-language data -- DNA sequences for gene classification.
arXiv Detail & Related papers (2021-06-24T20:14:48Z)
- Acoustic Neighbor Embeddings [2.842794675894731]
This paper proposes a novel acoustic word embedding called Acoustic Neighbor Embeddings.
The Euclidean distance between coordinates in the embedding space reflects the phonetic confusability between their corresponding sequences.
The recognition accuracy is identical to that of conventional finite state transducer (FST)-based decoding using test data with up to 1 million names in the vocabulary and 40 dimensions in the embeddings.
arXiv Detail & Related papers (2020-07-20T05:33:07Z)
- Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR [61.55606131634891]
We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR).
The proposed model contains a memory block that holds speaker i-vectors extracted from the training data and reads relevant i-vectors from the memory through an attention mechanism.
We show that M-vectors, which do not require an auxiliary speaker embedding extraction system at test time, achieve similar word error rates (WERs) compared to i-vectors for single-speaker utterances and significantly lower WERs for utterances in which there are speaker changes.
arXiv Detail & Related papers (2020-02-14T18:31:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.