Deep Speaker Vector Normalization with Maximum Gaussianality Training
- URL: http://arxiv.org/abs/2010.16148v1
- Date: Fri, 30 Oct 2020 09:42:06 GMT
- Title: Deep Speaker Vector Normalization with Maximum Gaussianality Training
- Authors: Yunqi Cai, Lantian Li, Dong Wang and Andrew Abel
- Abstract summary: A key problem with deep speaker embedding is that the resulting deep speaker vectors tend to be irregularly distributed.
In previous research, we proposed a deep normalization approach based on a new discriminative normalization flow (DNF) model.
Despite this remarkable success, we empirically found that the latent codes produced by the DNF model are generally neither homogeneous nor Gaussian.
We propose a new Maximum Gaussianality (MG) training approach that directly maximizes the Gaussianality of the latent codes.
- Score: 13.310988353839237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep speaker embedding represents the state-of-the-art technique for speaker
recognition. A key problem with this approach is that the resulting deep
speaker vectors tend to be irregularly distributed. In previous research, we
proposed a deep normalization approach based on a new discriminative
normalization flow (DNF) model, by which the distributions of individual
speakers are arguably transformed to homogeneous Gaussians. This normalization
was demonstrated to be effective, but despite this remarkable success, we
empirically found that the latent codes produced by the DNF model are generally
neither homogeneous nor Gaussian, although the model has assumed so. In this
paper, we argue that this problem is largely attributed to the
maximum-likelihood (ML) training criterion of the DNF model, which aims to
maximize the likelihood of the observations but not necessarily improve the
Gaussianality of the latent codes. We therefore propose a new Maximum
Gaussianality (MG) training approach that directly maximizes the Gaussianality
of the latent codes. Our experiments on two data sets, SITW and CNCeleb,
demonstrate that our new MG training approach can deliver much better
performance than the previous ML training, and exhibits improved domain
generalizability, particularly with regard to cosine scoring.
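The abstract does not spell out the MG objective; below is a minimal, hypothetical sketch (PyTorch-style Python, not the authors' code) of how a flow's likelihood term could be combined with a moment-based Gaussianality penalty on the latent codes. The function names, the skewness/kurtosis-based measure, and the weight `lam` are illustrative assumptions; the paper's actual DNF architecture and MG criterion may differ.

```python
import math
import torch

def ml_loss(z, logdet):
    """Negative log-likelihood of a flow with a standard-normal latent prior.

    z:      latent codes, shape [batch, dim]
    logdet: per-sample log|det Jacobian| of the flow, shape [batch]
    """
    log_prior = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.size(1) * math.log(2.0 * math.pi)
    return -(log_prior + logdet).mean()

def gaussianality_loss(z, eps=1e-6):
    """Penalize non-Gaussian higher-order moments of a batch of latent codes
    (an assumed stand-in for a Gaussianality measure)."""
    zn = (z - z.mean(dim=0)) / (z.std(dim=0) + eps)   # standardize each dimension
    skew = (zn ** 3).mean(dim=0)                      # 0 for a Gaussian
    excess_kurt = (zn ** 4).mean(dim=0) - 3.0         # 0 for a Gaussian
    return (skew ** 2 + excess_kurt ** 2).mean()

def training_loss(z, logdet, lam=1.0):
    """Likelihood term plus a weighted Gaussianality term (lam is a hypothetical weight)."""
    return ml_loss(z, logdet) + lam * gaussianality_loss(z)
```

In a pipeline along these lines, back-end scoring (e.g., the cosine scoring mentioned in the abstract) would be applied to the latent codes z produced by the trained flow rather than to the raw speaker vectors.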
Related papers
- Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection [12.065053799927506]
We propose a novel Hierarchical Gaussian mixture normalizing flow modeling method for accomplishing unified Anomaly Detection.
Our HGAD consists of two key components: inter-class Gaussian mixture modeling and intra-class mixed class centers learning.
We evaluate our method on four real-world AD benchmarks, where we can significantly improve the previous NF-based AD methods and also outperform the SOTA unified AD methods.
arXiv Detail & Related papers (2024-03-20T07:21:37Z)
- Learning a Gaussian Mixture for Sparsity Regularization in Inverse Problems [2.375943263571389]
In inverse problems, the incorporation of a sparsity prior yields a regularization effect on the solution.
We propose a probabilistic sparsity prior formulated as a mixture of Gaussians, capable of modeling sparsity with respect to a generic basis.
We put forth both a supervised and an unsupervised training strategy to estimate the parameters of this network.
arXiv Detail & Related papers (2024-01-29T22:52:57Z)
- Heavy-tailed denoising score matching [5.371337604556311]
We develop an iterative noise scaling algorithm to consistently initialise the multiple levels of noise in Langevin dynamics.
On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
arXiv Detail & Related papers (2021-12-17T22:04:55Z)
- Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that with constraint to guarantee low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
- Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections [73.95786440318369]
We focus on the so-called 'implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of stochastic gradient descent (SGD).
We show that this effect induces an asymmetric heavy-tailed noise on gradient updates.
We then formally prove that GNIs induce an 'implicit bias', which varies depending on the heaviness of the tails and the level of asymmetry.
arXiv Detail & Related papers (2021-02-13T21:28:09Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
- Plug-And-Play Learned Gaussian-mixture Approximate Message Passing [71.74028918819046]
We propose a plug-and-play compressed sensing (CS) recovery algorithm suitable for any i.i.d. source prior.
Our algorithm builds upon Borgerding's learned AMP (LAMP), yet significantly improves it by adopting a universal denoising function within the algorithm.
Numerical evaluation shows that the L-GM-AMP algorithm achieves state-of-the-art performance without any knowledge of the source prior.
arXiv Detail & Related papers (2020-11-18T16:40:45Z)
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
- Deep Normalization for Speaker Vectors [13.310988353839237]
Deep speaker embedding has demonstrated state-of-the-art performance in speaker recognition tasks.
Deep speaker vectors tend to be non-Gaussian for each individual speaker, and non-homogeneous for distributions of different speakers.
We propose a deep normalization approach based on a novel discriminative normalization flow (DNF) model.
arXiv Detail & Related papers (2020-04-07T09:20:48Z)
- Gaussianization Flows [113.79542218282282]
We propose a new type of normalizing flow model that enables both efficient computation of likelihoods and efficient inversion for sample generation.
Because of this guaranteed expressivity, they can capture multimodal target distributions without compromising the efficiency of sample generation.
arXiv Detail & Related papers (2020-03-04T08:15:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content listed on this site (including all information) and is not responsible for any consequences.