Related papers: GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification

GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification

URL: http://arxiv.org/abs/2407.03135v1
Date: Wed, 3 Jul 2024 14:14:18 GMT
Title: GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification
Authors: Hui Yan, Zhenchun Lei, Changhong Liu, Yong Zhou,
Abstract summary: We propose the GMM-ResNext model for speaker verification. A two-path GMM-ResNext model based on two gender-related GMMs has also been proposed. The proposed GMM-ResNext achieves relative improvements of 48.1% and 11.3% in EER compared with ResNet34 and ECAPA-TDNN on VoxCeleb1-O test set.
Score: 12.598652038778368
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the development of deep learning, many different network architectures have been explored in speaker verification. However, most network architectures rely on a single deep learning architecture, and hybrid networks combining different architectures have been little studied in ASV tasks. In this paper, we propose the GMM-ResNext model for speaker verification. Conventional GMM does not consider the score distribution of each frame feature over all Gaussian components and ignores the relationship between neighboring speech frames. So, we extract the log Gaussian probability features based on the raw acoustic features and use ResNext-based network as the backbone to extract the speaker embedding. GMM-ResNext combines Generative and Discriminative Models to improve the generalization ability of deep learning models and allows one to more easily specify meaningful priors on model parameters. A two-path GMM-ResNext model based on two gender-related GMMs has also been proposed. The Experimental results show that the proposed GMM-ResNext achieves relative improvements of 48.1\% and 11.3\% in EER compared with ResNet34 and ECAPA-TDNN on VoxCeleb1-O test set.

Related papers

Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
An Efficient 1 Iteration Learning Algorithm for Gaussian Mixture Model And Gaussian Mixture Embedding For Neural Network [2.261786383673667]
The new algorithm brings more robustness and simplicity than classic Expectation Maximization (EM) algorithm. It also improves the accuracy and only take 1 iteration for learning.
arXiv Detail & Related papers (2023-08-18T10:17:59Z)
Improving Deep Attractor Network by BGRU and GMM for Speech Separation [0.0]
Deep Attractor Network (DANet) is the state-of-the-art technique in speech separation field. In this paper, a simplified and powerful DANet model is proposed using Bidirectional Gated neural network (BGRU) instead of BLSTM.
arXiv Detail & Related papers (2023-08-07T06:26:53Z)
Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent [0.0]
As of today, few known algorithms can fit or learn Gaussian mixture models. We propose a distance function called Sliced Cram'er 2-distance for learning general multivariate GMMs. These features are especially useful for distributional reinforcement learning and Deep Q Networks.
arXiv Detail & Related papers (2023-07-13T13:43:02Z)
A new perspective on probabilistic image modeling [92.89846887298852]
We present a new probabilistic approach for image modeling capable of density estimation, sampling and tractable inference. DCGMMs can be trained end-to-end by SGD from random initial conditions, much like CNNs. We show that DCGMMs compare favorably to several recent PC and SPN models in terms of inference, classification and sampling.
arXiv Detail & Related papers (2022-03-21T14:53:57Z)
Novel Hybrid DNN Approaches for Speaker Verification in Emotional and Stressful Talking Environments [1.0998375857698495]
This work combined deep models with shallow architecture, which resulted in novel hybrid classifiers. Four distinct hybrid models were utilized: deep neural network-hidden Markov model (DNN-HMM), deep neural network-Gaussian mixture model (DNN-GMM), and hidden Markov model-deep neural network (HMM-DNN) Results showed that HMM-DNN outperformed all other hybrid models in terms of equal error rate (EER) and area under the curve (AUC) evaluation metrics.
arXiv Detail & Related papers (2021-12-26T10:47:14Z)
Multi-turn RNN-T for streaming recognition of multi-party speech [2.899379040028688]
This work takes real-time applicability as the first priority in model design and addresses a few challenges in previous work on multi-speaker recurrent neural network transducer (MS-RNN-T) We introduce on-the-fly overlapping speech simulation during training, yielding 14% relative word error rate (WER) improvement on LibriSpeechMix test set. We propose a novel multi-turn RNN-T (MT-RNN-T) model with an overlap-based target arrangement strategy that generalizes to an arbitrary number of speakers without changes in the model architecture.
arXiv Detail & Related papers (2021-12-19T17:22:58Z)
Meta-Aggregator: Learning to Aggregate for 1-bit Graph Neural Networks [127.32203532517953]
We develop a vanilla 1-bit framework that binarizes both the GNN parameters and the graph features. Despite the lightweight architecture, we observed that this vanilla framework suffered from insufficient discriminative power in distinguishing graph topologies. This discovery motivates us to devise meta aggregators to improve the expressive power of vanilla binarized GNNs.
arXiv Detail & Related papers (2021-09-27T08:50:37Z)
Image Modeling with Deep Convolutional Gaussian Mixture Models [79.0660895390689]
We present a new formulation of deep hierarchical Gaussian Mixture Models (GMMs) that is suitable for describing and generating images. DCGMMs avoid this by a stacked architecture of multiple GMM layers, linked by convolution and pooling operations. For generating sharp images with DCGMMs, we introduce a new gradient-based technique for sampling through non-invertible operations like convolution and pooling. Based on the MNIST and FashionMNIST datasets, we validate the DCGMMs model by demonstrating its superiority over flat GMMs for clustering, sampling and outlier detection.
arXiv Detail & Related papers (2021-04-19T12:08:53Z)
DeepGMR: Learning Latent Gaussian Mixture Models for Registration [113.74060941036664]
Point cloud registration is a fundamental problem in 3D computer vision, graphics and robotics. In this paper, we introduce Deep Gaussian Mixture Registration (DeepGMR), the first learning-based registration method. Our proposed method shows favorable performance when compared with state-of-the-art geometry-based and learning-based registration methods.
arXiv Detail & Related papers (2020-08-20T17:25:16Z)
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
retrieval-augmented generation models combine pre-trained parametric and non-parametric memory for language generation. We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks. We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.