Improving Deep Attractor Network by BGRU and GMM for Speech Separation
- URL: http://arxiv.org/abs/2308.03332v1
- Date: Mon, 7 Aug 2023 06:26:53 GMT
- Title: Improving Deep Attractor Network by BGRU and GMM for Speech Separation
- Authors: Rawad Melhem, Assef Jafar, Riad Hamadeh
- Abstract summary: Deep Attractor Network (DANet) is the state-of-the-art technique in the speech separation field.
In this paper, a simplified and powerful DANet model is proposed using a Bidirectional Gated Recurrent Unit (BGRU) network instead of BLSTM.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Attractor Network (DANet) is the state-of-the-art technique in the
speech separation field. It uses a Bidirectional Long Short-Term Memory (BLSTM)
network, but the complexity of the DANet model is very high. In this paper, a
simplified and powerful DANet model is proposed, using a Bidirectional Gated
Recurrent Unit (BGRU) network instead of the BLSTM. A Gaussian Mixture Model
(GMM) was applied in DANet as the clustering algorithm, in place of k-means, to
reduce the complexity and increase the learning speed and accuracy. The metrics
used in this paper are Signal to Distortion Ratio (SDR), Signal to Interference
Ratio (SIR), Signal to Artifact Ratio (SAR), and the Perceptual Evaluation of
Speech Quality (PESQ) score. Two-speaker mixture datasets from the TIMIT corpus
were prepared to evaluate the proposed model; the system achieved 12.3 dB SDR
and a 2.94 PESQ score, both better than the original DANet model. Further
improvements of 20.7% and 17.9% were obtained in the number of parameters and
the training time, respectively. The model was also applied to mixed Arabic
speech signals, and the results were better than those for English.
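To make the described pipeline concrete, here is a minimal sketch, assuming hypothetical layer sizes and spectrogram shapes, of the idea in the abstract: a BGRU maps the mixture spectrogram to an embedding per time-frequency bin, and a GMM (scikit-learn's GaussianMixture standing in for whatever routine the paper uses) clusters the embeddings so its posteriors act as soft separation masks. This is an illustration, not the authors' implementation.

```python
# Hypothetical DANet-style sketch: BGRU embeddings + GMM clustering.
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

class BGRUDANet(nn.Module):
    def __init__(self, freq_bins=129, emb_dim=20, hidden=300, layers=4):
        super().__init__()
        # Bidirectional GRU replaces the BLSTM of the original DANet.
        self.bgru = nn.GRU(freq_bins, hidden, num_layers=layers,
                           batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, freq_bins * emb_dim)
        self.emb_dim = emb_dim

    def forward(self, log_mag):               # (B, T, F) log-magnitude STFT
        h, _ = self.bgru(log_mag)             # (B, T, 2*hidden)
        v = self.proj(h)                      # (B, T, F*D)
        B, T, F = *log_mag.shape[:2], log_mag.shape[2]
        return v.reshape(B, T * F, self.emb_dim)   # one D-dim vector per T-F bin

def gmm_masks(emb, n_speakers=2):
    """Cluster T-F embeddings with a GMM; posteriors serve as soft masks."""
    e = emb.detach().cpu().numpy()
    gmm = GaussianMixture(n_components=n_speakers, covariance_type="diag")
    gmm.fit(e)
    return torch.from_numpy(gmm.predict_proba(e))  # (T*F, n_speakers)

# Usage sketch: one mixture with 100 frames and 129 frequency bins.
net = BGRUDANet()
emb = net(torch.randn(1, 100, 129))   # (1, 12900, 20)
masks = gmm_masks(emb[0])             # soft mask per T-F bin and speaker
```

The masks would then be applied to the mixture spectrogram and inverted back to waveforms, on which metrics such as SDR and PESQ are computed.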
Related papers
- Learning Load Balancing with GNN in MPTCP-Enabled Heterogeneous Networks [13.178956651532213]
We propose a graph neural network (GNN)-based model to tackle the LB problem for MPTCP-enabled HetNets.
Compared to a conventional deep neural network (DNN), the proposed GNN-based model exhibits two key strengths.
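The summary leaves the model unspecified; purely as a rough illustration of why a GNN adapts to network topology where a plain DNN does not, here is a toy message-passing layer (not the paper's architecture):

```python
# Toy GNN layer: each node's features are updated from its neighbors' mean.
import torch
import torch.nn as nn

class MeanAggLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.lin = nn.Linear(2 * d, d)

    def forward(self, x, adj):               # x: (N, d), adj: (N, N) 0/1
        deg = adj.sum(1, keepdim=True).clamp(min=1)
        neigh = adj @ x / deg                # mean over neighbors
        return torch.relu(self.lin(torch.cat([x, neigh], dim=1)))

x, adj = torch.randn(5, 8), (torch.rand(5, 5) > 0.5).float()
print(MeanAggLayer(8)(x, adj).shape)         # torch.Size([5, 8])
```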
arXiv Detail & Related papers (2024-10-22T15:49:53Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
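As a hedged illustration of what Bayesian learning over LM weights means in practice, here is a toy variational layer using the reparameterization trick; the shapes are hypothetical and this is not the paper's framework:

```python
# Toy Bayesian layer: each weight has a learned mean and variance,
# and every forward pass samples W ~ N(mu, sigma^2) differentiably.
import torch
import torch.nn as nn

class BayesLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.log_sigma = nn.Parameter(torch.full((d_out, d_in), -3.0))

    def forward(self, x):
        w = self.mu + self.log_sigma.exp() * torch.randn_like(self.mu)
        return x @ w.t()

layer = BayesLinear(16, 8)
print(layer(torch.randn(4, 16)).shape)   # torch.Size([4, 8])
```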
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Prediction of speech intelligibility with DNN-based performance measures [9.883633991083789]
This paper presents a speech intelligibility model based on automatic speech recognition (ASR).
It combines phoneme probabilities from deep neural networks (DNN) with a performance measure that estimates the word error rate from these probabilities.
The proposed model performs almost as well as the label-based model and produces more accurate predictions than the baseline models.
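The exact performance measure is not given in the summary; purely as a stand-in illustration, the sketch below derives a simple uncertainty statistic from phoneme posteriors, the kind of quantity such a model could map to a predicted word error rate:

```python
# Toy measure: mean per-frame entropy of DNN phoneme posteriors.
# Lower entropy suggests clearer speech (hypothetical stand-in measure).
import numpy as np

def mean_frame_entropy(posteriors):      # (frames, phonemes), rows sum to 1
    p = np.clip(posteriors, 1e-12, 1.0)
    return float(np.mean(-(p * np.log(p)).sum(axis=1)))

post = np.random.dirichlet(np.ones(40), size=100)   # fake posteriors
print(mean_frame_entropy(post))
```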
arXiv Detail & Related papers (2022-03-17T08:05:38Z)
- Parameter estimation for WMTI-Watson model of white matter using encoder-decoder recurrent neural network [0.0]
In this study, we evaluate the performance of NLLS, the RNN-based method, and a multilayer perceptron (MLP) on rat and human brain datasets.
We showed that the proposed RNN-based fitting approach had the advantage of highly reduced computation time over NLLS.
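As a generic sketch (hypothetical shapes, not the paper's network) of replacing iterative NLLS fitting with a learned regressor, an encoder GRU can summarize a measured curve and a linear head can emit the model parameters in one pass:

```python
# Toy encoder-decoder regressor: signal curve in, fitted parameters out.
import torch
import torch.nn as nn

class Seq2ParamGRU(nn.Module):
    def __init__(self, hidden=64, n_params=4):
        super().__init__()
        self.encoder = nn.GRU(1, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, n_params)

    def forward(self, signal):            # (B, T, 1) measured curve
        _, h = self.encoder(signal)       # final hidden state summarizes it
        return self.decoder(h[-1])        # (B, n_params)

net = Seq2ParamGRU()
print(net(torch.randn(8, 30, 1)).shape)   # torch.Size([8, 4])
```

The speedup the entry reports comes from amortization: the network is trained once, then each new fit is a single forward pass instead of an iterative optimization.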
arXiv Detail & Related papers (2022-03-01T16:33:15Z)
- Novel Hybrid DNN Approaches for Speaker Verification in Emotional and Stressful Talking Environments [1.0998375857698495]
This work combined deep models with shallow architectures, resulting in novel hybrid classifiers.
Distinct hybrid models were utilized, including the deep neural network-hidden Markov model (DNN-HMM), deep neural network-Gaussian mixture model (DNN-GMM), and hidden Markov model-deep neural network (HMM-DNN).
Results showed that HMM-DNN outperformed all other hybrid models in terms of equal error rate (EER) and area under the curve (AUC) evaluation metrics.
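For reference, the two reported metrics are straightforward to compute from verification scores; a small, generic sketch (assuming `scores` are trial similarity scores and `labels` mark genuine trials with 1, not code from the paper):

```python
# EER: the operating point where false-positive and false-negative rates cross.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def eer_and_auc(labels, scores):
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fpr - fnr))
    return (fpr[idx] + fnr[idx]) / 2.0, roc_auc_score(labels, scores)

labels = np.array([1, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.7, 0.4, 0.3, 0.6, 0.5])
print(eer_and_auc(labels, scores))   # lower EER, higher AUC = better
```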
arXiv Detail & Related papers (2021-12-26T10:47:14Z)
- SignalNet: A Low Resolution Sinusoid Decomposition and Estimation Network [79.04274563889548]
We propose SignalNet, a neural network architecture that detects the number of sinusoids and estimates their parameters from quantized in-phase and quadrature samples.
We introduce a worst-case learning threshold for comparing the results of our network against the underlying data distributions.
In simulation, we find that our algorithm is always able to surpass the threshold for three-bit data but often cannot exceed the threshold for one-bit data.
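To illustrate the input regime (hypothetical parameters, not SignalNet itself), the sketch below builds a sum of sinusoids and quantizes its in-phase and quadrature components to three bits and to one bit; the one-bit case is where the entry reports the network struggling:

```python
# Quantized I/Q inputs: sum of sinusoids, uniformly quantized per component.
import numpy as np

def quantize(x, bits):
    levels = 2 ** bits
    q = np.clip(np.floor((x + 1) / 2 * levels), 0, levels - 1)  # mid-rise on [-1, 1]
    return (q + 0.5) / levels * 2 - 1

rng = np.random.default_rng(0)
n, k = 64, 3                                   # samples, sinusoids
t = np.arange(n)
freqs = rng.uniform(0, 0.5, k)                 # normalized frequencies
signal = sum(np.exp(2j * np.pi * f * t) for f in freqs) / k
iq_3bit = quantize(signal.real, 3) + 1j * quantize(signal.imag, 3)
iq_1bit = np.sign(signal.real) + 1j * np.sign(signal.imag)   # one-bit case
```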
arXiv Detail & Related papers (2021-06-10T04:21:20Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models, of statistical models with the roofline model, and of a refined roofline model.
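A rough sketch of the stacking idea (toy features and data, not ANNETTE's actual models): base regressors fitted on benchmark-derived features are combined by a meta-model into one execution-time predictor:

```python
# Stacked latency model: base estimators + linear meta-estimator.
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression

X = np.random.rand(200, 3)        # e.g. [MACs, params, feature-map size] (toy)
y = X @ np.array([5.0, 1.0, 2.0]) + 0.1 * np.random.rand(200)  # toy latency

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=50)),
                ("lin", LinearRegression())],
    final_estimator=LinearRegression())
stack.fit(X, y)
print(stack.predict(X[:2]))
```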
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
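A small sketch (hypothetical array size, not the paper's preprocessing) of the kind of multi-channel input a DoA CNN typically consumes: the sample covariance of the sensor snapshots, split into real and imaginary channels:

```python
# Sample covariance of M-sensor snapshots as an image-like CNN input.
import numpy as np

M, snapshots = 8, 200
x = np.random.randn(M, snapshots) + 1j * np.random.randn(M, snapshots)
R = x @ x.conj().T / snapshots             # (M, M) sample covariance
cnn_input = np.stack([R.real, R.imag])     # (2, M, M) two-channel tensor
print(cnn_input.shape)
```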
arXiv Detail & Related papers (2020-11-17T12:52:18Z)
- Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of state-of-the-art factored time delay neural networks (TDNNs).
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z)
- Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on RNN-Transducer together with improved beam search, reaches quality only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)