Single channel voice separation for unknown number of speakers under
reverberant and noisy settings
- URL: http://arxiv.org/abs/2011.02329v1
- Date: Wed, 4 Nov 2020 14:59:14 GMT
- Title: Single channel voice separation for unknown number of speakers under
reverberant and noisy settings
- Authors: Shlomo E. Chazan, Lior Wolf, Eliya Nachmani, Yossi Adi
- Abstract summary: We present a unified network for voice separation of an unknown number of speakers.
The proposed approach is composed of several separation heads optimized together with a speaker classification branch.
We present a new noisy and reverberant dataset of up to five different speakers speaking simultaneously.
- Score: 106.48335929548875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a unified network for voice separation of an unknown number of
speakers. The proposed approach is composed of several separation heads
optimized together with a speaker classification branch. The separation is
carried out in the time domain, together with parameter sharing between all
separation heads. The classification branch estimates the number of speakers
while each head is specialized in separating a different number of speakers. We
evaluate the proposed model under both clean and noisy reverberant settings.
Results suggest that the proposed approach is superior to the baseline model by
a significant margin. Additionally, we present a new noisy and reverberant
dataset of up to five different speakers speaking simultaneously.
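As a rough illustration of the inference flow described above (a minimal sketch, not the authors' implementation; every function name here is a hypothetical placeholder): a shared time-domain encoder feeds both a speaker-count classification branch and a set of separation heads, and the head matching the estimated count emits one waveform per speaker.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_encoder(mixture):
    # Placeholder for the shared time-domain encoder whose parameters
    # are shared across all separation heads.
    return mixture

def classify_num_speakers(features, max_speakers=5):
    # Placeholder for the classification branch that estimates how many
    # speakers are present (random stand-in logits, not a trained model).
    logits = rng.standard_normal(max_speakers - 1)
    return int(np.argmax(logits)) + 2  # a count in {2, ..., max_speakers}

def separation_head(features, num_speakers):
    # Placeholder for the head specialized in separating exactly
    # `num_speakers` sources: one output waveform per estimated speaker.
    return [features / num_speakers for _ in range(num_speakers)]

def separate(mixture, max_speakers=5):
    features = shared_encoder(mixture)
    k = classify_num_speakers(features, max_speakers)
    return separation_head(features, k)

mixture = rng.standard_normal(16000)  # one second of audio at 16 kHz
sources = separate(mixture)
print(len(sources))  # estimated speaker count, between 2 and 5
```

The point of the sketch is the routing: a single classification decision selects which specialized head runs, while the encoder in front of the heads is shared.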
Related papers
- SepIt: Approaching a Single Channel Speech Separation Bound [99.19786288094596]
We introduce a deep neural network, SepIt, that iteratively improves the estimates of the different speakers.
In an extensive set of experiments, SepIt outperforms the state-of-the-art neural networks for 2, 3, 5, and 10 speakers.
arXiv Detail & Related papers (2022-05-24T05:40:36Z)
- Bi-LSTM Scoring Based Similarity Measurement with Agglomerative Hierarchical Clustering (AHC) for Speaker Diarization [0.0]
A typical conversation between two speakers consists of segments where their voices overlap, interrupt each other or halt their speech in between multiple sentences.
Recent advancements in diarization technology leverage neural network-based approaches to improve speaker diarization systems.
We propose a Bi-directional Long Short-term Memory network for estimating the elements present in the similarity matrix.
arXiv Detail & Related papers (2022-05-19T17:20:51Z)
- Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers [8.380514397417457]
This paper formulates the speech separation with the unknown number of speakers as a multi-pass source extraction problem.
Experiments show that the proposed method achieved state-of-the-art performance on the WSJ0 dataset with varying numbers of speakers.
arXiv Detail & Related papers (2022-03-30T04:45:34Z) - End-to-End Diarization for Variable Number of Speakers with Local-Global
Networks and Discriminative Speaker Embeddings [66.50782702086575]
We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings.
The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions.
arXiv Detail & Related papers (2021-05-05T14:55:29Z) - Time-Domain Speech Extraction with Spatial Information and Multi Speaker
Conditioning Mechanism [27.19635746008699]
We present a novel multi-channel speech extraction system to simultaneously extract multiple clean individual sources from a mixture.
The proposed method is built on an improved multi-channel time-domain speech separation network.
Experiments on 2-channel WHAMR! data show that the proposed system improves source separation performance by 9% relative over a strong multi-channel baseline.
arXiv Detail & Related papers (2021-02-07T10:11:49Z) - End-to-End Speaker Diarization as Post-Processing [64.12519350944572]
Clustering-based diarization methods partition frames into as many clusters as there are speakers.
Some end-to-end diarization methods can handle overlapping speech by treating the problem as multi-label classification.
We propose to use a two-speaker end-to-end diarization method as post-processing of the results obtained by a clustering-based method.
arXiv Detail & Related papers (2020-12-18T05:31:07Z) - Multi-Decoder DPRNN: High Accuracy Source Counting and Separation [39.36689677776645]
We propose an end-to-end trainable approach to single-channel speech separation with unknown number of speakers.
Our approach extends the MulCat source separation backbone with additional output heads: a count-head to infer the number of speakers, and decoder-heads for reconstructing the original signals.
We demonstrate that our approach outperforms state-of-the-art in counting the number of speakers and remains competitive in quality of reconstructed signals.
arXiv Detail & Related papers (2020-11-24T11:00:21Z) - Speaker Separation Using Speaker Inventories and Estimated Speech [78.57067876891253]
We propose speaker separation using speaker inventories (SSUSI) and speaker separation using estimated speech (SSUES).
By combining the advantages of permutation invariant training (PIT) and speech extraction, SSUSI significantly outperforms conventional approaches.
arXiv Detail & Related papers (2020-10-20T18:15:45Z) - Voice Separation with an Unknown Number of Multiple Speakers [113.91855071999298]
We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously.
The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed.
arXiv Detail & Related papers (2020-02-29T20:02:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.