Improving speaker discrimination of target speech extraction with
time-domain SpeakerBeam
- URL: http://arxiv.org/abs/2001.08378v1
- Date: Thu, 23 Jan 2020 05:36:06 GMT
- Title: Improving speaker discrimination of target speech extraction with
time-domain SpeakerBeam
- Authors: Marc Delcroix, Tsubasa Ochiai, Katerina Zmolikova, Keisuke Kinoshita,
Naohiro Tawara, Tomohiro Nakatani, Shoko Araki
- Abstract summary: SpeakerBeam exploits an adaptation utterance of the target speaker to extract his/her voice characteristics.
SpeakerBeam sometimes fails when speakers have similar voice characteristics, such as in same-gender mixtures.
We show experimentally that these strategies greatly improve speech extraction performance, especially for same-gender mixtures.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Target speech extraction, which extracts a single target source in a mixture
given clues about the target speaker, has attracted increasing attention. We
have recently proposed SpeakerBeam, which exploits an adaptation utterance of
the target speaker to extract his/her voice characteristics that are then used
to guide a neural network towards extracting speech of that speaker.
SpeakerBeam presents a practical alternative to speech separation as it enables
tracking speech of a target speaker across utterances, and achieves promising
speech extraction performance. However, it sometimes fails when speakers have
similar voice characteristics, such as in same-gender mixtures, because it is
difficult to discriminate the target speaker from the interfering speakers. In
this paper, we investigate strategies for improving the speaker discrimination
capability of SpeakerBeam. First, we propose a time-domain implementation of
SpeakerBeam similar to that proposed for a time-domain audio separation network
(TasNet), which has achieved state-of-the-art performance for speech
separation. In addition, we investigate (1) the use of spatial features to
better discriminate speakers when microphone array recordings are available,
and (2) adding an auxiliary speaker identification loss to help the network
learn more discriminative voice characteristics. We show experimentally that these
strategies greatly improve speech extraction performance, especially for
same-gender mixtures, and outperform TasNet in terms of target speech
extraction.
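The two ingredients the abstract describes, multiplicative adaptation by a target-speaker embedding and an auxiliary speaker-identification loss added to the extraction loss, can be sketched as follows. This is a minimal NumPy illustration; the shapes, function names, and the averaging embedding are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def speaker_embedding(adapt_feats):
    """Time-average adaptation-utterance features (T_a, D) -> one (D,) vector.
    (Hypothetical embedding; the paper uses a learned sequence summary.)"""
    return adapt_feats.mean(axis=0)

def multiplicative_adaptation(mix_feats, emb):
    """Scale every frame of the encoded mixture (T, D) by the embedding (D,),
    steering the downstream mask estimator toward the target speaker."""
    return mix_feats * emb  # broadcasts over the time axis

def multitask_loss(extraction_loss, speaker_id_loss, alpha=0.1):
    """Total loss = signal extraction loss + weighted auxiliary speaker-ID loss.
    (alpha is an illustrative weight, not a value from the paper.)"""
    return extraction_loss + alpha * speaker_id_loss

rng = np.random.default_rng(0)
mix = rng.standard_normal((100, 16))   # encoded mixture: (frames, features)
adapt = rng.standard_normal((40, 16))  # encoded adaptation utterance

emb = speaker_embedding(adapt)
adapted = multiplicative_adaptation(mix, emb)
print(adapted.shape)             # (100, 16)
print(multitask_loss(1.5, 2.0))  # 1.7
```

The multiplication makes the same network extract different speakers depending only on the adaptation utterance, which is what enables tracking one speaker across utterances.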
Related papers
- In search of strong embedding extractors for speaker diarisation
We tackle two key problems when adopting embedding extractors (EEs) for speaker diarisation.
First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation.
We show that better performance on widely adopted speaker verification evaluation protocols does not lead to better diarisation performance.
We propose two data augmentation techniques to alleviate the second problem, making embedding extractors aware of overlapped speech or speaker change input.
arXiv Detail & Related papers (2022-10-26T13:00:29Z)
- Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS
We propose a human-in-the-loop speaker-adaptation method for multi-speaker text-to-speech.
The proposed method uses a sequential line search algorithm that repeatedly asks a user to select a point on a line segment in the embedding space.
Experimental results indicate that the proposed method can achieve comparable performance to the conventional one in objective and subjective evaluations.
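The sequential line search described above can be sketched as repeatedly offering the user candidate points along a segment in the embedding space and keeping the preferred one. In this illustration the "user" is simulated by a distance oracle; the dimensions, step grid, and preference model are assumptions, not the paper's setup.

```python
import numpy as np

def line_search_step(current, direction, prefer, n_points=11):
    # Offer n_points candidates along a segment through `current` and keep
    # the one the (simulated) user prefers. t = 0 keeps the current point,
    # so a step never makes the embedding worse under `prefer`.
    ts = np.linspace(-1.0, 1.0, n_points)
    return min((current + t * direction for t in ts), key=prefer)

rng = np.random.default_rng(1)
target = rng.standard_normal(8)                # the (unknown) preferred voice
prefer = lambda e: np.linalg.norm(e - target)  # lower = liked better

emb = np.zeros(8)
start = prefer(emb)
for _ in range(50):
    emb = line_search_step(emb, rng.standard_normal(8), prefer)
print(prefer(emb) < start)  # True: the embedding moved toward the preference
```

Because each query is a one-dimensional choice, the method needs only simple comparative feedback from the user rather than absolute ratings.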
arXiv Detail & Related papers (2022-06-21T11:08:05Z)
- Speaker Extraction with Co-Speech Gestures Cue
We explore the use of co-speech gesture sequences, e.g., hand and body movements, as the speaker cue for speaker extraction.
We propose two networks that use the co-speech gesture cue to perform attentive listening on the target speaker.
The experimental results show that the co-speech gestures cue is informative in associating the target speaker, and the quality of the extracted speech shows significant improvements over the unprocessed mixture speech.
arXiv Detail & Related papers (2022-03-31T06:48:52Z)
- Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation
Dysarthric speech reconstruction (DSR) aims to improve the quality of dysarthric speech.
Speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity.
We propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA).
arXiv Detail & Related papers (2022-02-18T08:59:36Z)
- Guided Training: A Simple Method for Single-channel Speaker Separation
We propose a strategy to train a long short-term memory (LSTM) model to solve the permutation problem in speaker separation.
Owing to its powerful sequence-modeling capability, an LSTM can use its memory cells to track and separate target speech from interfering speech.
arXiv Detail & Related papers (2021-03-26T08:46:50Z)
- U-vectors: Generating clusterable speaker embedding from unlabeled data
This paper introduces a speaker recognition strategy dealing with unlabeled data.
It generates clusterable embedding vectors from small fixed-size speech frames.
We conclude that the proposed approach achieves remarkable performance using pairwise architectures.
arXiv Detail & Related papers (2021-02-07T18:00:09Z)
- Speaker Separation Using Speaker Inventories and Estimated Speech
We propose speaker separation using speaker inventories (SSUSI) and speaker separation using estimated speech (SSUES).
By combining the advantages of permutation invariant training (PIT) and speech extraction, SSUSI significantly outperforms conventional approaches.
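Permutation invariant training, mentioned above, resolves the ambiguity of which network output corresponds to which speaker by taking the minimum loss over all output-to-reference assignments. A minimal sketch, with illustrative shapes and an MSE criterion as assumptions:

```python
import itertools
import numpy as np

def pit_loss(estimates, references):
    """estimates, references: (n_sources, n_samples) arrays.
    Returns (loss, perm): the minimum summed pairwise MSE over all
    permutations, and the permutation that achieves it."""
    n = len(estimates)
    best = None
    for perm in itertools.permutations(range(n)):
        loss = sum(np.mean((estimates[i] - references[p]) ** 2)
                   for i, p in enumerate(perm))
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best

refs = np.stack([np.ones(8), -np.ones(8)])
ests = refs[::-1].copy()       # estimates come out in swapped order
loss, perm = pit_loss(ests, refs)
print(loss, perm)  # 0.0 (1, 0): the swapped assignment gives zero loss
```

Exhaustive search over permutations is factorial in the number of sources, which is why PIT is typically used with small, fixed speaker counts.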
arXiv Detail & Related papers (2020-10-20T18:15:45Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.