Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation
- URL: http://arxiv.org/abs/2012.00403v1
- Date: Tue, 1 Dec 2020 11:06:36 GMT
- Title: Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation
- Authors: Ziye Yang, Shanzheng Guan and Xiao-Lei Zhang
- Abstract summary: We propose deep ad-hoc beamforming based on speaker extraction, which is, to our knowledge, the first work on target-dependent speech separation based on ad-hoc microphone arrays and deep learning. Experimental results demonstrate the effectiveness of the proposed method.
- Score: 7.453268060082337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, research on ad-hoc microphone arrays with deep learning has
drawn much attention, especially in speech enhancement and separation. Because
an ad-hoc microphone array may cover such a large area that multiple speakers
may be located far apart and talk independently, target-dependent speech
separation, which aims to extract a target speaker from a speech mixture, is
important for extracting and tracking a specific speaker in the ad-hoc array.
However, this technique has not been explored yet. In this paper, we propose
deep ad-hoc beamforming based on speaker extraction, which is, to our knowledge,
the first work on target-dependent speech separation based on ad-hoc
microphone arrays and deep learning. The algorithm contains three components.
First, we propose a supervised channel selection framework based on speaker
extraction, where the estimated utterance-level SNRs of the target speech are
used as the basis for channel selection. Second, we apply the selected
channels to a deep-learning-based MVDR algorithm, where a single-channel
speaker extraction algorithm is applied to each selected channel to estimate
the mask of the target speech. We conducted extensive experiments on a
WSJ0-adhoc corpus. Experimental results demonstrate the effectiveness of the
proposed method.
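The channel-selection step described in the abstract can be sketched as follows. This is a minimal, hypothetical sketch and not the authors' implementation. It assumes that the single-channel speaker extractor outputs a time-frequency mask of the target on every channel. It uses the mask-weighted energy ratio as a stand-in for the estimated utterance-level SNR; the paper may estimate the SNR differently, for example directly with a network. The names `utterance_snr_db` and `select_channels` and the top-`k` rule are illustrative assumptions.

```python
import math
from typing import List, Sequence


def utterance_snr_db(mask: Sequence[float], power: Sequence[float],
                     eps: float = 1e-12) -> float:
    """Estimate the utterance-level SNR (dB) of the target speech on one channel.

    Hypothetical proxy: mask[i] in [0, 1] is the estimated target mask of
    time-frequency bin i, and power[i] = |Y_i|^2 is the mixture power of that
    bin, both flattened over time and frequency.
    """
    target = sum(m * p for m, p in zip(mask, power))            # masked target energy
    residual = sum((1.0 - m) * p for m, p in zip(mask, power))  # everything else
    return 10.0 * math.log10((target + eps) / (residual + eps))


def select_channels(snrs_db: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k channels with the highest estimated SNR, best first.

    The fixed top-k rule is an assumption made for illustration.
    """
    order = sorted(range(len(snrs_db)), key=lambda c: snrs_db[c], reverse=True)
    return order[:k]


if __name__ == "__main__":
    power = [1.0] * 4                                   # toy mixture power per T-F bin
    masks = {0: [0.9] * 4, 1: [0.5] * 4, 2: [0.2] * 4}  # toy per-channel target masks
    snrs = [utterance_snr_db(masks[c], power) for c in range(3)]
    print([round(s, 2) for s in snrs])  # → [9.54, 0.0, -6.02]
    print(select_channels(snrs, k=2))   # → [0, 1]
```

A relative rule is an equally plausible alternative to a fixed `k`: it keeps every channel whose estimated SNR is within a few dB of the best one. The abstract does not state which selection rule the paper uses.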
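The mask-based MVDR step can be sketched in pure Python for a single frequency bin of the selected channels. This is an illustrative sketch under stated assumptions, not the paper's implementation:

- It uses the widely used Souden-style MVDR solution `w = inv(Phi_n) Phi_s u_ref / trace(inv(Phi_n) Phi_s)`.
- The speech covariance `Phi_s` and noise covariance `Phi_n` are weighted by the target mask and by its complement.
- It assumes the per-channel masks from the speaker extractor have already been pooled into one mask per frame, for example by averaging.

All function names, the reference channel `ref`, and the diagonal-loading value are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[complex]]


def weighted_covariance(Y: Sequence[Sequence[complex]],
                        weights: Sequence[float]) -> Matrix:
    """Phi = sum_t w_t y_t y_t^H / sum_t w_t for one frequency bin.

    Y[m][t] is the STFT coefficient of selected channel m at frame t.
    """
    M, T = len(Y), len(Y[0])
    total = sum(weights) or 1.0  # guard against an all-zero mask
    return [[sum(weights[t] * Y[i][t] * Y[j][t].conjugate() for t in range(T)) / total
             for j in range(M)] for i in range(M)]


def solve(A: Matrix, B: Matrix) -> Matrix:
    """Solve A X = B with Gauss-Jordan elimination and partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mvdr_weights(Y, mask: Sequence[float], ref: int = 0,
                 loading: float = 1e-6) -> List[complex]:
    """Souden-style MVDR weights: w = Phi_n^{-1} Phi_s u_ref / tr(Phi_n^{-1} Phi_s)."""
    M = len(Y)
    phi_s = weighted_covariance(Y, mask)                     # target-speech covariance
    phi_n = weighted_covariance(Y, [1.0 - m for m in mask])  # interference + noise
    for i in range(M):
        phi_n[i][i] += loading  # diagonal loading keeps Phi_n invertible
    A = solve(phi_n, phi_s)     # A = Phi_n^{-1} Phi_s
    trace = sum(A[i][i] for i in range(M))
    return [A[i][ref] / trace for i in range(M)]


def beamform(w: Sequence[complex], Y) -> List[complex]:
    """Target estimate s_hat[t] = w^H y_t."""
    M, T = len(Y), len(Y[0])
    return [sum(w[m].conjugate() * Y[m][t] for m in range(M)) for t in range(T)]


if __name__ == "__main__":
    d = [1 + 0j, 0.5 + 0.5j]               # toy relative transfer function of the target
    frames = [d, [2j * x for x in d],      # two target-only frames (s = 1 and s = 2j)
              [1 + 0j, 0j], [0j, 1 + 0j]]  # two noise-only frames
    Y = [[f[m] for f in frames] for m in range(2)]  # Y[m][t]
    mask = [1.0, 1.0, 0.0, 0.0]            # pooled target mask per frame
    w = mvdr_weights(Y, mask)
    print([round(abs(s), 3) for s in beamform(w, Y)])  # → [1.0, 2.0, 0.667, 0.471]
```

In this toy case `Phi_s` is rank one, so the beamformer passes the target frames undistorted relative to the reference channel. In a full system the same computation would run independently in every frequency bin.

The Souden form avoids estimating a steering vector explicitly. Steering-vector MVDR, for example using the principal eigenvector of `Phi_s`, is an equally common choice. The abstract does not say which variant the paper adopts.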

Related papers
- ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings [4.125756306660331]
Speaker Diarization (SD) aims to group speech segments that belong to the same speaker.
Beamforming, i.e., spatial filtering, is a common way to process multi-microphone audio data.
This paper proposes a self-attention-based algorithm to select the output of a bank of fixed spatial filters.
arXiv (2024-06-05T13:28:28Z)
- LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism [45.90677498529653]
We present a target speaker localization algorithm with a selective hearing mechanism.
Our proposed network LocSelect achieves a mean absolute error (MAE) of 3.55 and an accuracy (ACC) of 87.40%.
arXiv (2023-10-16T15:19:05Z)
- Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS [36.023566245506046]
We propose a human-in-the-loop speaker-adaptation method for multi-speaker text-to-speech.
The proposed method uses a sequential line search algorithm that repeatedly asks a user to select a point on a line segment in the embedding space.
Experimental results indicate that the proposed method can achieve comparable performance to the conventional one in objective and subjective evaluations.
arXiv (2022-06-21T11:08:05Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv (2022-03-18T06:40:39Z)
- Guided Training: A Simple Method for Single-channel Speaker Separation [40.34570426165019]
We propose a strategy to train a long short-term memory (LSTM) model to solve the permutation problem in speaker separation.
Owing to its powerful sequence-modeling capability, an LSTM can use its memory cells to track and separate target speech from interfering speech.
arXiv (2021-03-26T08:46:50Z)
- End-to-End Speaker Diarization as Post-Processing [64.12519350944572]
Clustering-based diarization methods partition frames into as many clusters as there are speakers.
Some end-to-end diarization methods can handle overlapping speech by treating the problem as multi-label classification.
We propose to use a two-speaker end-to-end diarization method as post-processing of the results obtained by a clustering-based method.
arXiv (2020-12-18T05:31:07Z)
- Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario [51.50631198081903]
We propose a novel Target-Speaker Voice Activity Detection (TS-VAD) approach.
TS-VAD directly predicts the activity of each speaker on each time frame.
Experiments on the CHiME-6 unsegmented data show that TS-VAD achieves state-of-the-art results.
arXiv (2020-05-14T21:24:56Z)
- SpEx: Multi-Scale Time Domain Speaker Extraction Network [89.00319878262005]
Speaker extraction aims to mimic humans' selective auditory attention by extracting a target speaker's voice from a multi-talker environment.
It is common to perform the extraction in the frequency domain and to reconstruct the time-domain signal from the extracted magnitude and estimated phase spectra.
We propose a time-domain speaker extraction network (SpEx) that converts the mixture speech into multi-scale embedding coefficients instead of decomposing the speech signal into magnitude and phase spectra.
arXiv (2020-04-17T16:13:06Z)
- Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam [100.95498268200777]
SpeakerBeam exploits an adaptation utterance of the target speaker to extract his/her voice characteristics.
SpeakerBeam sometimes fails when speakers have similar voice characteristics, such as in same-gender mixtures.
We show experimentally that these strategies greatly improve speech extraction performance, especially for same-gender mixtures.
arXiv (2020-01-23T05:36:06Z)