Speaker Recognition in the Wild
- URL: http://arxiv.org/abs/2205.02475v1
- Date: Thu, 5 May 2022 07:17:17 GMT
- Title: Speaker Recognition in the Wild
- Authors: Neeraj Chhimwal, Anirudh Gupta, Rishabh Gaur, Harveen Singh Chadha,
Priyanshi Shah, Ankur Dhuriya, Vivek Raghavan
- Abstract summary: We propose a pipeline to find the number of speakers, as well as the audios belonging to each of the identified speakers.
We use this approach as a part of our Data Preparation pipeline for Speech Recognition in Indic languages.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a pipeline to find the number of speakers, as well
as the audios belonging to each of these identified speakers, in a source of
audio data where the number of speakers and speaker labels are not known a priori.
We use this approach as part of our Data Preparation pipeline for Speech
Recognition in Indic Languages
(https://github.com/Open-Speech-EkStep/vakyansh-wav2vec2-experimentation). To
understand and evaluate the accuracy of our proposed pipeline, we introduce two
metrics: Cluster Purity, and Cluster Uniqueness. Cluster Purity quantifies how
"pure" a cluster is. Cluster Uniqueness, on the other hand, quantifies what
percentage of clusters belong only to a single dominant speaker. We discuss
these metrics in more detail in the paper's metrics section. Since we develop this
utility to aid us in identifying data based on speaker IDs before training an
Automatic Speech Recognition (ASR) model, and since most of this data takes
considerable effort to scrape, we also find that, in the chosen test set, 98% of
the data maps to the top 80% of clusters (computed after discarding clusters with
fewer than a fixed number of utterances -- set to 30 here -- to remove some very
small clusters).
Related papers
- Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens [45.161909551392085]
We propose a novel attention-based encoder-decoder method augmented with speaker class tokens obtained by speaker clustering.
During inference, we select multiple recognition hypotheses conditioned on predicted speaker cluster tokens.
These hypotheses are merged by agglomerative hierarchical clustering based on the normalized edit distance.
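The merging step described above can be sketched generically; this is an illustration of agglomerative merging by normalized edit distance, not the paper's implementation, and the threshold value and greedy single-representative strategy are assumptions.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming (one-row variant)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
    return dp[n]

def normalized_edit_distance(a, b):
    return edit_distance(a, b) / max(len(a), len(b), 1)

def merge_hypotheses(hyps, threshold=0.2):
    """Greedily fuse the closest pair of hypothesis groups whose normalized
    edit distance falls below `threshold`; each group keeps its first member
    as representative."""
    groups = [[h] for h in hyps]
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = normalized_edit_distance(groups[i][0], groups[j][0])
                if d < threshold and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            break
        _, i, j = best
        groups[i].extend(groups.pop(j))
    return groups
```

Near-duplicate recognition hypotheses collapse into one group, while hypotheses from different speakers or contents stay separate.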
arXiv Detail & Related papers (2024-09-24T04:31:46Z) - SLICER: Learning universal audio representations using low-resource
self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data.
Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z) - Bi-LSTM Scoring Based Similarity Measurement with Agglomerative
Hierarchical Clustering (AHC) for Speaker Diarization [0.0]
A typical conversation between two speakers consists of segments where their voices overlap, interrupt each other or halt their speech in between multiple sentences.
Recent advancements in diarization technology leverage neural network-based approaches to improve speaker diarization systems.
We propose a Bi-directional Long Short-term Memory network for estimating the elements present in the similarity matrix.
arXiv Detail & Related papers (2022-05-19T17:20:51Z) - Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of
Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves lower diarization error rate than the target-speaker voice activity detection.
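The power set encoding mentioned above can be illustrated with a small sketch; the function names and fixed speaker count are hypothetical, and the actual SEND label scheme is defined in the paper.

```python
from itertools import combinations

def powerset_label_map(num_speakers):
    """Map every subset of speakers to a single class index, so that
    overlapped speech becomes ordinary single-label classification."""
    subsets = []
    for k in range(num_speakers + 1):
        subsets.extend(combinations(range(num_speakers), k))
    return {s: i for i, s in enumerate(subsets)}

def encode_frame(active_speakers, label_map):
    """Encode the set of speakers active in one frame as one class index."""
    return label_map[tuple(sorted(active_speakers))]

label_map = powerset_label_map(3)          # 2**3 = 8 classes for 3 speakers
silence = encode_frame([], label_map)      # the empty subset: nobody speaking
overlap = encode_frame([0, 2], label_map)  # speakers 0 and 2 overlapping
```

Encoding each frame's active-speaker set as one class sidesteps multi-label prediction for overlapped regions.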
arXiv Detail & Related papers (2021-11-28T12:51:04Z) - Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring [60.55025339250815]
We propose a novel deep learning technique for non-native ASS, called speaker-conditioned hierarchical modeling.
We take advantage of the fact that oral proficiency tests rate multiple responses for a candidate. We extract context from these responses and feed it as additional speaker-specific context to our network to score a particular response.
arXiv Detail & Related papers (2021-08-30T07:00:28Z) - HuBERT: Self-Supervised Speech Representation Learning by Masked
Prediction of Hidden Units [81.53783563025084]
We propose an offline clustering step to provide aligned target labels for a BERT-like prediction loss.
A key ingredient of our approach is applying the prediction loss over the masked regions only.
HuBERT shows up to 19% and 13% relative WER reduction on the more challenging dev-other and test-other evaluation subsets.
arXiv Detail & Related papers (2021-06-14T14:14:28Z) - U-vectors: Generating clusterable speaker embedding from unlabeled data [0.0]
This paper introduces a speaker recognition strategy dealing with unlabeled data.
It generates clusterable embedding vectors from small fixed-size speech frames.
We conclude that the proposed approach achieves remarkable performance using pairwise architectures.
arXiv Detail & Related papers (2021-02-07T18:00:09Z) - End-to-End Speaker Diarization as Post-Processing [64.12519350944572]
Clustering-based diarization methods partition frames into as many clusters as there are speakers.
Some end-to-end diarization methods can handle overlapping speech by treating the problem as multi-label classification.
We propose to use a two-speaker end-to-end diarization method as post-processing of the results obtained by a clustering-based method.
arXiv Detail & Related papers (2020-12-18T05:31:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.