Optimized Power Normalized Cepstral Coefficients towards Robust Deep
Speaker Verification
- URL: http://arxiv.org/abs/2109.12058v1
- Date: Fri, 24 Sep 2021 16:26:12 GMT
- Title: Optimized Power Normalized Cepstral Coefficients towards Robust Deep
Speaker Verification
- Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen
- Abstract summary: We revisit and optimize PNCCs by ablating its medium-time processor and by introducing channel energy normalization.
Experimental results with a DNN-based speaker verification system indicate substantial improvement over baseline PNCCs.
- Score: 21.237143465298505
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: After their introduction to robust speech recognition, power normalized
cepstral coefficient (PNCC) features were successfully adopted for other tasks,
including speaker verification. However, as a feature extractor with long-term
operations on the power spectrogram, its temporal processing and amplitude
scaling steps dedicated to environmental compensation may be redundant.
Further, they might suppress intrinsic speaker variations that are useful for
speaker verification based on deep neural networks (DNN). Therefore, in this
study, we revisit and optimize PNCCs by ablating its medium-time processor and
by introducing channel energy normalization. Experimental results with a
DNN-based speaker verification system indicate substantial improvement over
baseline PNCCs on both in-domain and cross-domain scenarios, reflected by
maximum relative equal error rate reductions of 5.8% and 61.2% on VoxCeleb1
and VoxMovies, respectively.
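The abstract reports its gains as relative reductions in equal error rate (EER). As a rough illustration of how such a figure can be derived, the sketch below approximates EER by sweeping a decision threshold over trial scores and then computes a relative reduction between two systems. The function names and all scores are hypothetical and chosen for illustration only; they are not taken from the paper, and the threshold sweep is a simple approximation rather than the authors' evaluation code.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a non-target trial scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, in percent, of a new system over a baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical trial scores (higher means "more likely the same speaker").
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(targets, nontargets)

# Hypothetical EERs (in percent) for a baseline and an improved system.
toy_gain = relative_reduction(5.0, 4.0)
```

With this convention, a system that lowers EER from 5.0% to 4.0% shows a 20% relative reduction, even though the absolute difference is only one percentage point.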
Related papers
- Power-Efficient Indoor Localization Using Adaptive Channel-aware
Ultra-wideband DL-TDOA [7.306334571814026]
We propose and implement a novel low-power channel-aware dynamic frequency DL-TDOA ranging algorithm.
It comprises an NLOS probability predictor based on a convolutional neural network (CNN), a dynamic ranging frequency control module, and an IMU sensor-based ranging filter.
arXiv Detail & Related papers (2024-02-16T09:04:04Z)
- DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER metric on CN-Celeb evaluation set.
arXiv Detail & Related papers (2023-10-18T17:07:05Z)
- A neural network-supported two-stage algorithm for lightweight
dereverberation on hearing devices [13.49645012479288]
A two-stage lightweight online dereverberation algorithm for hearing devices is presented in this paper.
The approach combines a multi-channel multi-frame linear filter with a single-channel single-frame post-filter.
Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs).
arXiv Detail & Related papers (2022-04-06T11:08:28Z)
- Investigation of Different Calibration Methods for Deep Speaker
Embedding based Verification Systems [66.61691401921296]
This paper presents an investigation of several methods of score calibration for deep speaker embedding extractors.
An additional focus of this research is to estimate the impact of score normalization on the calibration performance of the system.
arXiv Detail & Related papers (2022-03-28T21:22:22Z)
- On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and
Elderly Speech Recognition [53.17176024917725]
Scarcity of speaker-level data limits the practical use of data-intensive model based speaker adaptation methods.
This paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation methods.
arXiv Detail & Related papers (2022-03-28T09:12:24Z)
- Spectro-Temporal Deep Features for Disordered Speech Assessment and
Recognition [65.25325641528701]
Motivated by the spectro-temporal level differences between disordered and normal speech that systematically manifest in articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectro-temporal subspace basis embedding deep features derived by SVD decomposition of speech spectrum are proposed.
Experiments conducted on the UASpeech corpus suggest the proposed spectro-temporal deep feature adapted systems consistently outperformed baseline i-vector adaptation by up to 2.63% absolute (8.6% relative) reduction in word error rate (WER) with or without data augmentation.
arXiv Detail & Related papers (2022-01-14T16:56:43Z)
- Bayesian Learning for Deep Neural Network Adaptation [57.70991105736059]
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.
This paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty.
arXiv Detail & Related papers (2020-12-14T12:30:41Z)
- A Comparative Re-Assessment of Feature Extractors for Deep Speaker
Embeddings [18.684888457998284]
We provide an extensive re-assessment of 14 feature extractors on the VoxCeleb and SITW datasets.
Our findings reveal that features equipped with techniques such as spectral centroids, group delay function, and integrated noise suppression provide promising alternatives to MFCCs for deep speaker embeddings extraction.
arXiv Detail & Related papers (2020-07-30T07:55:58Z)
- Boosting Objective Scores of a Speech Enhancement Model by MetricGAN
Post-processing [18.19158404358494]
The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications.
Our study applies a modified Transformer in a speech enhancement task.
arXiv Detail & Related papers (2020-06-18T06:22:09Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short
Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at achieving two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reduce the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)