Enhancement of a Text-Independent Speaker Verification System by using
Feature Combination and Parallel-Structure Classifiers
- URL: http://arxiv.org/abs/2401.15018v1
- Date: Fri, 26 Jan 2024 17:19:59 GMT
- Title: Enhancement of a Text-Independent Speaker Verification System by using
Feature Combination and Parallel-Structure Classifiers
- Authors: Kerlos Atia Abdalmalak and Ascensi\'on Gallardo-Antol'in
- Abstract summary: This paper explores the two modules: feature extraction and classification.
The choice of the most appropriate acoustic features is a crucial factor for performing robust speaker verification.
To enhance the system more in noisy environments, the inclusion of the multiband noise removal technique as a preprocessing stage is proposed.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speaker Verification (SV) systems involve mainly two individual stages:
feature extraction and classification. In this paper, we explore these two
modules with the aim of improving the performance of a speaker verification
system under noisy conditions. On the one hand, the choice of the most
appropriate acoustic features is a crucial factor for performing robust speaker
verification. The acoustic parameters used in the proposed system are: Mel
Frequency Cepstral Coefficients (MFCC), their first and second derivatives
(Deltas and Delta- Deltas), Bark Frequency Cepstral Coefficients (BFCC),
Perceptual Linear Predictive (PLP), and Relative Spectral Transform -
Perceptual Linear Predictive (RASTA-PLP). In this paper, a complete comparison
of different combinations of the previous features is discussed. On the other
hand, the major weakness of a conventional Support Vector Machine (SVM)
classifier is the use of generic traditional kernel functions to compute the
distances among data points. However, the kernel function of an SVM has great
influence on its performance. In this work, we propose the combination of two
SVM-based classifiers with different kernel functions: Linear kernel and
Gaussian Radial Basis Function (RBF) kernel with a Logistic Regression (LR)
classifier. The combination is carried out by means of a parallel structure
approach, in which different voting rules to take the final decision are
considered. Results show that significant improvement in the performance of the
SV system is achieved by using the combined features with the combined
classifiers either with clean speech or in the presence of noise. Finally, to
enhance the system more in noisy environments, the inclusion of the multiband
noise removal technique as a preprocessing stage is proposed.
Related papers
- A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives.
We also develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
Our proposed ReSFU framework consistently achieves satisfactory performance on different segmentation applications.
arXiv Detail & Related papers (2024-07-02T14:12:21Z) - MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z) - Task-Oriented Sensing, Computation, and Communication Integration for
Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-intelligent edge artificial-latency (AI) system, which jointly exploits the AI model split inference and integrated sensing and communication (ISAC)
We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z) - Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z) - In-filter Computing For Designing Ultra-light Acoustic Pattern
Recognizers [6.335302509003343]
We present a novel in-filter computing framework that can be used for designing ultra-light acoustic classifiers.
The proposed architecture integrates the convolution and nonlinear filtering operations directly into the kernels of a Support Vector Machine.
We show that the system can achieve robust classification performance on benchmark sound recognition tasks using only 1.5k Look-Up Tables (LUTs) and 2.8k Flip-Flops (FFs)
arXiv Detail & Related papers (2021-09-11T08:16:53Z) - Visualizing Classifier Adjacency Relations: A Case Study in Speaker
Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z) - Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence
Modeling [61.351967629600594]
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq), non-parallel voice conversion approach.
In this approach, we combine a bottle-neck feature extractor (BNE) with a seq2seq synthesis module.
Objective and subjective evaluations show that the proposed any-to-many approach has superior voice conversion performance in terms of both naturalness and speaker similarity.
arXiv Detail & Related papers (2020-09-06T13:01:06Z) - Radial basis function kernel optimization for Support Vector Machine
classifiers [4.888981184420116]
OKSVM is an algorithm that automatically learns the RBF kernel hyper parameter and adjusts the SVM weights simultaneously.
We analyze the performance of our approach with respect to the classical SVM for classification on synthetic and real data.
arXiv Detail & Related papers (2020-07-16T10:09:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.