MFA: TDNN with Multi-scale Frequency-channel Attention for
Text-independent Speaker Verification with Short Utterances
- URL: http://arxiv.org/abs/2202.01624v2
- Date: Fri, 4 Feb 2022 15:39:24 GMT
- Title: MFA: TDNN with Multi-scale Frequency-channel Attention for
Text-independent Speaker Verification with Short Utterances
- Authors: Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li
- Abstract summary: We propose a multi-scale frequency-channel attention (MFA) to characterize speakers at different scales through a novel dual-path design which consists of a convolutional neural network and TDNN.
We evaluate the proposed MFA on the VoxCeleb database and observe that the proposed framework with MFA can achieve state-of-the-art performance while reducing the number of parameters and the computational complexity.
- Score: 94.70787497137854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The time delay neural network (TDNN) is one of the state-of-the-art
neural solutions to text-independent speaker verification. However, it
requires a large number of filters to capture the speaker characteristics at any
local frequency region. In addition, the performance of such systems may
degrade under short utterance scenarios. To address these issues, we propose a
multi-scale frequency-channel attention (MFA), where we characterize speakers
at different scales through a novel dual-path design which consists of a
convolutional neural network and TDNN. We evaluate the proposed MFA on the
VoxCeleb database and observe that the proposed framework with MFA can achieve
state-of-the-art performance while reducing parameters and computational
complexity. Further, the MFA mechanism is found to be effective for speaker
verification with short test utterances.
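
For readers unfamiliar with the TDNN mentioned in the abstract, the following is a minimal sketch of what a single TDNN layer computes: each output frame is a weighted combination of input frames at fixed time offsets, which is equivalent to a (possibly dilated) 1-D convolution over time. This is not the MFA architecture from the paper; the function name `tdnn_layer`, its parameters, the ReLU activation, and the toy values are assumptions made purely for illustration.

```python
from typing import List, Sequence

Frame = List[float]


def tdnn_layer(
    frames: Sequence[Frame],
    weights: Sequence[Sequence[Sequence[float]]],
    bias: Sequence[float],
    context: Sequence[int] = (-1, 0, 1),
) -> List[Frame]:
    """Illustrative TDNN layer (hypothetical helper, not the paper's code).

    frames  : T input frames, each a feature vector of size D_in.
    weights : one D_out x D_in matrix per entry in `context`.
    bias    : vector of size D_out.
    context : time offsets spliced together for each output frame.

    Only time steps where the whole context fits are produced
    (no padding), and a ReLU is applied to each output.
    """
    lo, hi = min(context), max(context)
    outputs: List[Frame] = []
    for t in range(-lo, len(frames) - hi):
        y = list(bias)
        for k, offset in enumerate(context):
            x = frames[t + offset]
            for o, row in enumerate(weights[k]):
                y[o] += sum(w * xi for w, xi in zip(row, x))
        outputs.append([max(0.0, v) for v in y])
    return outputs


# Toy input: five 1-dimensional frames and all-ones weights.
toy_frames = [[1.0], [2.0], [3.0], [4.0], [5.0]]
ones = [[[1.0]], [[1.0]], [[1.0]]]

# Adjacent context (-1, 0, 1): each output sums three neighbouring frames.
adjacent = tdnn_layer(toy_frames, ones, [0.0], context=(-1, 0, 1))

# Dilated context (-2, 0, 2): wider temporal span with the same weight count.
dilated = tdnn_layer(toy_frames, ones, [0.0], context=(-2, 0, 2))
```

With the toy input, the adjacent context yields three output frames and the dilated context yields one, which shows how a wider context shortens the usable output when the utterance itself is short.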