Multi-Frequency Information Enhanced Channel Attention Module for
Speaker Representation Learning
- URL: http://arxiv.org/abs/2207.04540v1
- Date: Sun, 10 Jul 2022 21:19:36 GMT
- Title: Multi-Frequency Information Enhanced Channel Attention Module for
Speaker Representation Learning
- Authors: Mufan Sang, John H.L. Hansen
- Abstract summary: We propose to utilize multi-frequency information and design two novel and effective attention modules.
The proposed attention modules can effectively capture more speaker information from multiple frequency components on the basis of DCT.
Experimental results demonstrate that our proposed SFSC and MFSC attention modules can efficiently generate more discriminative speaker representations.
- Score: 41.44950556040058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, attention mechanisms have been applied successfully in neural
network-based speaker verification systems. Incorporating the
Squeeze-and-Excitation block into convolutional neural networks has achieved
remarkable performance. However, it uses global average pooling (GAP) to simply
average the features along time and frequency dimensions, which is incapable of
preserving sufficient speaker information in the feature maps. In this study,
we show mathematically that GAP is a special case of the discrete cosine
transform (DCT) on the time-frequency domain that uses only the lowest
frequency component of the frequency decomposition. To strengthen speaker
information extraction, we propose to utilize multi-frequency information and
design two novel and effective attention modules: the Single-Frequency
Single-Channel (SFSC) attention module and the Multi-Frequency Single-Channel
(MFSC) attention module.
The proposed attention modules can effectively capture more speaker information
from multiple frequency components on the basis of DCT. We conduct
comprehensive experiments on the VoxCeleb datasets and a probe evaluation on
the 1st 48-UTD forensic corpus. Experimental results demonstrate that our
proposed SFSC and MFSC attention modules efficiently generate more
discriminative speaker representations and outperform the ResNet34-SE and
ECAPA-TDNN systems with relative EER reductions of 20.9% and 20.2%,
respectively, without adding extra network parameters.
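For intuition, the GAP-as-DCT claim can be checked numerically. The snippet below is a sanity check written for this summary, not code from the paper: with an orthonormal 2-D DCT-II, the (0, 0) coefficient is the sum of the feature map scaled by 1/sqrt(TF), so GAP equals that lowest-frequency coefficient divided by sqrt(TF).

```python
import numpy as np
from scipy.fft import dctn

# One channel's (time, frequency) feature map.
x = np.random.randn(200, 80)

gap = x.mean()                        # the SE block's squeeze step
X = dctn(x, type=2, norm='ortho')     # orthonormal 2-D DCT-II

# GAP is the lowest-frequency DCT component, up to a known scale.
print(np.isclose(gap, X[0, 0] / np.sqrt(x.size)))   # True
```

The abstract does not detail the SFSC and MFSC architectures, so the sketch below only illustrates the general recipe they build on: squeeze each channel with several fixed low-order 2-D DCT bases instead of a single GAP, then re-weight channels through the usual SE bottleneck. The class name, the chosen frequency pairs, and the reduction factor are illustrative assumptions, not the authors' exact design.

```python
import math
import torch
import torch.nn as nn

def dct_basis_2d(t: int, f: int, u: int, v: int) -> torch.Tensor:
    """2-D DCT-II basis for frequency indices (u, v); (0, 0) is all-ones,
    so squeezing with it reduces to GAP up to a constant scale."""
    i = torch.arange(t, dtype=torch.float32)
    j = torch.arange(f, dtype=torch.float32)
    bt = torch.cos(math.pi * (i + 0.5) * u / t)
    bf = torch.cos(math.pi * (j + 0.5) * v / f)
    return bt[:, None] * bf[None, :]                  # shape (t, f)

class MultiFreqChannelAttention(nn.Module):
    """Hypothetical multi-frequency channel attention (SE-style)."""
    def __init__(self, channels, t, f,
                 freqs=((0, 0), (0, 1), (1, 0), (1, 1)), reduction=16):
        super().__init__()
        # Fixed (non-learned) DCT filters, one per selected frequency pair.
        basis = torch.stack([dct_basis_2d(t, f, u, v) for u, v in freqs])
        self.register_buffer("basis", basis)          # (k, t, f)
        self.fc = nn.Sequential(
            nn.Linear(channels * len(freqs), channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                             # x: (batch, C, t, f)
        b, c = x.shape[:2]
        # Squeeze every channel with every DCT basis -> (batch, C, k).
        z = torch.einsum("bctf,ktf->bck", x, self.basis)
        w = self.fc(z.reshape(b, -1)).view(b, c, 1, 1)
        return x * w                                  # channel re-weighting

# Usage: keeping only freqs=((0, 0),) recovers plain SE attention.
att = MultiFreqChannelAttention(channels=64, t=20, f=40)
y = att(torch.randn(2, 64, 20, 40))                   # same shape as input
```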
Related papers
- FE-UNet: Frequency Domain Enhanced U-Net with Segment Anything Capability for Versatile Image Segmentation [50.9040167152168]
We experimentally quantify the contrast sensitivity function of CNNs and compare it with that of the human visual system.
We propose the Wavelet-Guided Spectral Pooling Module (WSPM) to enhance and balance image features across the frequency domain.
To further emulate the human visual system, we introduce the Frequency Domain Enhanced Receptive Field Block (FE-RFB).
We develop FE-UNet, a model that utilizes SAM2 as its backbone and incorporates Hiera-Large as a pre-trained block.
arXiv Detail & Related papers (2025-02-06T07:24:34Z)
- Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
Neuromorphic computing uses spiking neural networks (SNNs) to perform inference tasks.
Embedding a small payload within each spike exchanged between spiking neurons can enhance inference accuracy without increasing energy consumption.
Split computing, where an SNN is partitioned across two devices, is a promising solution.
This paper presents the first comprehensive study of a neuromorphic wireless split computing architecture that employs multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z)
- Exploring Cross-Domain Few-Shot Classification via Frequency-Aware Prompting [37.721042095518044]
Cross-Domain Few-Shot Learning has made great strides with the development of meta-learning.
We propose a Frequency-Aware Prompting method with mutual attention for Cross-Domain Few-Shot classification.
arXiv Detail & Related papers (2024-06-24T08:14:09Z)
- Complementary Frequency-Varying Awareness Network for Open-Set Fine-Grained Image Recognition [14.450381668547259]
Open-set image recognition is a challenging topic in computer vision.
We propose a Complementary Frequency-varying Awareness Network that can better capture both high-frequency and low-frequency information.
Based on CFAN, we propose an open-set fine-grained image recognition method, called CFAN-OSFGR.
arXiv Detail & Related papers (2023-07-14T08:15:36Z)
- Joint Channel Estimation and Feedback with Masked Token Transformers in Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z)
- MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances [94.70787497137854]
We propose a multi-scale frequency-channel attention (MFA) to characterize speakers at different scales through a novel dual-path design which consists of a convolutional neural network and TDNN.
We evaluate the proposed MFA on the VoxCeleb database and observe that the proposed framework with MFA can achieve state-of-the-art performance while reducing parameters and complexity.
arXiv Detail & Related papers (2022-02-03T14:57:05Z)
- Speaker Representation Learning using Global Context Guided Channel and Time-Frequency Transformations [67.18006078950337]
We use the global context information to enhance important channels and recalibrate salient time-frequency locations.
The proposed modules, together with a popular ResNet based model, are evaluated on the VoxCeleb1 dataset.
arXiv Detail & Related papers (2020-09-02T01:07:29Z)
- Robust Multi-channel Speech Recognition using Frequency Aligned Network [23.397670239950187]
We use a frequency aligned network for robust automatic speech recognition.
We show that our multi-channel acoustic model with a frequency aligned network achieves up to an 18% relative reduction in word error rate.
arXiv Detail & Related papers (2020-02-06T21:47:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.