Dynamic Kernels and Channel Attention with Multi-Layer Embedding
Aggregation for Speaker Verification
- URL: http://arxiv.org/abs/2211.02000v1
- Date: Thu, 3 Nov 2022 17:13:28 GMT
- Title: Dynamic Kernels and Channel Attention with Multi-Layer Embedding
Aggregation for Speaker Verification
- Authors: Anna Ollerenshaw, Md Asif Jalal, Thomas Hain
- Abstract summary: This paper proposes an approach to increase the model resolution capability using attention-based dynamic kernels in a convolutional neural network.
The proposed dynamic convolutional model achieved 1.62% EER and 0.18 miniDCF on the VoxCeleb1 test set, a 17% relative improvement over ECAPA-TDNN.
- Score: 28.833851817220616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art speaker verification frameworks have typically focused on
speech enhancement techniques with increasingly deeper (more layers) and wider
(more channels) models to improve verification performance. Instead, this paper
proposes an approach that increases the model's resolution capability using
attention-based dynamic kernels in a convolutional neural network, so that the
model parameters are conditioned on the input features. The attention weights on
the kernels are further distilled by channel attention and multi-layer feature
aggregation to learn global features from speech. Because the structure of the
model parameters self-adapts to the input, this approach provides an efficient
way to improve representation capacity with lower data resources. The proposed
dynamic convolutional model achieved 1.62% EER and 0.18 miniDCF on the VoxCeleb1
test set, a 17% relative improvement over ECAPA-TDNN.
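The abstract combines three mechanisms: input-conditioned (dynamic) convolution kernels selected by attention, channel attention over the resulting feature maps, and aggregation of features from multiple layers before pooling into a speaker embedding. The PyTorch sketch below illustrates these ideas in a generic form; it is not the authors' implementation, and the module names (DynamicConv1d, ChannelAttention, MiniSpeakerNet) and all hyperparameters (number of expert kernels, channel widths, embedding size) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConv1d(nn.Module):
    """Conv1d whose kernel is an input-conditioned mixture of K expert kernels:
    a small attention head pools the input over time, scores the experts with
    softmax, and the weighted sum of expert kernels is applied as a normal
    convolution."""

    def __init__(self, in_ch, out_ch, kernel_size, num_experts=4):
        super().__init__()
        self.in_ch, self.out_ch = in_ch, out_ch
        self.kernel_size, self.num_experts = kernel_size, num_experts
        # K expert kernels, shape (K, out_ch, in_ch, kernel_size)
        self.weight = nn.Parameter(
            torch.randn(num_experts, out_ch, in_ch, kernel_size) * 0.02)
        self.bias = nn.Parameter(torch.zeros(num_experts, out_ch))
        # attention over the K experts, conditioned on the time-pooled input
        self.attn = nn.Sequential(
            nn.Linear(in_ch, max(in_ch // 4, 1)),
            nn.ReLU(inplace=True),
            nn.Linear(max(in_ch // 4, 1), num_experts))

    def forward(self, x):                                  # x: (B, in_ch, T)
        B, C, T = x.shape
        a = F.softmax(self.attn(x.mean(dim=2)), dim=1)     # (B, K)
        w = torch.einsum("bk,kois->bois", a, self.weight)  # per-example kernel
        b = a @ self.bias                                  # (B, out_ch)
        # run all examples at once as a single grouped convolution
        y = F.conv1d(x.reshape(1, B * C, T),
                     w.reshape(B * self.out_ch, self.in_ch, self.kernel_size),
                     bias=b.reshape(-1),
                     padding=self.kernel_size // 2, groups=B)
        return y.reshape(B, self.out_ch, -1)


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gating: rescale each channel by a learned,
    input-dependent weight in (0, 1)."""

    def __init__(self, ch, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):                                  # x: (B, C, T)
        return x * self.gate(x.mean(dim=2)).unsqueeze(-1)


class MiniSpeakerNet(nn.Module):
    """Stack of dynamic-conv blocks with channel attention; the hidden maps of
    every block are concatenated (multi-layer aggregation) and reduced by
    mean/std statistics pooling into a fixed-size speaker embedding."""

    def __init__(self, feat_dim=80, ch=256, emb_dim=192, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList()
        in_ch = feat_dim
        for _ in range(n_blocks):
            self.blocks.append(nn.Sequential(
                DynamicConv1d(in_ch, ch, kernel_size=3),
                nn.ReLU(inplace=True),
                ChannelAttention(ch)))
            in_ch = ch
        self.embed = nn.Linear(2 * ch * n_blocks, emb_dim)

    def forward(self, x):                                  # x: (B, feat_dim, T)
        outs = []
        for block in self.blocks:
            x = block(x)
            outs.append(x)                                 # keep every layer
        h = torch.cat(outs, dim=1)                         # (B, ch*n_blocks, T)
        stats = torch.cat([h.mean(dim=2), h.std(dim=2)], dim=1)
        return self.embed(stats)                           # (B, emb_dim)
```

Under these assumptions, a forward pass such as MiniSpeakerNet()(torch.randn(2, 80, 200)) returns a (2, 192) embedding tensor; in a verification system such embeddings would be trained with a speaker-classification loss and scored with cosine similarity.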
Related papers
- Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising [54.110544509099526]
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data.
We propose a hybrid convolution and attention network (HCANet) to enhance HSI denoising.
Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet.
arXiv Detail & Related papers (2024-03-15T07:18:43Z)
- Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding.
The proposed adaptive aggregation generates kernel offsets to capture valid information within a content-conditioned range and aid the transform.
Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
arXiv Detail & Related papers (2023-08-17T01:34:51Z)
- Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis [93.0013343535411]
We propose a novel type of analysis called Multi-Scale Class Representational Response Similarity Analysis (ClassRepSim)
We show that adding STAC modules to ResNet style architectures can result in up to a 1.6% increase in top-1 accuracy.
Results from ClassRepSim analysis can be used to select an effective parameterization of the STAC module resulting in competitive performance.
arXiv Detail & Related papers (2023-06-16T18:29:26Z)
- An Efficient Speech Separation Network Based on Recurrent Fusion Dilated Convolution and Channel Attention [0.2538209532048866]
We present an efficient speech separation neural network, ARFDCN, which combines dilated convolutions, multi-scale fusion (MSF), and channel attention.
Experimental results indicate that the model achieves a decent balance between performance and computational efficiency.
arXiv Detail & Related papers (2023-06-09T13:30:27Z)
- A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
- A Multimodal Canonical-Correlated Graph Neural Network for Energy-Efficient Speech Enhancement [4.395837214164745]
This paper proposes a novel multimodal self-supervised architecture for energy-efficient AV speech enhancement.
It integrates graph neural networks with canonical correlation analysis (CCA-GNN).
Experiments conducted on the benchmark CHiME-3 dataset show that our proposed prior-frame-based AV CCA-GNN reinforces better feature learning in the temporal context.
arXiv Detail & Related papers (2022-02-09T15:47:07Z)
- TDAN: Top-Down Attention Networks for Enhanced Feature Selectivity in CNNs [18.24779045808196]
We propose a lightweight top-down (TD) attention module that iteratively generates a "visual searchlight" to perform top-down channel and spatial modulation of its inputs.
Our models are more robust to changes in input resolution during inference and learn to "shift attention" by localizing individual objects or features at each computation step without any explicit supervision.
arXiv Detail & Related papers (2021-11-26T12:35:17Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Dynamic Memory Induction Networks for Few-Shot Text Classification [84.88381813651971]
This paper proposes Dynamic Memory Induction Networks (DMIN) for few-shot text classification.
The proposed model achieves new state-of-the-art results on the miniRCV1 and ODIC datasets, improving the best accuracy by 2-4%.
arXiv Detail & Related papers (2020-05-12T12:41:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.