LightCAM: A Fast and Light Implementation of Context-Aware Masking based
D-TDNN for Speaker Verification
- URL: http://arxiv.org/abs/2402.06073v2
- Date: Mon, 12 Feb 2024 15:28:38 GMT
- Title: LightCAM: A Fast and Light Implementation of Context-Aware Masking based
D-TDNN for Speaker Verification
- Authors: Di Cao, Xianchen Wang, Junfeng Zhou, Jiakai Zhang, Yanjing Lei and
Wenpeng Chen
- Abstract summary: Traditional Time Delay Neural Networks (TDNN) have achieved state-of-the-art performance at the cost of high computational complexity and slower inference speed.
We propose a fast and lightweight model, LightCAM, which further adopts a depthwise separable convolution module (DSM) and uses multi-scale feature aggregation (MFA) for feature fusion.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional Time Delay Neural Networks (TDNN) have achieved state-of-the-art
performance at the cost of high computational complexity and slower inference
speed, making them difficult to implement in an industrial environment. The
Densely Connected Time Delay Neural Network (D-TDNN) with Context Aware Masking
(CAM) module has proven to be an efficient structure to reduce complexity while
maintaining system performance. In this paper, we propose a fast and
lightweight model, LightCAM, which further adopts a depthwise separable
convolution module (DSM) and uses multi-scale feature aggregation (MFA) for
feature fusion at different levels. Extensive experiments conducted on the
VoxCeleb dataset show that LightCAM achieves an EER of 0.83 and a MinDCF of
0.0891 on VoxCeleb1-O, outperforming other mainstream speaker verification
methods. In addition, complexity analysis demonstrates that the proposed
architecture has a lower computational cost and faster inference speed.
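Much of the parameter and compute savings in lightweight models like LightCAM comes from the depthwise separable convolution module (DSM), which factorizes a standard convolution into a per-channel (depthwise) filter followed by a 1x1 (pointwise) channel-mixing step. The minimal Python sketch below compares parameter counts for a 1-D convolutional layer; the channel sizes and kernel width are illustrative assumptions, not the paper's actual configuration.

```python
def conv_params(c_in, c_out, k):
    # standard 1-D convolution: every output channel mixes all input channels
    return c_in * c_out * k

def depthwise_separable_params(c_in, c_out, k):
    # depthwise stage: one k-tap filter per input channel (c_in * k)
    # pointwise stage: 1x1 convolution mixing channels (c_in * c_out)
    return c_in * k + c_in * c_out

# illustrative sizes, not LightCAM's actual layer configuration
c_in, c_out, k = 512, 512, 3
standard = conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
print(standard, separable, round(standard / separable, 2))  # 786432 263680 2.98
```

The reduction factor approaches k for large channel counts, which is the usual motivation for adopting depthwise separable convolutions in lightweight architectures.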
Related papers
- Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
In neuromorphic computing, spiking neural networks (SNNs) perform inference tasks, offering significant efficiency gains for workloads involving sequential data.
Recent advances in hardware and software have demonstrated that embedding a few bits of payload in each spike exchanged between the spiking neurons can further enhance inference accuracy.
This paper investigates a wireless neuromorphic split computing architecture employing multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z)
- Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators [33.18173790144853]
We present an automated approach for generating fast performance models that accurately estimate the latency of Deep Neural Networks (DNNs).
We modeled representative DNN accelerators such as Gemmini, UltraTrail, Plasticine-derived, and a parameterizable systolic array.
We evaluate only 154 loop kernel iterations to estimate the performance of 4.19 billion instructions, achieving a significant speedup.
arXiv Detail & Related papers (2024-09-13T07:27:55Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition [15.204703947024242]
PaDeLLM-NER allows for the simultaneous decoding of all mentions, thereby reducing generation latency.
Experiments reveal that PaDeLLM-NER significantly increases inference speed, running 1.76 to 10.22 times faster than the autoregressive approach for both English and Chinese.
arXiv Detail & Related papers (2024-02-07T13:39:38Z)
- Best of Both Worlds: Hybrid SNN-ANN Architecture for Event-based Optical Flow Estimation [12.611797572621398]
Spiking Neural Networks (SNNs) with their asynchronous event-driven compute show great potential for extracting features from event streams.
We propose a novel SNN-ANN hybrid architecture that combines the strengths of both.
arXiv Detail & Related papers (2023-06-05T15:26:02Z)
- SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks [4.0300632886917]
SpikeSim is a tool that can perform realistic performance, energy, latency and area evaluation of IMC-mapped SNNs.
We propose SNN topological modifications leading to 1.24x and 10x reduction in the neuronal module's area and the overall energy-delay-product value.
arXiv Detail & Related papers (2022-10-24T01:07:17Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete settings (SAC-d), which generates the exit point and the compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network that divides the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency parts are processed with expensive operations while the lower-frequency parts are assigned cheap operations to relieve the computational burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
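The frequency-aware routing described above can be illustrated with a small pure-Python sketch: compute a type-II DCT of an input, then split low- and high-index coefficients between cheap and expensive branches. The unnormalized DCT implementation and the naive half/half split point are simplifying assumptions for illustration, not the paper's method.

```python
import math

def dct2(x):
    # unnormalized type-II DCT: X_k = sum_n x_n * cos(pi/N * (n + 0.5) * k)
    n_pts = len(x)
    return [sum(x[n] * math.cos(math.pi / n_pts * (n + 0.5) * k) for n in range(n_pts))
            for k in range(n_pts)]

# a smooth ramp signal: its energy concentrates in low-frequency coefficients
signal = [n / 7 for n in range(8)]
coeffs = dct2(signal)

# naive half/half split by frequency index: low-frequency coefficients go to
# the cheap branch, high-frequency coefficients to the expensive branch
low, high = coeffs[:4], coeffs[4:]
low_energy = sum(c * c for c in low)
high_energy = sum(c * c for c in high)
print(low_energy > high_energy)
```

For smooth content most of the energy lands in the low-frequency (cheap) branch, which is why this kind of routing relieves the overall computation burden.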
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
- Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyperparameters of state-of-the-art factored time delay neural networks (TDNNs).
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z)
- Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in noisy real-world environments.
We implement this algorithm in a real-time robotic system with a microphone array.
The experimental results show a mean azimuth error of 13 degrees, surpassing the accuracy of other biologically plausible neuromorphic approaches to sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z)
- Fully-parallel Convolutional Neural Network Hardware [0.7829352305480285]
We propose a new power- and area-efficient architecture for implementing Artificial Neural Networks (ANNs) in hardware.
For the first time, a fully-parallel CNN, LeNet-5, is embedded and tested on a single FPGA.
arXiv Detail & Related papers (2020-06-22T17:19:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.