Dynamic Multi-scale Convolution for Dialect Identification
- URL: http://arxiv.org/abs/2108.07787v1
- Date: Mon, 2 Aug 2021 03:37:15 GMT
- Title: Dynamic Multi-scale Convolution for Dialect Identification
- Authors: Tianlong Kong, Shouyi Yin, Dawei Zhang, Wang Geng, Xin Wang, Dandan
Song, Jinwen Huang, Huiyu Shi and Xiaorui Wang
- Abstract summary: We propose dynamic multi-scale convolution, which consists of dynamic kernel convolution, local multi-scale learning, and global multi-scale pooling.
The proposed architecture significantly outperforms the state-of-the-art system on the AP20-OLR-dialect-task of the oriental language recognition (OLR) challenge 2020.
- Score: 18.132769601922682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time Delay Neural Networks (TDNN)-based methods are widely used in dialect
identification. However, previous TDNN-based work neglects subtle variation
across different feature scales. To address this issue, we
propose a new architecture, named dynamic multi-scale convolution, which
consists of dynamic kernel convolution, local multi-scale learning, and global
multi-scale pooling. Dynamic kernel convolution adaptively captures features
across short-term and long-term contexts. Local multi-scale learning, which
represents multi-scale features at a granular level, is able to increase the
range of receptive fields for the convolution operation. In addition, global
multi-scale pooling is applied to aggregate features from different bottleneck
layers in order to collect information from multiple aspects. The proposed
architecture significantly outperforms the state-of-the-art system on the
AP20-OLR-dialect-task of oriental language recognition (OLR) challenge 2020,
with the best average cost performance (Cavg) of 0.067 and the best equal error
rate (EER) of 6.52%. Compared with the best known results, our method achieves
relative improvements of 9% in Cavg and 45% in EER. Furthermore, the proposed
model has 91% fewer parameters than the best known model.
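The abstract's three components admit a compact reading in code. The following is a minimal PyTorch sketch, assuming a selective-kernel-style gate for dynamic kernel convolution, a Res2Net-style channel split for local multi-scale learning, and mean-plus-standard-deviation statistics over several bottleneck outputs for global multi-scale pooling; the module names, kernel sizes, and gating rule are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class DynamicKernelConv(nn.Module):
    """Blend a short-context and a long-context 1-D convolution with a
    learned per-channel gate (a selective-kernel-style reading of
    'dynamic kernel convolution'; an assumption, not the paper's design)."""

    def __init__(self, channels: int):
        super().__init__()
        # Both branches preserve sequence length; the dilated branch sees a
        # wider (long-term) temporal context.
        self.short = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.long = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=3, dilation=3)
        self.gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        s, l = self.short(x), self.long(x)
        g = self.gate((s + l).mean(dim=2)).unsqueeze(2)  # (batch, channels, 1)
        return g * s + (1.0 - g) * l  # adaptive mix of short/long context


class LocalMultiScale(nn.Module):
    """Res2Net-style granular split: each channel subset is convolved after
    adding the previous subset's output, enlarging the receptive field."""

    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        assert channels % scale == 0, "channels must divide evenly by scale"
        self.scale = scale
        width = channels // scale
        self.convs = nn.ModuleList(
            nn.Conv1d(width, width, kernel_size=3, padding=1)
            for _ in range(scale - 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = torch.chunk(x, self.scale, dim=1)
        outputs, prev = [chunks[0]], chunks[0]  # first split passes through
        for conv, chunk in zip(self.convs, chunks[1:]):
            prev = conv(chunk + prev)  # hierarchical residual connection
            outputs.append(prev)
        return torch.cat(outputs, dim=1)


class GlobalMultiScalePooling(nn.Module):
    """Concatenate mean and standard-deviation statistics pooled over time
    from the outputs of several bottleneck layers."""

    def forward(self, feats):
        # feats: list of (batch, channels_i, time) tensors
        stats = [torch.cat([f.mean(dim=2), f.std(dim=2)], dim=1) for f in feats]
        return torch.cat(stats, dim=1)  # (batch, sum_i 2 * channels_i)
```

In this reading, a dialect-identification backbone would stack DynamicKernelConv and LocalMultiScale blocks, keep the outputs of several bottleneck layers, and concatenate their pooled statistics via GlobalMultiScalePooling before a classifier head.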
Related papers
- Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning [73.73967342609603]
We introduce a predictor-corrector learning framework to minimize truncation errors.
We also propose an exponential moving average-based coefficient learning method to strengthen our higher-order predictor.
Our model surpasses a robust 3.8B DeepNet by an average of 2.9 SacreBLEU, using only 1/3 of the parameters.
arXiv Detail & Related papers (2024-11-05T12:26:25Z)
- Multimodal Graph Neural Network for Recommendation with Dynamic De-redundancy and Modality-Guided Feature De-noisy [8.799657717956343]
We propose Multimodal Graph Neural Network for Recommendation (MGNM) with Dynamic De-redundancy and Modality-Guided Feature De-noisy.
Experimental results demonstrate that MGNM achieves superior performance in multimodal information denoising and removal of redundant information.
arXiv Detail & Related papers (2024-11-03T13:23:07Z)
- Differentiable architecture search with multi-dimensional attention for spiking neural networks [4.318876451929319]
Spiking Neural Networks (SNNs) have gained enormous popularity in the field of artificial intelligence.
The majority of SNN methods directly inherit the structure of Artificial Neural Networks (ANNs).
We propose Multi-Attention Differentiable Architecture Search (MA-DARTS) to directly automate the search for the optimal network structure of SNNs.
arXiv Detail & Related papers (2024-11-01T07:18:32Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload on edge devices.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to a 10% decrease in latency constraint violation rate and a nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter for Speaker Verification [3.0831477850153224]
We introduce a novel module called Global-aware Filter layer (GF layer) in this work.
We present a dual-stream TDNN architecture called DS-TDNN for automatic speaker verification (ASV).
Experiments on the Voxceleb and SITW databases demonstrate that the DS-TDNN achieves a relative improvement of 10% together with a relative decline of 20% in computational cost.
arXiv Detail & Related papers (2023-03-20T10:58:12Z)
- Deep Multi-Scale Representation Learning with Attention for Automatic Modulation Classification [11.32380278232938]
We find empirical improvements from using a large kernel size in convolutional neural network (CNN) based AMC.
In this paper, we propose a multi-scale feature network with a large kernel size and an SE mechanism (SE-MSFN).
SE-MSFN achieves state-of-the-art classification performance on the well-known public RADIOML 2018.01A dataset.
arXiv Detail & Related papers (2022-08-31T07:26:09Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Multi-path Neural Networks for On-device Multi-domain Visual Classification [55.281139434736254]
This paper proposes a novel approach to automatically learn a multi-path network for multi-domain visual classification on mobile devices.
The proposed multi-path network is learned from neural architecture search by applying one reinforcement learning controller for each domain to select the best path in the super-network created from a MobileNetV3-like search space.
The determined multi-path model selectively shares parameters across domains in shared nodes while keeping domain-specific parameters within non-shared nodes in individual domain paths.
arXiv Detail & Related papers (2020-10-10T05:13:49Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose aggregate interaction modules to integrate features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.