Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks
- URL: http://arxiv.org/abs/2007.08818v4
- Date: Sun, 7 Feb 2021 14:54:13 GMT
- Title: Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks
- Authors: Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng,
Xunying Liu, Helen Meng
- Abstract summary: A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of state-of-the-art factored time delay neural networks (TDNNs).
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on the 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
- Score: 61.76338096980383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network (DNN) based automatic speech recognition (ASR) systems
are often designed using expert knowledge and empirical evaluation. In this
paper, a range of neural architecture search (NAS) techniques are used to
automatically learn two types of hyper-parameters of state-of-the-art factored
time delay neural networks (TDNNs): i) the left and right splicing context
offsets; and ii) the dimensionality of the bottleneck linear projection at each
hidden layer. These include the DARTS method integrating architecture selection
with lattice-free MMI (LF-MMI) TDNN training; Gumbel-Softmax and pipelined
DARTS reducing the confusion over candidate architectures and improving the
generalization of architecture selection; and Penalized DARTS incorporating
resource constraints to adjust the trade-off between performance and system
complexity. Parameter sharing among candidate architectures allows efficient
search over up to $7^{28}$ different TDNN systems. Experiments conducted on the
300-hour Switchboard corpus suggest the auto-configured systems consistently
outperform the baseline LF-MMI TDNN systems using manual network design or
random architecture search after LHUC speaker adaptation and RNNLM rescoring.
Absolute word error rate (WER) reductions up to 1.0% and relative model size
reduction of 28% were obtained. Consistent performance improvements were also
obtained on a UASpeech disordered speech recognition task using the proposed
NAS approaches.
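To make the search mechanics concrete, below is a minimal PyTorch sketch of the general recipe the abstract describes: candidate bottleneck dimensions share one supernet projection, a softmax (optionally Gumbel-Softmax) over architecture parameters weights the candidates, and a Penalized-DARTS-style term trades performance against complexity. The class name, candidate set, and exact penalty form are illustrative assumptions, not the authors' LF-MMI/Kaldi implementation, which additionally searches the left/right splicing context offsets.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentiableBottleneck(nn.Module):
    """One layer whose bottleneck projection dimension is searched, DARTS-style.

    Hypothetical sketch: candidates reuse slices of one shared projection
    (parameter sharing), so all of them train inside a single supernet.
    """
    def __init__(self, in_dim, out_dim, candidate_dims=(64, 128, 256)):
        super().__init__()
        self.down = nn.Linear(in_dim, max(candidate_dims))
        self.up = nn.Linear(max(candidate_dims), out_dim)
        self.candidate_dims = candidate_dims
        # Architecture weights (the "alphas" of DARTS), one per candidate.
        self.alpha = nn.Parameter(torch.zeros(len(candidate_dims)))

    def forward(self, x, gumbel=False, tau=1.0):
        # Gumbel-Softmax sampling sharpens the candidate distribution,
        # reducing the confusion over candidate architectures.
        w = (F.gumbel_softmax(self.alpha, tau=tau) if gumbel
             else F.softmax(self.alpha, dim=-1))
        h = self.down(x)
        out = 0.0
        for wi, d in zip(w, self.candidate_dims):
            mask = torch.zeros_like(h)
            mask[..., :d] = 1.0  # keep only the first d bottleneck units
            out = out + wi * self.up(h * mask)
        return out

def complexity_penalty(layer, lam=1e-3):
    # Penalized-DARTS-style resource term (assumed form): expected candidate
    # cost under the architecture distribution, added to the training loss.
    w = F.softmax(layer.alpha, dim=-1)
    costs = torch.tensor([float(d) for d in layer.candidate_dims])
    return lam * (w * costs).sum()
```
After the search converges, the candidate with the largest architecture weight would be kept and the network re-trained at that fixed dimension; the same continuous relaxation applies in principle to the splicing context offsets.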
Related papers
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- An optimised deep spiking neural network architecture without gradients [7.183775638408429]
We present an end-to-end trainable modular event-driven neural architecture that uses local synaptic and threshold adaptation rules.
The architecture represents a highly abstracted model of existing Spiking Neural Network (SNN) architectures.
arXiv Detail & Related papers (2021-09-27T05:59:12Z)
- A novel Deep Neural Network architecture for non-linear system identification [78.69776924618505]
We present a novel Deep Neural Network (DNN) architecture for non-linear system identification.
Inspired by fading memory systems, we introduce inductive bias (on the architecture) and regularization (on the loss function).
This architecture allows for automatic complexity selection based solely on available data.
arXiv Detail & Related papers (2021-06-06T10:06:07Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- EfficientTDNN: Efficient Architecture Search for Speaker Recognition in the Wild [29.59228560095565]
We propose a neural architecture search-based efficient time-delay neural network (EfficientTDNN) to improve inference efficiency while maintaining recognition accuracy.
Experiments on the VoxCeleb dataset show EfficientTDNN provides a huge search space of approximately $10^{13}$ architectures and achieves 1.66% EER and 0.156 DCF$_{0.01}$ with 565M MACs.
arXiv Detail & Related papers (2021-03-25T03:28:07Z)
- Differentiable Neural Architecture Learning for Efficient Neural Network Design [31.23038136038325]
We introduce a novel architecture parameterisation based on the scaled sigmoid function (a hedged sketch of this gating idea appears after this list).
We then propose a general Differentiable Neural Architecture Learning (DNAL) method to optimize the neural architecture without the need to evaluate candidate neural networks.
arXiv Detail & Related papers (2021-03-03T02:03:08Z)
- Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment.
We implement this algorithm in a real-time robotic system with a microphone array.
The experiment results show a mean azimuth error of 13 degrees, which surpasses the accuracy of other biologically plausible neuromorphic approaches for sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
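As promised in the DNAL entry above, here is a minimal sketch of a scaled-sigmoid architecture gate in the same spirit as that paper (not the authors' code): a binary keep/drop decision per channel is relaxed into a continuous gate sigmoid(beta * theta), and raising beta during training anneals the gates toward hard 0/1 choices, so no discrete candidate network ever needs to be evaluated. The class name and annealing schedule are assumptions.
```python
import torch
import torch.nn as nn

class ScaledSigmoidGate(nn.Module):
    """Continuous relaxation of per-channel keep/drop architecture choices."""
    def __init__(self, num_channels):
        super().__init__()
        # One learnable selection logit per channel.
        self.theta = nn.Parameter(torch.zeros(num_channels))

    def forward(self, x, beta=1.0):
        # sigmoid(beta * theta) lies in (0, 1); increasing beta over training
        # pushes each gate toward a hard binary selection.
        gate = torch.sigmoid(beta * self.theta)
        shape = (1, -1) + (1,) * (x.dim() - 2)  # broadcast over batch/time
        return x * gate.view(*shape)
```
Channels whose gates anneal to zero would be pruned from the final architecture, which the gated network is then re-trained to use.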
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.