Integrating Discrete and Neural Features via Mixed-feature
Trans-dimensional Random Field Language Models
- URL: http://arxiv.org/abs/2002.05967v2
- Date: Thu, 16 Apr 2020 05:23:31 GMT
- Title: Integrating Discrete and Neural Features via Mixed-feature
Trans-dimensional Random Field Language Models
- Authors: Silin Gao, Zhijian Ou, Wei Yang and Huifang Xu
- Abstract summary: This paper develops a mixed-feature TRF LM and demonstrates its advantage in integrating discrete and neural features.
Various LMs are trained over PTB and Google one-billion-word datasets, and evaluated in N-best list rescoring experiments for speech recognition.
Compared to interpolating two separately trained models with discrete and neural features respectively, the performance of mixed-feature TRF LMs matches the best interpolated model.
- Score: 19.409847780307445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has long been recognized that discrete features (n-gram features)
and neural network based features have complementary strengths for language
models (LMs). Improved performance can be obtained by model interpolation,
which is, however, a suboptimal two-step integration of discrete and neural
features. The trans-dimensional random field (TRF) framework has the potential
advantage of being able to flexibly integrate a richer set of features.
However, either discrete or neural features are used alone in previous TRF LMs.
This paper develops a mixed-feature TRF LM and demonstrates its advantage in
integrating discrete and neural features. Various LMs are trained over PTB and
Google one-billion-word datasets, and evaluated in N-best list rescoring
experiments for speech recognition. Among all single LMs (i.e. without model
interpolation), the mixed-feature TRF LMs perform the best, improving over both
discrete TRF LMs and neural TRF LMs alone, and also being significantly better
than LSTM LMs. Compared to interpolating two separately trained models with
discrete and neural features respectively, mixed-feature TRF LMs match the
best interpolated model while requiring only a simplified one-step training
process and reduced training time.
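To make the two integration styles in the abstract concrete, here is a minimal sketch, not the authors' code: all function and variable names (interpolation_rescore, mixed_trf_potential, discrete_weights, neural_potential, etc.) are illustrative assumptions. It contrasts two-step log-linear interpolation of separately trained n-gram and neural LM scores for N-best rescoring with a single mixed-feature, TRF-style potential that sums discrete n-gram feature weights and a neural potential. The trans-dimensional (per-sentence-length) normalization terms that the actual TRF framework estimates during training are omitted for simplicity.

```python
# Minimal sketch (illustrative only) of two-step interpolation vs. a one-step
# mixed-feature TRF-style potential for N-best list rescoring.
from typing import Callable, Dict, List, Tuple

def interpolation_rescore(
    nbest: List[Tuple[List[str], float]],             # (hypothesis tokens, acoustic/base score)
    ngram_logprob: Callable[[List[str]], float],      # separately trained discrete (n-gram) LM
    neural_logprob: Callable[[List[str]], float],     # separately trained neural LM
    alpha: float = 0.5,                                # interpolation weight, tuned on held-out data
    lm_scale: float = 1.0,
) -> List[str]:
    """Two-step integration: interpolate two separately trained LMs, then rescore."""
    def total(hyp: List[str], base: float) -> float:
        lm = alpha * ngram_logprob(hyp) + (1.0 - alpha) * neural_logprob(hyp)
        return base + lm_scale * lm
    return max(nbest, key=lambda h: total(h[0], h[1]))[0]

def mixed_trf_potential(
    hyp: List[str],
    discrete_weights: Dict[Tuple[str, ...], float],   # learned weights for discrete n-gram features
    neural_potential: Callable[[List[str]], float],   # learned neural-network potential
    order: int = 3,
) -> float:
    """One-step integration: a single potential summing discrete n-gram feature
    weights and a neural potential. Per-length normalization terms are omitted."""
    phi = neural_potential(hyp)
    for n in range(1, order + 1):
        for i in range(len(hyp) - n + 1):
            phi += discrete_weights.get(tuple(hyp[i:i + n]), 0.0)
    return phi

def mixed_trf_rescore(nbest, discrete_weights, neural_potential, lm_scale=1.0):
    """Rescore an N-best list with the single mixed-feature potential."""
    return max(
        nbest,
        key=lambda h: h[1] + lm_scale * mixed_trf_potential(h[0], discrete_weights, neural_potential),
    )[0]
```

The contrast the abstract draws maps onto this sketch: interpolation needs two separate training runs plus tuning of the mixing weight, whereas the mixed-feature potential is learned jointly in a single training run.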
Related papers
- Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates [71.81037644563217]
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
arXiv Detail & Related papers (2024-03-27T09:14:36Z)
- External Language Model Integration for Factorized Neural Transducers [7.5969913968845155]
We propose an adaptation method for factorized neural transducers (FNT) with external language models.
We show average gains of 18% WERR with lexical adaptation across various scenarios and additive gains of up to 60% WERR in one entity-rich scenario.
arXiv Detail & Related papers (2023-05-26T23:30:21Z)
- Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks [66.76034024335833]
We investigate why diverse/complex features are learned by the backbone yet remain brittle, and find that the brittleness is due to the linear classification head relying primarily on the simplest features.
We propose Feature Reconstruction Regularizer (FRR) to ensure that the learned features can be reconstructed back from the logits.
We demonstrate up to 15% gains in OOD accuracy on the recently introduced semi-synthetic datasets with extreme distribution shifts.
arXiv Detail & Related papers (2022-10-04T04:01:15Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Improving Rare Word Recognition with LM-aware MWER Training [50.241159623691885]
We introduce LMs in the learning of hybrid autoregressive transducer (HAT) models in the discriminative training framework.
For the shallow fusion setup, we use LMs during both hypotheses generation and loss computation, and the LM-aware MWER-trained model achieves 10% relative improvement.
For the rescoring setup, we learn a small neural module to generate per-token fusion weights in a data-dependent manner.
arXiv Detail & Related papers (2022-04-15T17:19:41Z)
- Bidirectional LSTM-CRF Attention-based Model for Chinese Word Segmentation [2.3991565023534087]
We propose a Bidirectional LSTM-CRF Attention-based Model for Chinese word segmentation.
Our model performs better than baseline methods built on other neural networks.
arXiv Detail & Related papers (2021-05-20T11:46:53Z)
- Linear Iterative Feature Embedding: An Ensemble Framework for Interpretable Model [6.383006473302968]
A new ensemble framework for interpretable modeling, called Linear Iterative Feature Embedding (LIFE), has been developed.
LIFE is able to fit a wide single-hidden-layer neural network (NN) accurately with three steps.
LIFE consistently outperforms directly trained single-hidden-layer NNs and also outperforms many other benchmark models.
arXiv Detail & Related papers (2021-03-18T02:01:17Z)
- On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer [40.63693071222628]
We study minimum word error rate (MWER) training of the Hybrid Autoregressive Transducer (HAT).
From experiments with around 30,000 hours of training data, we show that MWER training can improve the accuracy of HAT models.
arXiv Detail & Related papers (2020-10-23T21:16:30Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
- Learning to Learn Kernels with Variational Random Features [118.09565227041844]
We introduce kernels with random Fourier features in the meta-learning framework to leverage their strong few-shot learning ability.
We formulate the optimization of MetaVRF as a variational inference problem.
We show that MetaVRF delivers much better, or at least competitive, performance compared to existing meta-learning alternatives.
arXiv Detail & Related papers (2020-06-11T18:05:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.