Integrating Discrete and Neural Features via Mixed-feature
Trans-dimensional Random Field Language Models
- URL: http://arxiv.org/abs/2002.05967v2
- Date: Thu, 16 Apr 2020 05:23:31 GMT
- Title: Integrating Discrete and Neural Features via Mixed-feature
Trans-dimensional Random Field Language Models
- Authors: Silin Gao, Zhijian Ou, Wei Yang and Huifang Xu
- Abstract summary: This paper develops a mixed-feature TRF LM and demonstrates its advantage in integrating discrete and neural features.
Various LMs are trained over PTB and Google one-billion-word datasets, and evaluated in N-best list rescoring experiments for speech recognition.
Compared to interpolating two separately trained models with discrete and neural features respectively, the performance of mixed-feature TRF LMs matches the best interpolated model.
- Score: 19.409847780307445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has long been recognized that discrete features (n-gram features)
and neural network based features have complementary strengths for language
models (LMs). Improved performance can be obtained by model interpolation,
which is, however, a suboptimal two-step integration of discrete and neural
features. The trans-dimensional random field (TRF) framework has the potential
advantage of being able to flexibly integrate a richer set of features.
However, either discrete or neural features are used alone in previous TRF LMs.
This paper develops a mixed-feature TRF LM and demonstrates its advantage in
integrating discrete and neural features. Various LMs are trained over PTB and
Google one-billion-word datasets, and evaluated in N-best list rescoring
experiments for speech recognition. Among all single LMs (i.e. without model
interpolation), the mixed-feature TRF LMs perform the best, improving over both
discrete TRF LMs and neural TRF LMs alone, and also being significantly better
than LSTM LMs. Compared to interpolating two separately trained models with
discrete and neural features respectively, mixed-feature TRF LMs match the
best interpolated model while requiring only a simplified one-step training
process and reduced training time.
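To make the two integration styles in the abstract concrete, here is a minimal sketch, not the authors' code: all function and variable names (interpolation_rescore, mixed_trf_potential, discrete_weights, neural_potential, etc.) are illustrative assumptions. It contrasts two-step log-linear interpolation of separately trained n-gram and neural LM scores for N-best rescoring with a single mixed-feature, TRF-style potential that sums discrete n-gram feature weights and a neural potential. The trans-dimensional (per-sentence-length) normalization terms that the actual TRF framework estimates during training are omitted for simplicity.

```python
# Minimal sketch (illustrative only) of two-step interpolation vs. a one-step
# mixed-feature TRF-style potential for N-best list rescoring.
from typing import Callable, Dict, List, Tuple

def interpolation_rescore(
    nbest: List[Tuple[List[str], float]],             # (hypothesis tokens, acoustic/base score)
    ngram_logprob: Callable[[List[str]], float],      # separately trained discrete (n-gram) LM
    neural_logprob: Callable[[List[str]], float],     # separately trained neural LM
    alpha: float = 0.5,                                # interpolation weight, tuned on held-out data
    lm_scale: float = 1.0,
) -> List[str]:
    """Two-step integration: interpolate two separately trained LMs, then rescore."""
    def total(hyp: List[str], base: float) -> float:
        lm = alpha * ngram_logprob(hyp) + (1.0 - alpha) * neural_logprob(hyp)
        return base + lm_scale * lm
    return max(nbest, key=lambda h: total(h[0], h[1]))[0]

def mixed_trf_potential(
    hyp: List[str],
    discrete_weights: Dict[Tuple[str, ...], float],   # learned weights for discrete n-gram features
    neural_potential: Callable[[List[str]], float],   # learned neural-network potential
    order: int = 3,
) -> float:
    """One-step integration: a single potential summing discrete n-gram feature
    weights and a neural potential. Per-length normalization terms are omitted."""
    phi = neural_potential(hyp)
    for n in range(1, order + 1):
        for i in range(len(hyp) - n + 1):
            phi += discrete_weights.get(tuple(hyp[i:i + n]), 0.0)
    return phi

def mixed_trf_rescore(nbest, discrete_weights, neural_potential, lm_scale=1.0):
    """Rescore an N-best list with the single mixed-feature potential."""
    return max(
        nbest,
        key=lambda h: h[1] + lm_scale * mixed_trf_potential(h[0], discrete_weights, neural_potential),
    )[0]
```

The contrast the abstract draws maps onto this sketch: interpolation needs two separate training runs plus tuning of the mixing weight, whereas the mixed-feature potential is learned jointly in a single training run.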
Related papers
- Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates [71.81037644563217]
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
arXiv Detail & Related papers (2024-03-27T09:14:36Z)
- External Language Model Integration for Factorized Neural Transducers [7.5969913968845155]
We propose an adaptation method for factorized neural transducers (FNT) with external language models.
We show average gains of 18% WERR with lexical adaptation across various scenarios and additive gains of up to 60% WERR in one entity-rich scenario.
arXiv Detail & Related papers (2023-05-26T23:30:21Z)
- Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks [66.76034024335833]
We investigate why diverse/complex features are learned by the backbone yet remain brittle, and find that the brittleness is due to the linear classification head relying primarily on the simplest features.
We propose Feature Reconstruction Regularizer (FRR) to ensure that the learned features can be reconstructed back from the logits.
We demonstrate up to 15% gains in OOD accuracy on the recently introduced semi-synthetic datasets with extreme distribution shifts.
arXiv Detail & Related papers (2022-10-04T04:01:15Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Improving Rare Word Recognition with LM-aware MWER Training [50.241159623691885]
We introduce LMs in the learning of hybrid autoregressive transducer (HAT) models in the discriminative training framework.
For the shallow fusion setup, we use LMs during both hypotheses generation and loss computation, and the LM-aware MWER-trained model achieves 10% relative improvement.
For the rescoring setup, we learn a small neural module to generate per-token fusion weights in a data-dependent manner.
arXiv Detail & Related papers (2022-04-15T17:19:41Z)
- Bidirectional LSTM-CRF Attention-based Model for Chinese Word Segmentation [2.3991565023534087]
We propose a Bidirectional LSTM-CRF Attention-based Model for Chinese word segmentation.
Our model performs better than baseline methods built on other neural networks.
arXiv Detail & Related papers (2021-05-20T11:46:53Z)
- Linear Iterative Feature Embedding: An Ensemble Framework for Interpretable Model [6.383006473302968]
A new ensemble framework for interpretable modeling, called Linear Iterative Feature Embedding (LIFE), has been developed.
LIFE is able to fit a wide single-hidden-layer neural network (NN) accurately with three steps.
LIFE consistently outperforms directly trained single-hidden-layer NNs and also outperforms many other benchmark models.
arXiv Detail & Related papers (2021-03-18T02:01:17Z)
- On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer [40.63693071222628]
We study minimum word error rate (MWER) training of the Hybrid Autoregressive Transducer (HAT).
From experiments with around 30,000 hours of training data, we show that MWER training can improve the accuracy of HAT models.
arXiv Detail & Related papers (2020-10-23T21:16:30Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
- Learning to Learn Kernels with Variational Random Features [118.09565227041844]
We introduce kernels with random Fourier features in the meta-learning framework to leverage their strong few-shot learning ability.
We formulate the optimization of MetaVRF as a variational inference problem.
We show that MetaVRF delivers much better, or at least competitive, performance compared to existing meta-learning alternatives.
arXiv Detail & Related papers (2020-06-11T18:05:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.