Early Stage LM Integration Using Local and Global Log-Linear Combination
- URL: http://arxiv.org/abs/2005.10049v1
- Date: Wed, 20 May 2020 13:49:55 GMT
- Title: Early Stage LM Integration Using Local and Global Log-Linear Combination
- Authors: Wilfried Michel and Ralf Schlüter and Hermann Ney
- Abstract summary: Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMM).
One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora.
We present a novel method for language model integration into implicit-alignment based sequence-to-sequence models.
- Score: 46.91755970827846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence-to-sequence models with an implicit alignment mechanism (e.g.
attention) are closing the performance gap towards traditional hybrid hidden
Markov models (HMM) for the task of automatic speech recognition. One important
factor to improve word error rate in both cases is the use of an external
language model (LM) trained on large text-only corpora. Language model
integration is straightforward with the clear separation of acoustic model and
language model in classical HMM-based modeling. In contrast, multiple
integration schemes have been proposed for attention models. In this work, we
present a novel method for language model integration into implicit-alignment
based sequence-to-sequence models. Log-linear model combination of acoustic and
language model is performed with a per-token renormalization. This allows us to
compute the full normalization term efficiently both in training and in
testing. This is compared to a global renormalization scheme which is
equivalent to applying shallow fusion in training. The proposed methods show
good improvements over standard model combination (shallow fusion) on our
state-of-the-art Librispeech system. Furthermore, the improvements are
persistent even if the LM is exchanged for a more powerful one after training.
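The local combination described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `local_fusion_step`, the toy vocabulary, and the LM weight value are invented for the example; only the log-linear combination with per-token renormalization follows the abstract. Plain shallow fusion would instead score hypotheses with the unnormalized combined term.

```python
import math

def local_fusion_step(am_log_probs, lm_log_probs, lam=0.3):
    """Log-linearly combine per-token AM and LM scores, then renormalize.

    am_log_probs, lm_log_probs: dict mapping token -> log p(token | history)
    lam: LM weight (a tunable hyperparameter; the value here is illustrative)
    """
    # Log-linear combination: log p_AM + lambda * log p_LM for every token.
    combined = {tok: am_log_probs[tok] + lam * lm_log_probs[tok]
                for tok in am_log_probs}
    # Per-token renormalization: the normalization term is computed over the
    # full vocabulary at this decoding step, which is what makes it cheap to
    # evaluate exactly both in training and in testing.
    log_z = math.log(sum(math.exp(s) for s in combined.values()))
    return {tok: s - log_z for tok, s in combined.items()}
```

Because the renormalization happens independently at every output position, the resulting distribution is a proper probability distribution per token, unlike globally renormalized (shallow-fusion-style) scoring, where the normalization runs over whole label sequences.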
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Effective internal language model training and fusion for factorized transducer model [26.371223360905557]
Internal language model (ILM) of the neural transducer has been widely studied.
We propose a novel ILM training and decoding strategy for factorized transducer models.
arXiv Detail & Related papers (2024-04-02T08:01:05Z)
- Rethinking Masked Language Modeling for Chinese Spelling Correction [70.85829000570203]
We study Chinese Spelling Correction (CSC) as a joint decision made by two separate models: a language model and an error model.
We find that fine-tuning BERT tends to over-fit the error model while under-fitting the language model, resulting in poor generalization to out-of-distribution error patterns.
We demonstrate that a very simple strategy, randomly masking 20% of the non-error tokens in the input sequence during fine-tuning, is sufficient for learning a much better language model without sacrificing the error model.
arXiv Detail & Related papers (2023-05-28T13:19:12Z)
- SEAM: An Integrated Activation-Coupled Model of Sentence Processing and Eye Movements in Reading [0.0]
We present a model that combines eye-movement control and sentence processing.
This is the first-ever integration of a complete process model of eye-movement control with linguistic dependency completion processes in sentence comprehension.
arXiv Detail & Related papers (2023-03-09T12:50:34Z)
- Improving Rare Word Recognition with LM-aware MWER Training [50.241159623691885]
We introduce LMs in the learning of hybrid autoregressive transducer (HAT) models in the discriminative training framework.
For the shallow fusion setup, we use LMs during both hypotheses generation and loss computation, and the LM-aware MWER-trained model achieves 10% relative improvement.
For the rescoring setup, we learn a small neural module to generate per-token fusion weights in a data-dependent manner.
arXiv Detail & Related papers (2022-04-15T17:19:41Z)
- Normalizing Flow based Hidden Markov Models for Classification of Speech Phones with Explainability [25.543231171094384]
In pursuit of explainability, we develop generative models for sequential data.
We combine modern neural networks (normalizing flows) and traditional generative models (hidden Markov models - HMMs)
The proposed generative models can compute the likelihood of the data and are hence directly suitable for the maximum-likelihood (ML) classification approach.
arXiv Detail & Related papers (2021-07-01T20:10:55Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models [107.86965028729517]
Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
We propose several novel methods to estimate the ILM directly from the AED model.
arXiv Detail & Related papers (2021-04-12T15:16:03Z)
- Hybrid Autoregressive Transducer (HAT) [11.70833387055716]
This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model.
It is a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems.
We evaluate our proposed model on a large-scale voice search task.
arXiv Detail & Related papers (2020-03-12T20:47:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.