Early Stage LM Integration Using Local and Global Log-Linear Combination
- URL: http://arxiv.org/abs/2005.10049v1
- Date: Wed, 20 May 2020 13:49:55 GMT
- Title: Early Stage LM Integration Using Local and Global Log-Linear Combination
- Authors: Wilfried Michel, Ralf Schlüter, and Hermann Ney
- Abstract summary: Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMM).
One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora.
We present a novel method for language model integration into implicit-alignment based sequence-to-sequence models.
- Score: 46.91755970827846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence-to-sequence models with an implicit alignment mechanism (e.g.
attention) are closing the performance gap towards traditional hybrid hidden
Markov models (HMM) for the task of automatic speech recognition. One important
factor to improve word error rate in both cases is the use of an external
language model (LM) trained on large text-only corpora. Language model
integration is straightforward with the clear separation of acoustic model and
language model in classical HMM-based modeling. In contrast, multiple
integration schemes have been proposed for attention models. In this work, we
present a novel method for language model integration into implicit-alignment
based sequence-to-sequence models. Log-linear model combination of acoustic and
language model is performed with a per-token renormalization. This allows us to
compute the full normalization term efficiently both in training and in
testing. This is compared to a global renormalization scheme which is
equivalent to applying shallow fusion in training. The proposed methods show
good improvements over standard model combination (shallow fusion) on our
state-of-the-art Librispeech system. Furthermore, the improvements are
persistent even if the LM is exchanged for a more powerful one after training.
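The per-token combination is compact enough to sketch. The following is a minimal, illustrative Python/NumPy example, not the authors' implementation: it contrasts standard shallow fusion (a weighted sum of log-probabilities with no renormalization) with the locally renormalized log-linear combination described in the abstract. The vocabulary size, the random logits standing in for the two models' posteriors, and the lambda weights are assumptions chosen purely for illustration; the global renormalization scheme mentioned in the abstract is not shown, since it would require normalizing over full label sequences.

```python
# Minimal sketch (assumptions, not the authors' code): shallow fusion vs.
# per-token (locally renormalized) log-linear combination of an attention
# model and an external LM at a single decoding step.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 10          # toy vocabulary size (assumption)
LAMBDA_AM = 1.0     # acoustic/seq2seq model weight (assumption)
LAMBDA_LM = 0.3     # external LM weight (assumption)

def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

# Stand-ins for the per-token posteriors p_AM(y_t | y_<t, x) and p_LM(y_t | y_<t).
log_p_am = log_softmax(rng.normal(size=VOCAB))
log_p_lm = log_softmax(rng.normal(size=VOCAB))

# Shallow fusion: weighted sum of log-probabilities, used as-is during search;
# the combined score is not a normalized distribution.
shallow_score = LAMBDA_AM * log_p_am + LAMBDA_LM * log_p_lm

# Local log-linear combination: the same weighted sum, renormalized over the
# token vocabulary. Because the normalization runs only over VOCAB entries,
# it stays cheap to compute in training as well as in decoding.
log_p_local = log_softmax(LAMBDA_AM * log_p_am + LAMBDA_LM * log_p_lm)

print("shallow fusion scores (unnormalized):", np.round(shallow_score, 3))
print("locally renormalized log-probs      :", np.round(log_p_local, 3))
print("sum of renormalized probabilities   :", np.exp(log_p_local).sum())
```

At a single step the renormalization only shifts all scores by the same constant, so the per-step ranking matches shallow fusion; the practical difference is that the locally renormalized combination is again a proper distribution, which is what allows it to be plugged into the training criterion and accumulated consistently over a sequence.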
Related papers
- No Need to Talk: Asynchronous Mixture of Language Models [25.3581396758015]
SmallTalk LM is an innovative method for training a mixture of language models in an almost asynchronous manner.
We show that SmallTalk LM achieves significantly lower perplexity than dense model baselines for the same total training FLOPs and an almost identical inference cost.
arXiv Detail & Related papers (2024-10-04T15:50:10Z) - HM3: Heterogeneous Multi-Class Model Merging [0.0]
We explore training-free model merging techniques to consolidate auxiliary guard-rail models into a single, multi-functional model.
We propose Heterogeneous Multi-Class Model Merging (HM3) as a simple technique for merging multi-class classifiers with heterogeneous label spaces.
We report promising results for merging BERT-based guard models, some of which attain an average F1-score higher than the source models while reducing the inference time by up to 44%.
arXiv Detail & Related papers (2024-09-27T22:42:45Z) - EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - Rethinking Masked Language Modeling for Chinese Spelling Correction [70.85829000570203]
We study Chinese Spelling Correction (CSC) as a joint decision made by two separate models: a language model and an error model.
We find that fine-tuning BERT tends to over-fit the error model while under-fitting the language model, resulting in poor generalization to out-of-distribution error patterns.
We demonstrate that a very simple strategy, randomly masking 20% of the non-error tokens from the input sequence during fine-tuning, is sufficient for learning a much better language model without sacrificing the error model (see the masking sketch after this list).
arXiv Detail & Related papers (2023-05-28T13:19:12Z) - Improving Rare Word Recognition with LM-aware MWER Training [50.241159623691885]
We introduce LMs in the learning of hybrid autoregressive transducer (HAT) models in the discriminative training framework.
For the shallow fusion setup, we use LMs during both hypotheses generation and loss computation, and the LM-aware MWER-trained model achieves 10% relative improvement.
For the rescoring setup, we learn a small neural module to generate per-token fusion weights in a data-dependent manner.
arXiv Detail & Related papers (2022-04-15T17:19:41Z) - Normalizing Flow based Hidden Markov Models for Classification of Speech
Phones with Explainability [25.543231171094384]
In pursuit of explainability, we develop generative models for sequential data.
We combine modern neural networks (normalizing flows) and traditional generative models (hidden Markov models - HMMs).
The proposed generative models can compute the likelihood of the data and hence are directly suitable for a maximum-likelihood (ML) classification approach.
arXiv Detail & Related papers (2021-07-01T20:10:55Z) - Structured Reordering for Modeling Latent Alignments in Sequence
Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z) - Investigating Methods to Improve Language Model Integration for
Attention-based Encoder-Decoder ASR Models [107.86965028729517]
Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
We propose several novel methods to estimate the ILM directly from the AED model.
arXiv Detail & Related papers (2021-04-12T15:16:03Z) - Hybrid Autoregressive Transducer (hat) [11.70833387055716]
This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model.
It is a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems.
We evaluate our proposed model on a large-scale voice search task.
arXiv Detail & Related papers (2020-03-12T20:47:06Z)