Future Vector Enhanced LSTM Language Model for LVCSR
- URL: http://arxiv.org/abs/2008.01832v1
- Date: Fri, 31 Jul 2020 08:38:56 GMT
- Title: Future Vector Enhanced LSTM Language Model for LVCSR
- Authors: Qi Liu, Yanmin Qian, Kai Yu
- Abstract summary: This paper proposes a novel enhanced long short-term memory (LSTM) LM using a future vector.
Experiments show that the proposed LSTM LM achieves better BLEU scores for long-term sequence prediction.
Rescoring with both the new and conventional LSTM LMs yields a large improvement in word error rate.
- Score: 67.03726018635174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) play an important role in large vocabulary continuous
speech recognition (LVCSR). However, traditional language models only predict
the next single word given the history, while consecutive predictions over a
sequence of words are usually demanded and useful in LVCSR. This mismatch
between single-word prediction during training and long-term sequence
prediction in real use may lead to performance degradation. In this paper, a
novel enhanced long short-term memory (LSTM) LM using a future vector is
proposed. In addition to the given history, the rest of the sequence is also
embedded as a future vector. This future vector can be incorporated into the
LSTM LM, giving it the ability to model much longer-term sequence-level
information. Experiments show that the proposed LSTM LM achieves better BLEU
scores for long-term sequence prediction. For speech recognition rescoring,
although the proposed LSTM LM alone obtains only very slight gains, it proves
highly complementary to the conventional LSTM LM: rescoring with both the new
and conventional LSTM LMs achieves a large improvement in word error rate.
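The abstract describes feeding a future vector, which summarises the rest of the sequence, into the LSTM alongside the usual input and hidden state. A minimal pure-Python sketch of that idea is below; this is not the paper's implementation, and the class name, dimensions, and the choice to simply concatenate the future vector onto the gate inputs are all illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class FutureVectorLSTMCell:
    """Toy LSTM cell whose gates also see a 'future vector' f_t
    summarising the remainder of the sequence (hypothetical sketch)."""

    def __init__(self, input_size, hidden_size, future_size, seed=0):
        rng = random.Random(seed)
        total = input_size + hidden_size + future_size

        def mat():
            return [[rng.uniform(-0.1, 0.1) for _ in range(total)]
                    for _ in range(hidden_size)]

        # One weight matrix per gate: input, forget, output, candidate.
        self.Wi, self.Wf, self.Wo, self.Wc = mat(), mat(), mat(), mat()
        self.hidden_size = hidden_size

    def step(self, x_t, h_prev, c_prev, future_vec):
        # Concatenate current input, previous hidden state, and future vector.
        z = x_t + h_prev + future_vec
        i = [sigmoid(v) for v in matvec(self.Wi, z)]
        f = [sigmoid(v) for v in matvec(self.Wf, z)]
        o = [sigmoid(v) for v in matvec(self.Wo, z)]
        g = [math.tanh(v) for v in matvec(self.Wc, z)]
        c = [f_ * c_ + i_ * g_ for f_, c_, i_, g_ in zip(f, c_prev, i, g)]
        h = [o_ * math.tanh(c_) for o_, c_ in zip(o, c)]
        return h, c
```

In a real model the future vector would be produced by a learned encoder over the upcoming words during training; here it is just an extra input, which is enough to show where the sequence-level information enters the gates.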
Related papers
- Transformers versus LSTMs for electronic trading [0.0]
This study investigates whether Transformer-based models can be applied to financial time-series prediction and outperform LSTMs.
A new LSTM-based model called DLSTM is built, and a new Transformer-based architecture is designed and adapted for financial prediction.
Experimental results show that the Transformer-based model has only a limited advantage in absolute price sequence prediction.
arXiv Detail & Related papers (2023-09-20T15:25:43Z) - Augmented Language Models: a Survey [55.965967655575454]
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
We refer to them as Augmented Language Models (ALMs)
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks.
arXiv Detail & Related papers (2023-02-15T18:25:52Z) - Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have been long devised to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z) - On the long-term learning ability of LSTM LMs [17.700860670640015]
We evaluate a contextual extension based on the Continuous Bag-of-Words (CBOW) model for both sentence- and discourse-level LSTM LMs.
Sentence-level models using the long-term contextual module perform comparably to vanilla discourse-level LSTM LMs.
On the other hand, the extension does not provide gains for discourse-level models.
arXiv Detail & Related papers (2021-06-16T16:34:37Z) - Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z) - Long short-term memory networks and laglasso for bond yield forecasting:
Peeping inside the black box [10.412912723760172]
We conduct the first study of bond yield forecasting using long short-term memory (LSTM) networks.
We calculate the LSTM signals through time, at selected locations in the memory cell, using sequence-to-sequence architectures.
arXiv Detail & Related papers (2020-05-05T14:23:00Z) - Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate a LM as prior in a neural translation model (TM)
We add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z) - Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.