Future Vector Enhanced LSTM Language Model for LVCSR
- URL: http://arxiv.org/abs/2008.01832v1
- Date: Fri, 31 Jul 2020 08:38:56 GMT
- Title: Future Vector Enhanced LSTM Language Model for LVCSR
- Authors: Qi Liu, Yanmin Qian, Kai Yu
- Abstract summary: This paper proposes a novel enhanced long short-term memory (LSTM) LM that uses a future vector.
Experiments show that the proposed LSTM LM achieves better BLEU scores for long-term sequence prediction.
Rescoring with both the new and conventional LSTM LMs achieves a large improvement in word error rate.
- Score: 67.03726018635174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LM) play an important role in large vocabulary continuous
speech recognition (LVCSR). However, traditional language models only predict the
next single word given the history, while consecutive predictions over a sequence
of words are usually demanded and useful in LVCSR. The mismatch between single-word
prediction during training and the long-term sequence prediction required in real
use may lead to performance degradation. In this paper, a novel enhanced long
short-term memory (LSTM) LM using a future vector is proposed. In addition to the
given history, the rest of the sequence is also embedded as a future vector. This
future vector is incorporated into the LSTM LM, so the model can capture much
longer-term, sequence-level information. Experiments show that the proposed LSTM LM
achieves better BLEU scores for long-term sequence prediction. For speech
recognition rescoring, although the proposed LSTM LM obtains only slight gains on
its own, it appears to be strongly complementary to the conventional LSTM LM.
Rescoring with both the new and conventional LSTM LMs achieves a very large
improvement in word error rate.
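As a rough illustration of the idea in the abstract, the sketch below shows one way an LSTM LM could be conditioned on a future vector: the remaining words of the sequence are embedded, pooled into a single vector, and concatenated to the word embedding at every input step. The module name, pooling choice, and dimensions are assumptions for illustration, not the authors' exact architecture.

```python
# Minimal sketch (PyTorch) of a future-vector conditioned LSTM LM.
# The pooling-based future encoder and all sizes are illustrative assumptions.
import torch
import torch.nn as nn

class FutureVectorLSTMLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, future_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.future_proj = nn.Linear(embed_dim, future_dim)   # hypothetical future encoder
        self.lstm = nn.LSTM(embed_dim + future_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, history, future):
        # history: (batch, T) word ids already seen
        # future:  (batch, F) word ids of the rest of the sequence (training time)
        h_emb = self.embed(history)                            # (batch, T, E)
        f_vec = self.future_proj(self.embed(future).mean(1))   # (batch, future_dim)
        f_vec = f_vec.unsqueeze(1).expand(-1, h_emb.size(1), -1)
        output, _ = self.lstm(torch.cat([h_emb, f_vec], dim=-1))
        return self.out(output)                                # next-word logits
```

At inference the true future is unknown, so the future vector would have to be predicted or approximated; the abstract does not spell out that mechanism, and the sketch only covers training-time conditioning. Rescoring with both LMs is typically a log-linear combination of their hypothesis scores, e.g. lambda * log P_new + (1 - lambda) * log P_conv.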
Related papers
- What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length [61.71625297655583]
We show that MORCELA outperforms a commonly used linking theory for acceptability.
Larger models require a lower relative degree of adjustment for unigram frequency.
Our analysis shows that larger LMs' lower susceptibility to frequency effects can be explained by an ability to better predict rarer words in context.
arXiv Detail & Related papers (2024-11-04T19:05:49Z)
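The entry above contrasts MORCELA with a commonly used linking theory that corrects LM log-probabilities for unigram frequency and sentence length; SLOR is a widely used adjustment of that kind. The sketch below shows SLOR only as a generic illustration of such a linking function, not the MORCELA formulation.

```python
# SLOR-style acceptability score: LM log-probability corrected for unigram
# (frequency) log-probability and normalized by length. Illustrative baseline
# linking function only; MORCELA's learned adjustments are not reproduced here.
def slor(lm_logprobs, unigram_logprobs):
    """Per-token log-probabilities of one sentence under the LM and a unigram model."""
    assert len(lm_logprobs) == len(unigram_logprobs) > 0
    return (sum(lm_logprobs) - sum(unigram_logprobs)) / len(lm_logprobs)

# Toy usage: the correction keeps a sentence from being penalized merely for
# containing rare words.
print(slor([-2.0, -3.5, -1.2], [-6.0, -9.0, -4.0]))
```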
- Beam Prediction based on Large Language Models [51.45077318268427]
Millimeter-wave (mmWave) communication is promising for next-generation wireless networks but suffers from significant path loss.
Traditional deep learning models, such as long short-term memory (LSTM) networks, improve beam tracking accuracy but are limited by poor robustness and generalization.
In this letter, we use large language models (LLMs) to improve the robustness of beam prediction.
arXiv Detail & Related papers (2024-08-16T12:40:01Z)
- Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have long been used to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z)
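The classic class-based factorization that the entry above revisits predicts a class first and then a word within that class, P(w | h) = P(class(w) | h) * P(w | class(w), h). Below is a minimal sketch of such a factorized softmax with a given word-to-hypernym-class map; it illustrates the factorization only, not the paper's exact objective.

```python
# Sketch of a class-factorized softmax: predict a (hypernym) class, then the
# word within that class. The word->class map is assumed given (e.g. WordNet).
import torch
import torch.nn as nn

class ClassFactorizedSoftmax(nn.Module):
    def __init__(self, hidden_dim, vocab_size, num_classes, word2class):
        super().__init__()
        self.class_head = nn.Linear(hidden_dim, num_classes)
        self.word_head = nn.Linear(hidden_dim, vocab_size)
        self.register_buffer("word2class", word2class)  # (vocab_size,) long tensor

    def log_prob(self, hidden, target_words):
        # log P(w|h) = log P(class(w)|h) + log P(w | class(w), h)
        target_classes = self.word2class[target_words]
        class_logp = torch.log_softmax(self.class_head(hidden), dim=-1)
        word_logits = self.word_head(hidden)
        # Restrict the word softmax to members of the target word's class.
        same_class = self.word2class.unsqueeze(0) == target_classes.unsqueeze(1)
        word_logits = word_logits.masked_fill(~same_class, float("-inf"))
        word_logp = torch.log_softmax(word_logits, dim=-1)
        return class_logp.gather(-1, target_classes.unsqueeze(-1)).squeeze(-1) \
             + word_logp.gather(-1, target_words.unsqueeze(-1)).squeeze(-1)
```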
- On the long-term learning ability of LSTM LMs [17.700860670640015]
We evaluate a contextual extension based on the Continuous Bag-of-Words (CBOW) model for both sentence- and discourse-level LSTM LMs.
Sentence-level models using the long-term contextual module perform comparably to vanilla discourse-level LSTM LMs.
On the other hand, the extension does not provide gains for discourse-level models.
arXiv Detail & Related papers (2021-06-16T16:34:37Z)
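A rough sketch of the kind of CBOW-based long-term context module evaluated above: a bag-of-words average over a long window of preceding words is concatenated to the LSTM input at each step. It is structurally similar to the future-vector sketch earlier, but conditions on long-range past context rather than the future; wiring and sizes are assumptions.

```python
# Sketch: LSTM LM input augmented with a CBOW-style context vector, i.e. the
# average embedding of a long window of preceding words. Sizes are illustrative.
import torch
import torch.nn as nn

class CBOWContextLSTMLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(2 * embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, context):
        # words:   (batch, T) current-sentence history
        # context: (batch, C) long-term context window (e.g. previous sentences)
        w = self.embed(words)                    # (batch, T, E)
        c = self.embed(context).mean(dim=1)      # (batch, E) bag-of-words average
        c = c.unsqueeze(1).expand(-1, w.size(1), -1)
        h, _ = self.lstm(torch.cat([w, c], dim=-1))
        return self.out(h)                       # next-word logits
```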
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
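One standard way to make the gating-versus-timescale link above concrete: a unit whose forget gate sits near a value f retains information roughly as f^t, so its characteristic timescale is about -1/log f steps. The snippet below is an illustrative probe of that relation, not the paper's derivation of the power-law fit.

```python
# Illustrative probe: map an LSTM unit's typical forget-gate value to an
# approximate memory timescale (in steps). Inspecting the spread of these
# timescales across units is one way to look for multi-timescale structure.
import math

def unit_timescale(forget_gate_value: float) -> float:
    """Approximate timescale of a unit that decays stored content as f**t."""
    assert 0.0 < forget_gate_value < 1.0
    return -1.0 / math.log(forget_gate_value)

for f in (0.5, 0.9, 0.99, 0.999):
    print(f"forget gate {f:.3f} -> ~{unit_timescale(f):.0f} steps")
```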
- Long short-term memory networks and laglasso for bond yield forecasting: Peeping inside the black box [10.412912723760172]
We conduct the first study of bond yield forecasting using long short-term memory (LSTM) networks.
We calculate the LSTM signals through time, at selected locations in the memory cell, using sequence-to-sequence architectures.
arXiv Detail & Related papers (2020-05-05T14:23:00Z)
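To record signals "inside the black box" as the entry above does, the LSTM has to be unrolled so that the hidden and memory-cell states are visible at every step. The sketch below shows that generic extraction with a manually stepped LSTMCell; the sequence-to-sequence setup, input features, and the specific cell locations analysed in the paper are not reproduced.

```python
# Sketch: unroll an LSTMCell step by step so the interior signals
# (hidden state h_t and memory cell c_t) can be recorded through time.
# Dimensions and inputs are illustrative.
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=8, hidden_size=16)
x = torch.randn(60, 1, 8)                # (time, batch, features)
h = torch.zeros(1, 16)
c = torch.zeros(1, 16)

hidden_trace, cell_trace = [], []
for t in range(x.size(0)):
    h, c = cell(x[t], (h, c))
    hidden_trace.append(h.detach())
    cell_trace.append(c.detach())        # memory-cell signal at step t

cell_trace = torch.stack(cell_trace)     # (time, batch, hidden)
print(cell_trace.shape)
```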
- Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporating an LM as a prior in a neural translation model (TM).
We add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z)
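A common way to write a regularizer that pushes the translation model's output distribution toward an LM prior is a KL term added to the usual cross-entropy, weighted by a coefficient (here `lam`, a hypothetical hyperparameter). The sketch below is one such formulation under that assumption; the paper's exact term may differ.

```python
# Sketch of an LM-prior regularized translation loss:
#   loss = CE(TM, reference) + lam * KL(P_TM || P_LM)
# The KL term pushes the TM's next-token distribution toward the LM prior.
import torch.nn.functional as F

def lm_prior_loss(tm_logits, lm_logits, targets, lam=0.5):
    # tm_logits, lm_logits: (batch, seq, vocab); targets: (batch, seq) token ids
    ce = F.cross_entropy(tm_logits.transpose(1, 2), targets)
    log_p_tm = F.log_softmax(tm_logits, dim=-1)
    log_p_lm = F.log_softmax(lm_logits.detach(), dim=-1)  # LM is a fixed prior
    kl = (log_p_tm.exp() * (log_p_tm - log_p_lm)).sum(-1).mean()
    return ce + lam * kl
```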