Future Vector Enhanced LSTM Language Model for LVCSR
- URL: http://arxiv.org/abs/2008.01832v1
- Date: Fri, 31 Jul 2020 08:38:56 GMT
- Title: Future Vector Enhanced LSTM Language Model for LVCSR
- Authors: Qi Liu, Yanmin Qian, Kai Yu
- Abstract summary: This paper proposes a novel enhanced long short-term memory (LSTM) LM using a future vector.
Experiments show that the proposed LSTM LM achieves better BLEU scores for long-term sequence prediction.
Rescoring with both the new and conventional LSTM LMs yields a large improvement in word error rate.
- Score: 67.03726018635174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) play an important role in large vocabulary continuous
speech recognition (LVCSR). However, traditional language models only predict
the next single word given the history, while consecutive predictions over a
sequence of words are usually demanded and useful in LVCSR. This mismatch
between single-word prediction during training and long-term sequence
prediction in real use may lead to performance degradation. In this paper, a
novel enhanced long short-term memory (LSTM) LM using a future vector is
proposed. In addition to the given history, the rest of the sequence is also
embedded as a future vector. This future vector can be incorporated into the
LSTM LM, giving it the ability to model much longer-term sequence-level
information. Experiments show that the proposed LSTM LM achieves better BLEU
scores for long-term sequence prediction. For speech recognition rescoring,
although the proposed LSTM LM alone obtains only very slight gains, it proves
highly complementary to the conventional LSTM LM: rescoring with both the new
and conventional LSTM LMs achieves a large improvement in word error rate.
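The abstract describes feeding a future vector, which summarises the rest of the sequence, into the LSTM alongside the usual input and hidden state. A minimal pure-Python sketch of that idea is below; this is not the paper's implementation, and the class name, dimensions, and the choice to simply concatenate the future vector onto the gate inputs are all illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class FutureVectorLSTMCell:
    """Toy LSTM cell whose gates also see a 'future vector' f_t
    summarising the remainder of the sequence (hypothetical sketch)."""

    def __init__(self, input_size, hidden_size, future_size, seed=0):
        rng = random.Random(seed)
        total = input_size + hidden_size + future_size

        def mat():
            return [[rng.uniform(-0.1, 0.1) for _ in range(total)]
                    for _ in range(hidden_size)]

        # One weight matrix per gate: input, forget, output, candidate.
        self.Wi, self.Wf, self.Wo, self.Wc = mat(), mat(), mat(), mat()
        self.hidden_size = hidden_size

    def step(self, x_t, h_prev, c_prev, future_vec):
        # Concatenate current input, previous hidden state, and future vector.
        z = x_t + h_prev + future_vec
        i = [sigmoid(v) for v in matvec(self.Wi, z)]
        f = [sigmoid(v) for v in matvec(self.Wf, z)]
        o = [sigmoid(v) for v in matvec(self.Wo, z)]
        g = [math.tanh(v) for v in matvec(self.Wc, z)]
        c = [f_ * c_ + i_ * g_ for f_, c_, i_, g_ in zip(f, c_prev, i, g)]
        h = [o_ * math.tanh(c_) for o_, c_ in zip(o, c)]
        return h, c
```

In a real model the future vector would be produced by a learned encoder over the upcoming words during training; here it is just an extra input, which is enough to show where the sequence-level information enters the gates.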
Related papers
- Transformers versus LSTMs for electronic trading [0.0]
This study investigates whether Transformer-based models can be applied to financial time-series prediction and outperform LSTMs.
A new LSTM-based model called DLSTM is built, and a new Transformer-based architecture is designed and adapted for financial prediction.
Experimental results show that the Transformer-based model has only a limited advantage in absolute price sequence prediction.
arXiv Detail & Related papers (2023-09-20T15:25:43Z) - Augmented Language Models: a Survey [55.965967655575454]
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
We refer to them as Augmented Language Models (ALMs)
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks.
arXiv Detail & Related papers (2023-02-15T18:25:52Z) - Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have been long devised to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z) - On the long-term learning ability of LSTM LMs [17.700860670640015]
We evaluate a contextual extension based on the Continuous Bag-of-Words (CBOW) model for both sentence- and discourse-level LSTM LMs.
Sentence-level models using the long-term contextual module perform comparably to vanilla discourse-level LSTM LMs.
On the other hand, the extension does not provide gains for discourse-level models.
arXiv Detail & Related papers (2021-06-16T16:34:37Z) - Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z) - Long short-term memory networks and laglasso for bond yield forecasting:
Peeping inside the black box [10.412912723760172]
We conduct the first study of bond yield forecasting using long short-term memory (LSTM) networks.
We calculate the LSTM signals through time, at selected locations in the memory cell, using sequence-to-sequence architectures.
arXiv Detail & Related papers (2020-05-05T14:23:00Z) - Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate a LM as prior in a neural translation model (TM)
We add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z) - Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.