Residual Learning of Neural Text Generation with $n$-gram Language Model
- URL: http://arxiv.org/abs/2210.14431v1
- Date: Wed, 26 Oct 2022 02:42:53 GMT
- Title: Residual Learning of Neural Text Generation with $n$-gram Language Model
- Authors: Huayang Li, Deng Cai, Jin Xu, Taro Watanabe
- Abstract summary: We learn a neural LM that fits the residual between an $n$-gram LM and the real-data distribution.
Our approach consistently attains additional performance gains over popular standalone neural models.
- Score: 41.26228768053928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: $N$-gram language models (LMs) have been largely superseded by neural LMs as
the latter exhibit better performance. However, we find that $n$-gram models
can achieve satisfactory performance on a large proportion of testing cases,
indicating they have already captured abundant knowledge of the language with
relatively low computational cost. With this observation, we propose to learn a
neural LM that fits the residual between an $n$-gram LM and the real-data
distribution. The combination of $n$-gram and neural LMs not only allows the
neural part to focus on a deeper understanding of language but also provides
a flexible way to customize an LM by switching the underlying $n$-gram model
without changing the neural model. Experimental results on three typical
language tasks (i.e., language modeling, machine translation, and
summarization) demonstrate that our approach consistently attains additional
performance gains over popular standalone neural models. We also show that our
approach allows for effective domain adaptation by simply switching to a
domain-specific $n$-gram model, without any extra training. Our code is
released at https://github.com/ghrua/NgramRes.
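The repository above hosts the official implementation; below is only a minimal PyTorch-style sketch of the residual idea, assuming the combined model simply adds the neural logits to the log-probabilities of a frozen $n$-gram LM before the softmax. The function name `residual_lm_loss` and the tensor shapes are illustrative, not taken from the released code.

```python
import torch
import torch.nn.functional as F

def residual_lm_loss(neural_logits, ngram_logprobs, targets):
    """Cross-entropy of the combined LM distribution.

    neural_logits:  (batch, vocab) raw scores from the trainable neural model
    ngram_logprobs: (batch, vocab) log-probabilities from a frozen n-gram LM
    targets:        (batch,) gold next-token ids

    The combined distribution is softmax(neural_logits + ngram_logprobs), so
    minimizing this loss trains the neural part to model only the residual
    between the n-gram LM and the real-data distribution.
    """
    combined_logits = neural_logits + ngram_logprobs
    return F.cross_entropy(combined_logits, targets)

# Toy usage with random tensors (batch of 4, vocabulary of 10):
neural_logits = torch.randn(4, 10, requires_grad=True)
ngram_logprobs = torch.log_softmax(torch.randn(4, 10), dim=-1)
targets = torch.randint(0, 10, (4,))
loss = residual_lm_loss(neural_logits, ngram_logprobs, targets)
loss.backward()  # gradients flow only into the neural logits
```

Because the $n$-gram term enters only as an additive bias in log space, swapping in a domain-specific $n$-gram LM at inference changes the combined distribution without retraining the neural component, which is the domain-adaptation behavior the abstract describes.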
Related papers
- Interpretable Language Modeling via Induction-head Ngram Models [74.26720927767398]
We propose Induction-head ngram models (Induction-Gram) to bolster modern ngram models with a hand-engineered "induction head".
This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions.
Experiments show that this simple method significantly improves next-word prediction over baseline interpretable models.
arXiv Detail & Related papers (2024-10-31T12:33:26Z)
- Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG [57.14250086701313]
We investigate the extent to which modern LMs generate $n$-grams from their training data.
We develop Rusty-DAWG, a novel search tool inspired by indexing of genomic data.
arXiv Detail & Related papers (2024-06-18T21:31:19Z)
- Neural-g: A Deep Learning Framework for Mixing Density Estimation [16.464806944964003]
Mixing (or prior) density estimation is an important problem in machine learning and statistics.
We propose neural-$g$, a new neural network-based estimator for $g$-modeling.
arXiv Detail & Related papers (2024-06-10T03:00:28Z)
- The Role of $n$-gram Smoothing in the Age of Neural Networks [60.23726773548038]
This paper re-opens the role classical $n$-gram smoothing techniques may play in the age of neural language models.
We derive a framework for converting any $n$-gram smoothing technique into a regularizer compatible with neural language models.
arXiv Detail & Related papers (2024-03-25T22:42:19Z)
- Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens [138.36729703589512]
We show that $n$-gram language models are still relevant in this era of neural large language models (LLMs).
This was done by modernizing $n$-gram LMs in two aspects. First, we train them at the same data scale as neural LLMs -- 5 trillion tokens.
Second, existing $n$-gram LMs use a small $n$, which hinders their performance; we instead allow $n$ to be arbitrarily large by introducing a new $\infty$-gram LM with backoff.
arXiv Detail & Related papers (2024-01-30T19:03:49Z)
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study in-context learning (ICL) through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.