Better Language Model with Hypernym Class Prediction
- URL: http://arxiv.org/abs/2203.10692v1
- Date: Mon, 21 Mar 2022 01:16:44 GMT
- Title: Better Language Model with Hypernym Class Prediction
- Authors: He Bai, Tong Wang, Alessandro Sordoni, Peng Shi
- Abstract summary: Class-based language models (LMs) have long been devised to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
- Score: 101.8517004687825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class-based language models (LMs) have long been devised to address context
sparsity in $n$-gram LMs. In this study, we revisit this approach in the
context of neural LMs. We hypothesize that class-based prediction leads to an
implicit context aggregation for similar words and thus can improve
generalization for rare words. We map words that have a common WordNet hypernym
to the same class and train large neural LMs by gradually annealing from
predicting the class to token prediction during training. Empirically, this
curriculum learning strategy consistently improves perplexity over various
large, highly-performant state-of-the-art Transformer-based models on two
datasets, WikiText-103 and Arxiv. Our analysis shows that the performance
improvement is achieved without sacrificing performance on rare words. Finally,
we document other attempts that failed to yield empirical gains, and discuss
future directions for the adoption of class-based LMs on a larger scale.
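The two ingredients described above, mapping each word to a WordNet hypernym class and annealing from class-level to token-level prediction, can be sketched roughly as follows. This is a minimal illustration, assuming NLTK's WordNet interface, first-synset hypernyms, and a linear annealing schedule with a convex loss combination; none of these choices are claimed to match the paper's exact setup.

```python
import torch
import torch.nn.functional as F
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def hypernym_class(word):
    """Map a word to a class label derived from its first WordNet hypernym.
    Words without a synset (or without hypernyms) fall back to their own label."""
    synsets = wn.synsets(word)
    if not synsets:
        return word
    hypernyms = synsets[0].hypernyms()
    return hypernyms[0].name() if hypernyms else synsets[0].name()

def annealed_loss(token_logits, class_logits, token_targets, class_targets, step, total_steps):
    """Blend class-level and token-level cross-entropy, moving the weight from the
    class objective to the token objective as training progresses."""
    alpha = max(0.0, 1.0 - step / total_steps)  # 1.0 at the start, 0.0 at the end
    return alpha * F.cross_entropy(class_logits, class_targets) + \
           (1.0 - alpha) * F.cross_entropy(token_logits, token_targets)

print(hypernym_class("dog"))   # e.g. 'canine.n.02' -- dog and wolf land in the same class
print(hypernym_class("wolf"))
```

One natural way to use this would be to build the class vocabulary once over the LM's token inventory and give the model a second output head over classes, though the exact setup here is an assumption.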
Related papers
- Exploring Category Structure with Contextual Language Models and Lexical Semantic Networks [0.0]
We test a wider array of methods for probing CLMs to predict typicality scores.
Our experiments, using BERT, show the importance of using the right type of CLM probe.
The results also highlight the role of polysemy in this task.
arXiv Detail & Related papers (2023-02-14T09:57:23Z)
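As a rough illustration of what probing a CLM for typicality can look like (the cloze prompt, the scoring rule, and the choice of BERT checkpoint are assumptions here, not the authors' protocol):

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def typicality_score(exemplar, category):
    """Probability BERT assigns to the category word in a simple 'is-a' cloze prompt."""
    result = fill(f"A {exemplar} is a [MASK].", targets=[category])
    return result[0]["score"]

# Typical exemplars of a category should score higher than atypical ones.
print(typicality_score("robin", "bird"))
print(typicality_score("penguin", "bird"))
```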
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that covers diverse literal expressions of the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
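The snippet below is only a loose sketch of the "adjacency region" idea from the entry above: it samples perturbed sentence-level representations inside a small ball around the original one. CsaNMT's actual region construction and training objective are not reproduced; the uniform-ball sampling and the radius are assumptions.

```python
import torch

def sample_adjacency_region(sentence_repr, radius=0.1, num_samples=4):
    """Draw augmented representations from a ball around a sentence vector."""
    noise = torch.randn(num_samples, sentence_repr.size(-1))
    noise = noise / noise.norm(dim=-1, keepdim=True)   # random directions
    scales = torch.rand(num_samples, 1) * radius       # random magnitudes within the radius
    return sentence_repr.unsqueeze(0) + scales * noise

h = torch.randn(512)                     # stand-in for an encoder's sentence representation
augmented = sample_adjacency_region(h)   # four semantic neighbours of h
print(augmented.shape)                   # torch.Size([4, 512])
```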
- Regularized Training of Nearest Neighbor Language Models [10.994336081018043]
We build upon $k$NN-LM (Khandelwal et al., 2020), which uses a pre-trained language model together with an exhaustive $k$NN search through the training data (memory bank) to achieve state-of-the-art results.
We find that the added L2 regularization seems to improve performance for high-frequency words without deteriorating performance for low-frequency ones.
arXiv Detail & Related papers (2021-09-16T23:20:24Z)
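A compact sketch of the kNN-LM interpolation the entry above builds on: the base LM distribution is mixed with a distribution formed from the nearest entries of the memory bank. FAISS stands in for the exhaustive search, and the index type, softmax temperature, and interpolation weight are assumptions.

```python
import numpy as np
import faiss  # pip install faiss-cpu

def knn_lm_probs(lm_probs, query, index, value_tokens, k=16, lam=0.25, temp=1.0):
    """p(w) = (1 - lam) * p_LM(w) + lam * p_kNN(w), where p_kNN puts mass on the
    tokens that followed the retrieved context vectors, weighted by distance."""
    dists, idx = index.search(query[None, :].astype("float32"), k)
    weights = np.exp(-dists[0] / temp)
    weights /= weights.sum()
    p_knn = np.zeros_like(lm_probs)
    for w, i in zip(weights, idx[0]):
        p_knn[value_tokens[i]] += w
    return (1.0 - lam) * lm_probs + lam * p_knn

# Toy memory bank: context vectors (keys) and the tokens that followed them (values).
keys = np.random.randn(1000, 64).astype("float32")
value_tokens = np.random.randint(0, 100, size=1000)
index = faiss.IndexFlatL2(64)
index.add(keys)
p = knn_lm_probs(np.full(100, 0.01), np.random.randn(64), index, value_tokens)
```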
- Revisiting Simple Neural Probabilistic Language Models [27.957834093475686]
This paper revisits the neural probabilistic language model (NPLM) of Bengio et al. (2003).
When scaled up to modern hardware, this model performs much better than expected on word-level language model benchmarks.
Inspired by this result, we modify the Transformer by replacing its first self-attention layer with the NPLM's local concatenation layer.
arXiv Detail & Related papers (2021-04-08T02:18:47Z)
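A minimal sketch of what an NPLM-style "local concatenation" block could look like in place of the first self-attention layer: each position concatenates the embeddings of itself and the previous few tokens and projects them back to the model dimension. The window size, zero-padding, and ReLU fusion are assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class LocalConcatLayer(nn.Module):
    """Concatenate the current token with the previous (window - 1) tokens and
    project back to d_model, NPLM-style, instead of computing self-attention."""
    def __init__(self, d_model, window=5):
        super().__init__()
        self.window = window
        self.proj = nn.Linear(window * d_model, d_model)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        shifted = [torch.roll(x, shifts=i, dims=1) for i in range(self.window)]
        for i, s in enumerate(shifted):
            if i > 0:
                s[:, :i, :] = 0.0              # zero out positions that wrapped around
        stacked = torch.cat(shifted, dim=-1)   # (batch, seq_len, window * d_model)
        return torch.relu(self.proj(stacked))

x = torch.randn(2, 10, 64)
print(LocalConcatLayer(d_model=64)(x).shape)   # torch.Size([2, 10, 64])
```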
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed afterwards, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
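A rough sketch of a compositional output layer in the spirit of the entry above: output embeddings are composed on the fly from character embeddings rather than stored in a vocabulary-sized table, so the layer's parameters do not grow with the training vocabulary. The mean-of-characters composition below is an illustrative simplification, not the paper's method.

```python
import torch
import torch.nn as nn

class CompositionalOutput(nn.Module):
    """Score candidate words by composing their output embeddings from
    character embeddings on the fly."""
    def __init__(self, d_model, num_chars=256):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, d_model)  # fixed size, independent of vocabulary

    def word_embedding(self, word):
        ids = torch.tensor([ord(c) % 256 for c in word])
        return self.char_emb(ids).mean(dim=0)             # mean-of-characters composition

    def forward(self, hidden, candidate_words):           # hidden: (d_model,)
        embs = torch.stack([self.word_embedding(w) for w in candidate_words])
        return embs @ hidden                               # logits over the candidate words

layer = CompositionalOutput(d_model=32)
logits = layer(torch.randn(32), ["cat", "cats", "dog"])
print(logits.shape)   # torch.Size([3])
```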
- A Comparative Study of Lexical Substitution Approaches based on Neural Language Models [117.96628873753123]
We present a large-scale comparative study of popular neural language and masked language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly.
arXiv Detail & Related papers (2020-05-29T18:43:22Z)
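As a hedged illustration of the kind of masked-LM substitute generation such a comparison covers, the snippet below reads candidates off BERT's distribution with the target masked, and alternatively with the target left visible; the latter is only one crude way of giving the model information about the target word, not the injection techniques evaluated in the study.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def substitutes(sentence, target, mask_target=True, top_k=5):
    """Top-k substitute candidates for `target` (assumed to be a single wordpiece)."""
    text = sentence.replace(target, tok.mask_token) if mask_target else sentence
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    # Predict at the masked position, or at the visible target's position.
    wanted_id = tok.mask_token_id if mask_target else tok.convert_tokens_to_ids(target)
    pos = (inputs["input_ids"][0] == wanted_id).nonzero()[0].item()
    return tok.convert_ids_to_tokens(logits[pos].topk(top_k).indices.tolist())

print(substitutes("The movie was awful .", "awful", mask_target=True))
print(substitutes("The movie was awful .", "awful", mask_target=False))
```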
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.