What do Large Language Models Learn beyond Language?
- URL: http://arxiv.org/abs/2210.12302v1
- Date: Fri, 21 Oct 2022 23:43:13 GMT
- Title: What do Large Language Models Learn beyond Language?
- Authors: Avinash Madasu, Shashank Srivastava
- Abstract summary: We find that pretrained models significantly outperform comparable non-pretrained neural models.
Experiments surprisingly reveal that the positive effects of pre-training persist even when pretraining on multi-lingual text or computer code.
Our findings suggest a hitherto unexplored deep connection between pre-training and inductive learning abilities of language models.
- Score: 10.9650651784511
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LMs) have rapidly become a mainstay in Natural
Language Processing. These models are known to acquire rich linguistic
knowledge from training on large amounts of text. In this paper, we investigate
if pre-training on text also confers these models with helpful `inductive
biases' for non-linguistic reasoning. On a set of 19 diverse non-linguistic
tasks involving quantitative computations, recognizing regular expressions and
reasoning over strings, we find that pretrained models significantly outperform
comparable non-pretrained neural models. This remains true even in experiments
in which non-pretrained models are trained with fewer parameters to account for
model regularization effects. We further explore the effect of text domain on
LMs by pretraining models on text from different domains and provenances. Our
experiments surprisingly reveal that the positive effects of pre-training
persist even when pretraining on multi-lingual text or computer code, and even
for text generated from synthetic languages. Our findings suggest a hitherto
unexplored deep connection between pre-training and inductive learning
abilities of language models.
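To make the kind of comparison described above concrete, here is a minimal sketch in which a pretrained encoder and an architecturally identical, randomly initialized copy are fine-tuned on a toy regular-expression recognition task. The model name, task construction, and hyperparameters below are illustrative assumptions, not the paper's actual experimental setup.

```python
# Illustrative sketch only: compare a pretrained encoder against a randomly
# initialized copy on a toy non-linguistic task (does a string match (ab)+ ?).
# Model choice, task encoding, and hyperparameters are assumptions, not the
# paper's experimental configuration.
import random
import re
import torch
from transformers import (AutoConfig, AutoTokenizer,
                          AutoModelForSequenceClassification)

MODEL = "roberta-base"  # assumed stand-in for the pretrained LMs evaluated
tokenizer = AutoTokenizer.from_pretrained(MODEL)

def make_example():
    # Random string over {a, b}; label 1 iff it matches the regex (ab)+.
    s = "".join(random.choice("ab") for _ in range(random.randint(2, 12)))
    return s, int(bool(re.fullmatch(r"(ab)+", s)))

def build_model(pretrained: bool):
    if pretrained:
        return AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
    config = AutoConfig.from_pretrained(MODEL, num_labels=2)
    return AutoModelForSequenceClassification.from_config(config)  # random init

def train_and_eval(model, steps=200):
    opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(steps):
        batch = [make_example() for _ in range(16)]
        enc = tokenizer([s for s, _ in batch], padding=True, return_tensors="pt")
        labels = torch.tensor([y for _, y in batch])
        loss = model(**enc, labels=labels).loss
        loss.backward(); opt.step(); opt.zero_grad()
    model.eval()
    test = [make_example() for _ in range(200)]
    enc = tokenizer([s for s, _ in test], padding=True, return_tensors="pt")
    with torch.no_grad():
        preds = model(**enc).logits.argmax(-1)
    return (preds == torch.tensor([y for _, y in test])).float().mean().item()

for pretrained in (True, False):
    acc = train_and_eval(build_model(pretrained))
    print(f"pretrained={pretrained}: accuracy={acc:.3f}")
```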
Related papers
- Can training neural language models on a curriculum with developmentally
plausible data improve alignment with human reading behavior? [0.2745342790938508]
This paper explores the extent to which the misalignment between empirical and model-predicted behavior can be minimized by training models on more developmentally plausible data.
We trained teacher language models on the BabyLM "strict-small" dataset and used sentence level surprisal estimates from these teacher models to create a curriculum.
We found tentative evidence that our curriculum made it easier for models to acquire linguistic knowledge from the training data.
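A minimal sketch of the surprisal-based curriculum idea, assuming a generic teacher LM and an easiest-first ordering; the paper's actual teacher models (trained on BabyLM data) and its exact ordering scheme may differ.

```python
# Illustrative sketch: order training sentences by mean token surprisal under a
# teacher language model (easy-to-hard). The teacher checkpoint and the
# easiest-first ordering are assumptions, not the paper's exact recipe.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

TEACHER = "gpt2"  # assumed stand-in for a teacher model
tok = AutoTokenizer.from_pretrained(TEACHER)
lm = AutoModelForCausalLM.from_pretrained(TEACHER).eval()

def mean_surprisal(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the loss is the mean token negative log-likelihood.
        loss = lm(ids, labels=ids).loss
    return loss.item()

corpus = ["The cat sat on the mat.",
          "Colorless green ideas sleep furiously.",
          "The derivative of a constant function is zero."]
curriculum = sorted(corpus, key=mean_surprisal)  # lowest surprisal first
print(curriculum)
```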
arXiv Detail & Related papers (2023-11-30T18:03:58Z) - Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation [82.5217996570387]
We adapt a pre-trained language model for auto-regressive text-to-image generation.
We find that pre-trained language models offer limited help.
arXiv Detail & Related papers (2023-11-27T07:19:26Z) - Studying the impacts of pre-training using ChatGPT-generated text on
downstream tasks [0.0]
Our research aims to investigate the influence of artificial text in the pre-training phase of language models.
We conducted a comparative analysis between a RoBERTa language model pre-trained on CNN/DailyMail news articles and one pre-trained on ChatGPT-generated text derived from the same articles.
We demonstrate that the utilization of artificial text during pre-training does not have a significant impact on either the performance of the models in downstream tasks or their gender bias.
arXiv Detail & Related papers (2023-09-02T12:56:15Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - MonoByte: A Pool of Monolingual Byte-level Language Models [4.491765479948667]
We release 10 monolingual byte-level models rigorously pretrained under the same configuration.
Because they are tokenizer-free, the problem of unseen token embeddings is eliminated.
Experiments on QA and NLI tasks show that our monolingual models achieve performance competitive with the multilingual one.
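A minimal sketch of why byte-level, tokenizer-free encoding avoids unseen tokens, assuming plain UTF-8 bytes as the vocabulary; the MonoByte models' exact input scheme may differ.

```python
# Minimal illustration of byte-level, tokenizer-free encoding: every string,
# in any language, maps onto the same 256 byte values, so no token is ever
# "unseen". (Generic sketch, not the MonoByte models' exact scheme.)
def byte_ids(text: str) -> list[int]:
    return list(text.encode("utf-8"))

for s in ["hello", "naïve", "日本語"]:
    ids = byte_ids(s)
    assert all(0 <= b < 256 for b in ids)  # always within the fixed byte vocabulary
    print(s, "->", ids)
```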
arXiv Detail & Related papers (2022-09-22T14:32:48Z) - Is neural language acquisition similar to natural? A chronological
probing study [0.0515648410037406]
We present a chronological probing study of English transformer models such as MultiBERT and T5.
We compare the information about language that the models learn over the course of training on their corpora.
The results show that 1) linguistic information is acquired in the early stages of training, and 2) both language models demonstrate the capability to capture features from various levels of language.
arXiv Detail & Related papers (2022-07-01T17:24:11Z) - On the Multilingual Capabilities of Very Large-Scale English Language
Models [0.0]
Generative Pre-trained Transformers (GPTs) have recently been scaled to unprecedented sizes in the history of machine learning.
In this work, we investigate the multilingual skills of GPT-3, focusing on one language that barely appears in the pre-training corpus, Catalan.
We find that the model shows outstanding performance, particularly in generative tasks, with predictable limitations mostly in language understanding tasks, but still with remarkable results given the zero-shot scenario.
arXiv Detail & Related papers (2021-08-30T16:18:50Z) - Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z) - Pre-Training a Language Model Without Human Language [74.11825654535895]
We study how the intrinsic nature of pre-training data contributes to the fine-tuned downstream performance.
We find that models pre-trained on unstructured data beat those trained directly from scratch on downstream tasks.
To our great astonishment, we uncover that pre-training on certain non-human language data gives GLUE performance close to that of pre-training on another non-English language.
arXiv Detail & Related papers (2020-12-22T13:38:06Z) - Comparison of Interactive Knowledge Base Spelling Correction Models for
Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z) - Pretrained Language Model Embryology: The Birth of ALBERT [68.5801642674541]
We investigate the developmental process from a set of randomly initialized parameters to a totipotent language model.
Our results show that ALBERT learns to reconstruct and predict tokens of different parts of speech (POS) at different speeds during pretraining.
These findings suggest that the knowledge of a pretrained model varies during pretraining, and that more pretraining steps do not necessarily provide a model with more comprehensive knowledge.
arXiv Detail & Related papers (2020-10-06T05:15:39Z)