Investigating Pre-trained Language Models on Cross-Domain Datasets, a
Step Closer to General AI
- URL: http://arxiv.org/abs/2306.12205v1
- Date: Wed, 21 Jun 2023 11:55:17 GMT
- Title: Investigating Pre-trained Language Models on Cross-Domain Datasets, a
Step Closer to General AI
- Authors: Mohamad Ballout, Ulf Krumnack, Gunther Heidemann and Kai-Uwe
K\"uhnberger
- Abstract summary: We investigate the ability of pre-trained language models to generalize to different non-language tasks.
The four pre-trained models that we used, T5, BART, BERT, and GPT-2, achieve outstanding results.
- Score: 0.8889304968879164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models have recently emerged as a powerful tool for
fine-tuning a variety of language tasks. Ideally, when models are pre-trained
on large amounts of data, they are expected to gain implicit knowledge. In this
paper, we investigate the ability of pre-trained language models to generalize
to different non-language tasks. In particular, we test them on tasks from
different domains such as computer vision, reasoning on hierarchical data, and
protein fold prediction. The four pre-trained models that we used, T5, BART,
BERT, and GPT-2, achieve outstanding results. They all perform similarly and
outperform transformers trained from scratch by a large margin. For instance,
the pre-trained language models perform better on the ListOps dataset, with an
average accuracy of 58.7%, compared to transformers trained from scratch, which
have an average accuracy of 29.0%. The significant improvement demonstrated
across three types of datasets suggests that pre-training on language helps the
models to acquire general knowledge, bringing us a step closer to general AI.
We also show that reducing the number of parameters of the pre-trained language
models has only a minor impact: performance drops slightly when T5-Small is
used instead of T5-Base. In fact, even when using only 2% of the parameters, we
still achieve a large improvement compared to training from scratch. Finally,
in contrast to prior work, we find that using the pre-trained embeddings for
the input layer is necessary to achieve the desired results.
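The following is a minimal sketch (not the authors' code) of the kind of cross-domain fine-tuning the abstract describes: a pre-trained GPT-2 with a classification head is fine-tuned on ListOps-style sequences while its pre-trained input embeddings are kept as loaded. The model choice ("gpt2"), the toy examples, the label values, and all hyperparameters are illustrative assumptions.

```python
# Minimal fine-tuning sketch; toy data and hyperparameters are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

NUM_CLASSES = 10  # ListOps answers are single digits 0-9

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token

# The pre-trained input embeddings are kept as loaded; per the abstract,
# replacing them (e.g., with random embeddings) hurts performance.
model = GPT2ForSequenceClassification.from_pretrained(
    "gpt2", num_labels=NUM_CLASSES, pad_token_id=tokenizer.pad_token_id
)

# Two toy ListOps-style sequences; the real dataset would be loaded from disk.
sequences = ["[MAX 2 9 [MIN 4 7 ] 0 ]", "[MED 5 [SM 1 2 3 ] 8 ]"]
labels = torch.tensor([9, 6])

enc = tokenizer(sequences, padding=True, truncation=True, max_length=512,
                return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], labels)
loader = DataLoader(dataset, batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):  # illustrative training schedule only
    for input_ids, attention_mask, y in loader:
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

Under the same assumptions, the abstract's point about the input layer would correspond to leaving the loaded embedding matrix (here `model.transformer.wte`) in place rather than re-initializing it, and T5, BART, or BERT could be substituted via their respective sequence-classification classes.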
Related papers
- Efficient Grammatical Error Correction Via Multi-Task Training and
Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z) - From Good to Best: Two-Stage Training for Cross-lingual Machine Reading
Comprehension [51.953428342923885]
We develop a two-stage approach to enhance the model performance.
The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
arXiv Detail & Related papers (2021-12-09T07:31:15Z) - bert2BERT: Towards Reusable Pretrained Language Models [51.078081486422896]
We propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model.
bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE, respectively, by reusing models of roughly half their size.
arXiv Detail & Related papers (2021-10-14T04:05:25Z) - On the Transferability of Pre-trained Language Models: A Study from
Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study which specific traits of the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z) - A Survey of Recent Abstract Summarization Techniques [0.0]
We investigate the impact of pre-trained models on several Wikipedia datasets in English and Indonesian.
The most significant factors that influence ROUGE performance are coverage, density, and compression.
T5-Large, Pegasus-XSum, and ProphetNet-CNNDM provide the best summarization results.
arXiv Detail & Related papers (2021-04-15T20:01:34Z) - Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model achieves better accuracy scores than a reference system trained on an average of 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z) - Pre-Training a Language Model Without Human Language [74.11825654535895]
We study how the intrinsic nature of pre-training data contributes to the fine-tuned downstream performance.
We find that models pre-trained on unstructured data beat those trained directly from scratch on downstream tasks.
Surprisingly, we find that pre-training on certain non-human language data gives GLUE performance close to that obtained by pre-training on another non-English language.
arXiv Detail & Related papers (2020-12-22T13:38:06Z)