A Comparative Study of Transformer-Based Language Models on Extractive
Question Answering
- URL: http://arxiv.org/abs/2110.03142v1
- Date: Thu, 7 Oct 2021 02:23:19 GMT
- Authors: Kate Pearce, Tiffany Zhan, Aneesh Komanduri, Justin Zhan
- Abstract summary: We train various pre-trained language models and fine-tune them on multiple question answering datasets.
Using the F1-score as our metric, we find that the RoBERTa and BART pre-trained models perform the best across all datasets.
- Score: 0.5079811885340514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Question Answering (QA) is a task in natural language processing that has
seen considerable growth after the advent of transformers. There has been a
surge in QA datasets that have been proposed to challenge natural language
processing models to improve human and existing model performance. Many
pre-trained language models have proven to be incredibly effective at the task
of extractive question answering. However, generalizability remains a
challenge for the majority of these models. That is, some datasets require
models to reason more than others. In this paper, we train various pre-trained
language models and fine-tune them on multiple question answering datasets of
varying levels of difficulty to determine which of the models are capable of
generalizing the most comprehensively across different datasets. Further, we
propose a new architecture, BERT-BiLSTM, and compare it with other language
models to determine if adding more bidirectionality can improve model
performance. Using the F1-score as our metric, we find that the RoBERTa and
BART pre-trained models perform the best across all datasets and that our
BERT-BiLSTM model outperforms the baseline BERT model.
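The abstract names the F1-score as the evaluation metric but does not say which variant is used. As a minimal sketch, assuming the SQuAD-style token-overlap F1 that is common in extractive question answering, the score for one predicted answer span against one reference answer could be computed as follows. The function names `normalize_answer` and `token_f1` are illustrative and are not taken from the paper.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted span and a reference answer.

    Assumption: SQuAD-style scoring; the paper may use a different variant.
    """
    pred_tokens = normalize_answer(prediction)
    ref_tokens = normalize_answer(reference)
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; one empty answer is a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, a prediction that contains the reference plus extra tokens gets partial credit: precision drops while recall stays at 1. A dataset-level score would typically average the per-question values, taking the maximum over references when a question has several.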
Related papers
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- N-Grammer: Augmenting Transformers with latent n-grams [35.39961549040385]
We propose a simple yet effective modification to the Transformer architecture inspired by the literature in statistical language modeling, by augmenting the model with n-grams that are constructed from a discrete latent representation of the text sequence.
We evaluate our model, the N-Grammer on language modeling on the C4 data-set as well as text classification on the SuperGLUE data-set, and find that it outperforms several strong baselines such as the Transformer and the Primer.
arXiv Detail & Related papers (2022-07-13T17:18:02Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- bert2BERT: Towards Reusable Pretrained Language Models [51.078081486422896]
We propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model.
bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE by reusing models of almost half their size.
arXiv Detail & Related papers (2021-10-14T04:05:25Z)
- TEASEL: A Transformer-Based Speech-Prefixed Language Model [4.014524824655106]
Multimodal language analysis aims to simultaneously model a speaker's words, acoustical annotations, and facial expressions.
Lexicon features usually outperform other modalities because they are pre-trained on large corpora via Transformer-based models.
Despite their strong performance, training a new self-supervised learning (SSL) Transformer on any modality is not usually attainable due to insufficient data.
arXiv Detail & Related papers (2021-09-12T14:08:57Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top performing model on several Multiple Choice Question Answering (MCQA) datasets.
We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Model Selection for Cross-Lingual Transfer [15.197350103781739]
We propose a machine learning approach to model selection that uses the fine-tuned model's own internal representations to predict its cross-lingual capabilities.
In extensive experiments, we find that this method consistently selects better models than English validation data across twenty-five languages.
arXiv Detail & Related papers (2020-10-13T02:36:48Z)
- A Comparison of LSTM and BERT for Small Corpus [0.0]
Recent advancements in the NLP field showed that transfer learning helps with achieving state-of-the-art results for new tasks by tuning pre-trained models instead of starting from scratch.
In this paper we focus on a real-life scenario that scientists in academia and industry face frequently: given a small dataset, can we use a large pre-trained model like BERT and get better results than simple models?
Our experimental results show that bidirectional LSTM models can achieve significantly higher results than a BERT model for a small dataset and these simple models get trained in much less time than tuning the pre-trained counterparts.
arXiv Detail & Related papers (2020-09-11T14:01:14Z)
- ParsBERT: Transformer-based Model for Persian Language Understanding [0.7646713951724012]
This paper proposes a monolingual BERT for the Persian language (ParsBERT).
It shows its state-of-the-art performance compared to other architectures and multilingual models.
ParsBERT obtains higher scores in all datasets, including existing ones as well as composed ones.
arXiv Detail & Related papers (2020-05-26T05:05:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.