Gestalt: a Stacking Ensemble for SQuAD2.0
- URL: http://arxiv.org/abs/2004.07067v1
- Date: Thu, 2 Apr 2020 08:09:22 GMT
- Title: Gestalt: a Stacking Ensemble for SQuAD2.0
- Authors: Mohamed El-Geish
- Abstract summary: We propose a deep-learning system that finds, or indicates the lack of, a correct answer to a question in a context paragraph.
Our goal is to learn an ensemble of heterogeneous SQuAD2.0 models that outperforms the best model in the ensemble per se.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a deep-learning system -- for the SQuAD2.0 task -- that finds, or
indicates the lack of, a correct answer to a question in a context paragraph.
Our goal is to learn an ensemble of heterogeneous SQuAD2.0 models that, when
blended properly, outperforms the best model in the ensemble per se. We created
a stacking ensemble that combines top-N predictions from two models, based on
ALBERT and RoBERTa, into a multiclass classification task to pick the best
answer out of their predictions. We explored various ensemble configurations,
input representations, and model architectures. For evaluation, we examined
test-set EM and F1 scores; our best-performing ensemble incorporated a
CNN-based meta-model and scored 87.117 and 90.306, respectively -- a relative
improvement of 0.55% for EM and 0.61% for F1 scores, compared to the baseline
performance of the best model in the ensemble, an ALBERT-based model, at 86.644
for EM and 89.760 for F1.
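The stacking scheme described in the abstract can be sketched as follows. Everything below is illustrative: the candidate features, the learned weights, and the linear meta-model (standing in for the paper's CNN meta-model) are assumptions, not the authors' implementation.

```python
import numpy as np

def stack_candidates(preds_a, preds_b, n=3):
    """Combine top-N (answer, features) predictions from two QA models
    into one candidate list for a multiclass meta-model (2N classes)."""
    return preds_a[:n] + preds_b[:n]

def meta_model_pick(candidates, weights, bias):
    """Toy linear meta-model: scores each candidate from its feature
    vector and returns the answer of the argmax class."""
    feats = np.array([c["features"] for c in candidates])  # shape (2N, d)
    logits = feats @ weights + bias                        # shape (2N,)
    return candidates[int(np.argmax(logits))]["answer"]

# Hypothetical top-3 outputs from an ALBERT-based and a RoBERTa-based model;
# features could be the model's span score, answer length, null-answer
# margin, etc. The empty string stands for "no answer" (SQuAD2.0).
albert = [{"answer": "1957", "features": [0.91, 0.2]},
          {"answer": "in 1957", "features": [0.55, 0.3]},
          {"answer": "", "features": [0.10, 0.0]}]
roberta = [{"answer": "1957", "features": [0.88, 0.2]},
           {"answer": "the year 1957", "features": [0.40, 0.5]},
           {"answer": "", "features": [0.05, 0.0]}]

cands = stack_candidates(albert, roberta, n=3)
w = np.array([1.0, -0.1])  # illustrative learned weights
print(meta_model_pick(cands, w, 0.0))  # prints "1957"
```

In the paper the meta-model is trained as a multiclass classifier over the 2N candidates; the linear scorer here only illustrates the input/output contract of that classifier.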
Related papers
- Advancing LLM Reasoning Generalists with Preference Trees [119.57169648859707]
We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning.
Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks.
arXiv Detail & Related papers (2024-04-02T16:25:30Z)
- Anchor Points: Benchmarking Models with Much Fewer Examples [88.02417913161356]
In six popular language classification benchmarks, model confidence in the correct class on many pairs of points is strongly correlated across models.
We propose Anchor Point Selection, a technique to select small subsets of datasets that capture model behavior across the entire dataset.
Just a few anchor points can be used to estimate model per-class predictions on all other points in a dataset with low mean absolute error.
arXiv Detail & Related papers (2023-09-14T17:45:51Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results in in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
- An Automated Question-Answering Framework Based on Evolution Algorithm [19.054115603616513]
We propose an automated Question-Answering framework, which could adjust network architecture for multiple datasets.
Our framework achieves 78.9 EM and 86.1 F1 on SQuAD 1.1, 69.9 EM and 72.5 F1 on SQuAD 2.0.
arXiv Detail & Related papers (2022-01-26T08:13:24Z)
- Ensemble ALBERT on SQuAD 2.0 [0.0]
In our paper, we utilize fine-tuned ALBERT models and implement combinations of additional layers to improve model performance.
Our best-performing individual model is ALBERT-xxlarge + ALBERT-SQuAD-out, which achieved an F1 score of 88.435 on the dev set.
By passing in several best-performing models' results into our weighted voting ensemble algorithm, our final result ranks first on the Stanford CS224N Test PCE SQuAD Leaderboard with F1 = 90.123.
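A weighted voting ensemble of the kind mentioned above can be sketched as follows; the answers and the F1-based weights are made-up illustrations, and the cited paper's exact weighting scheme is not reproduced here.

```python
from collections import defaultdict

def weighted_vote(predictions):
    """predictions: list of (answer_text, model_weight) pairs, one per
    ensemble member. Sums the weights per distinct answer string and
    returns the answer with the highest total weight."""
    totals = defaultdict(float)
    for answer, weight in predictions:
        totals[answer] += weight
    return max(totals, key=totals.get)

# Hypothetical per-model answers, each weighted by that model's dev-set F1.
preds = [("in 1957", 88.4), ("1957", 87.9), ("in 1957", 86.2)]
print(weighted_vote(preds))  # "in 1957" wins with total weight 174.6
```

Weighting votes by a held-out metric lets stronger models break ties in their favor while still allowing agreement among weaker models to override a single strong model.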
arXiv Detail & Related papers (2021-10-19T00:15:19Z)
- Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixtures of experts (sparse MoEs).
We present Efficient Ensemble of Experts (E$3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z)
- AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)
- Multiple Run Ensemble Learning with Low-Dimensional Knowledge Graph Embeddings [4.317340121054659]
We propose a simple but effective performance boosting strategy for knowledge graph embedding (KGE) models.
We repeat the training of a model 6 times in parallel with an embedding size of 200 and then combine the 6 separate models for testing.
We show that our approach enables different models to better cope with their issues on modeling various graph patterns.
arXiv Detail & Related papers (2021-04-11T12:26:50Z)
- FiSSA at SemEval-2020 Task 9: Fine-tuned For Feelings [2.362412515574206]
In this paper, we present our approach for sentiment classification on Spanish-English code-mixed social media data.
We explore both monolingual and multilingual models with the standard fine-tuning method.
Although two-step fine-tuning improves sentiment classification performance over the base model, the large multilingual XLM-RoBERTa model achieves best weighted F1-score.
arXiv Detail & Related papers (2020-07-24T14:48:27Z)
- XD at SemEval-2020 Task 12: Ensemble Approach to Offensive Language Identification in Social Media Using Transformer Encoders [17.14709845342071]
This paper presents six document classification models using the latest transformer encoders and a high-performing ensemble model for a task of offensive language identification in social media.
Our analysis shows that although the ensemble model significantly improves the accuracy on the development set, the improvement is not as evident on the test set.
arXiv Detail & Related papers (2020-07-21T17:03:00Z)
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention [119.77305080520718]
We propose a new model architecture DeBERTa that improves the BERT and RoBERTa models using two novel techniques.
We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks.
arXiv Detail & Related papers (2020-06-05T19:54:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.