Ensemble Learning-Based Approach for Improving Generalization Capability
of Machine Reading Comprehension Systems
- URL: http://arxiv.org/abs/2107.00368v1
- Date: Thu, 1 Jul 2021 11:11:17 GMT
- Title: Ensemble Learning-Based Approach for Improving Generalization Capability
of Machine Reading Comprehension Systems
- Authors: Razieh Baradaran and Hossein Amirkhani
- Abstract summary: Machine Reading Comprehension (MRC) is an active field in natural language processing, with many successful models developed in recent years.
Despite their high in-distribution accuracy, these models suffer from two issues: high training cost and low out-of-distribution accuracy.
In this paper, we investigate the effect of the ensemble learning approach on improving the generalization of MRC systems without retraining a big model.
- Score: 0.7614628596146599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Reading Comprehension (MRC) is an active field in natural language
processing, with many successful models developed in recent years. Despite their
high in-distribution accuracy, these models suffer from two issues: high
training cost and low out-of-distribution accuracy. Even though some approaches
have been presented to tackle the generalization problem, they incur high,
intolerable training costs. In this paper, we investigate the effect of the
ensemble learning approach on improving the generalization of MRC systems without
retraining a big model. The base models, which have different structures, are first
trained separately on different datasets and then ensembled using weighting
and stacking approaches in probabilistic and non-probabilistic settings. Three
configurations are investigated, namely heterogeneous, homogeneous, and
hybrid, on eight datasets with six state-of-the-art models. We identify the
important factors in the effectiveness of ensemble methods. We also compare
the robustness of ensemble and fine-tuned models against data distribution
shifts. The experimental results show the effectiveness and robustness of the
ensemble approach in improving the out-of-distribution accuracy of MRC systems,
especially when the base models are similar in accuracy.
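The weighting-based ensembling described in the abstract can be illustrated with a small sketch. The following Python snippet is not from the paper; the function name, the per-model weights, and the toy inputs are illustrative assumptions. It shows a probabilistic weighting ensemble for extractive MRC: each base reader is assumed to emit start and end probability distributions over the passage tokens, the distributions are combined with per-model weights, and the highest-scoring span is returned.

```python
# Minimal sketch of a probabilistic weighting ensemble for extractive MRC.
# Each base model is assumed to output start/end probability distributions
# over passage tokens; the ensemble averages them with per-model weights
# (e.g., proportional to each model's held-out accuracy) and picks the
# highest-scoring span. All names and numbers here are illustrative.
import numpy as np

def ensemble_span(start_probs, end_probs, weights, max_span_len=30):
    """Combine per-model start/end distributions and return the best span.

    start_probs, end_probs: lists of 1-D arrays, one per base model.
    weights: per-model weights, normalized internally to sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Weighted average of the probability distributions across base models.
    start = sum(wi * p for wi, p in zip(w, start_probs))
    end = sum(wi * p for wi, p in zip(w, end_probs))
    # Search for the best (start, end) pair under a maximum span length.
    best, best_score = (0, 0), -1.0
    for i in range(len(start)):
        for j in range(i, min(i + max_span_len, len(end))):
            score = start[i] * end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

# Toy example: three hypothetical base readers over a 6-token passage.
rng = np.random.default_rng(0)
starts = [rng.dirichlet(np.ones(6)) for _ in range(3)]
ends = [rng.dirichlet(np.ones(6)) for _ in range(3)]
print(ensemble_span(starts, ends, weights=[0.5, 0.3, 0.2]))
```

A stacking variant would instead feed the base models' outputs as features to a trained meta-model; the fixed-weight averaging above is the simpler of the two ensembling approaches mentioned in the abstract.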
Related papers
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
- Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate that this approach lower-bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z)
- Diversity-Aware Ensembling of Language Models Based on Topological Data Analysis [3.1734682813501514]
Existing approaches mostly rely on simple averaging of predictions by ensembles with equal weights for each model.
We propose to estimate weights for ensembles of NLP models using not only knowledge of their individual performance but also their similarity to each other.
arXiv Detail & Related papers (2024-02-22T00:04:21Z)
- Mixed Semi-Supervised Generalized-Linear-Regression with applications to Deep-Learning and Interpolators [6.537685198688539]
We present a methodology for using unlabeled data to design semi-supervised learning (SSL) methods.
We include in each of them a mixing parameter $\alpha$, controlling the weight given to the unlabeled data.
We demonstrate the effectiveness of our methodology in delivering substantial improvement compared to the standard supervised models.
arXiv Detail & Related papers (2023-02-19T09:55:18Z)
- Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC).
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Wavelet-Based Hybrid Machine Learning Model for Out-of-distribution Internet Traffic Prediction [3.689539481706835]
This paper investigates machine learning performance using eXtreme Gradient Boosting, Light Gradient Boosting Machine, Gradient Descent, Gradient Boosting Regressor, and Cat Regressor.
We propose a hybrid machine learning model integrating wavelet decomposition for improving out-of-distribution prediction.
arXiv Detail & Related papers (2022-05-09T14:34:42Z)
- Using Explainable Boosting Machine to Compare Idiographic and Nomothetic Approaches for Ecological Momentary Assessment Data [2.0824228840987447]
This paper explores the use of non-linear interpretable machine learning (ML) models in classification problems.
Various ensembles of trees are compared to linear models using imbalanced synthetic and real-world datasets.
In one of the two real-world datasets, the knowledge distillation method achieves improved AUC scores.
arXiv Detail & Related papers (2022-04-04T17:56:37Z)
- Learning Distributionally Robust Models at Scale via Composite Optimization [45.47760229170775]
We show how different variants of DRO are simply instances of a finite-sum composite optimization for which we provide scalable methods.
We also provide empirical results that demonstrate the effectiveness of the proposed algorithm, relative to prior art, for learning robust models from very large datasets.
arXiv Detail & Related papers (2022-03-17T20:47:42Z)
- Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixtures of experts (sparse MoEs).
We present Efficient Ensemble of Experts (E$^3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.