Single-dataset Experts for Multi-dataset Question Answering
- URL: http://arxiv.org/abs/2109.13880v1
- Date: Tue, 28 Sep 2021 17:08:22 GMT
- Title: Single-dataset Experts for Multi-dataset Question Answering
- Authors: Dan Friedman, Ben Dodge, Danqi Chen
- Abstract summary: We train a network on multiple datasets to generalize and transfer better to new datasets.
Our approach is to model multi-dataset question answering with a collection of single-dataset experts.
Simple methods based on parameter-averaging lead to better zero-shot generalization and few-shot transfer performance.
- Score: 6.092171111087768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many datasets have been created for training reading comprehension models,
and a natural question is whether we can combine them to build models that (1)
perform better on all of the training datasets and (2) generalize and transfer
better to new datasets. Prior work has addressed this goal by training one
network simultaneously on multiple datasets, which works well on average but is
prone to over- or under-fitting different sub-distributions and might transfer
worse compared to source models with more overlap with the target dataset. Our
approach is to model multi-dataset question answering with a collection of
single-dataset experts, by training a collection of lightweight,
dataset-specific adapter modules (Houlsby et al., 2019) that share an
underlying Transformer model. We find that these Multi-Adapter Dataset Experts
(MADE) outperform all our baselines in terms of in-distribution accuracy, and
simple methods based on parameter-averaging lead to better zero-shot
generalization and few-shot transfer performance, offering a strong and
versatile starting point for building new reading comprehension systems.
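The parameter-averaging idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes each dataset expert is stored as a dictionary of flattened adapter weights with identical names and shapes, and that the experts are averaged uniformly. The abstract does not specify the exact averaging scheme, so the function `average_adapters`, the dataset names, and the toy weights are all hypothetical.

```python
from typing import Dict, List

# Assumption: an adapter is a mapping from parameter name to flattened weights.
Params = Dict[str, List[float]]


def average_adapters(adapters: Dict[str, Params]) -> Params:
    """Uniformly average same-shaped adapter parameters across dataset experts."""
    if not adapters:
        raise ValueError("need at least one adapter")
    experts = list(adapters.values())
    averaged: Params = {}
    for name in experts[0]:
        # Group the i-th weight of every expert together, then take the mean.
        columns = zip(*(expert[name] for expert in experts))
        averaged[name] = [sum(col) / len(experts) for col in columns]
    return averaged


# Hypothetical toy adapters for two datasets (not values from the paper).
adapters = {
    "dataset_a": {"down": [1.0, 2.0], "up": [0.0, 4.0]},
    "dataset_b": {"down": [3.0, 4.0], "up": [2.0, 0.0]},
}
merged = average_adapters(adapters)
print(merged)
```

In this sketch the shared Transformer weights are left untouched and only the lightweight adapter weights are combined, so the merged adapter could serve as a starting point for a dataset that has no expert of its own.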
Related papers
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, imposing a negative impact on training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z) - Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z) - A Framework for Fine-Tuning LLMs using Heterogeneous Feedback [69.51729152929413]
We present a framework for fine-tuning large language models (LLMs) using heterogeneous feedback.
First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF.
Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases.
arXiv Detail & Related papers (2024-08-05T23:20:32Z) - Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks [66.87070857705994]
In low-resource settings, the amount of seed data samples to use for data augmentation is very small.
We propose a novel method that augments training data by incorporating a wealth of examples from other datasets.
This approach can ensure that the generated data is not only relevant but also more diverse than what could be achieved using the limited seed data alone.
arXiv Detail & Related papers (2024-02-21T02:45:46Z) - Combining datasets to increase the number of samples and improve model fitting [7.4771091238795595]
We propose a novel framework called Combine datasets based on Imputation (ComImp).
In addition, we propose PCA-ComImp, a variant of ComImp that uses Principal Component Analysis (PCA) to reduce dimension before combining datasets.
Our results indicate that the proposed methods are somewhat similar to transfer learning in that the merge can significantly improve the accuracy of a prediction model on smaller datasets.
arXiv Detail & Related papers (2022-10-11T06:06:37Z) - A Case for Dataset Specific Profiling [0.9023847175654603]
Data-driven science is an emerging paradigm where scientific discoveries depend on the execution of computational AI models against rich, discipline-specific datasets.
With modern machine learning frameworks, anyone can develop and execute computational models that reveal concepts hidden in the data that could enable scientific applications.
For important and widely used datasets, computing the performance of every computational model that can run against a dataset is cost prohibitive in terms of cloud resources.
arXiv Detail & Related papers (2022-08-01T18:38:05Z) - Parsing with Pretrained Language Models, Multiple Datasets, and Dataset Embeddings [13.097523786733872]
We compare two methods to embed datasets in a transformer-based multilingual dependency parser.
We confirm that performance increases are highest for small datasets and datasets with a low baseline score.
We show that training on the combination of all datasets performs similarly to designing smaller clusters based on language-relatedness.
arXiv Detail & Related papers (2021-12-07T10:47:07Z) - Transferability Metrics for Selecting Source Model Ensembles [43.980600479738435]
Ensemble selection is difficult because fine-tuning all possible ensembles is computationally prohibitive.
We propose several new transferability metrics designed for this task and evaluate them in a challenging and realistic transfer learning setup.
Averaged over 17 target datasets, we outperform these baselines by 6.4% and 2.5% relative mean IoU, respectively.
arXiv Detail & Related papers (2021-11-25T10:43:29Z) - Multi-dataset Pretraining: A Unified Model for Semantic Segmentation [97.61605021985062]
We propose a unified framework, termed Multi-Dataset Pretraining, to take full advantage of the fragmented annotations of different datasets.
This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets.
In order to better model the relationship among images and classes from different datasets, we extend the pixel level embeddings via cross dataset mixing.
arXiv Detail & Related papers (2021-06-08T06:13:11Z) - XMixup: Efficient Transfer Learning with Auxiliary Samples by Cross-domain Mixup [60.07531696857743]
Cross-domain Mixup (XMixup) improves the multitask paradigm for deep transfer learning.
XMixup selects the auxiliary samples from the source dataset and augments training samples via the simple mixup strategy.
Experimental results show that XMixup improves the accuracy by 1.9% on average.
arXiv Detail & Related papers (2020-07-20T16:42:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.