MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on
a Massive Scale
- URL: http://arxiv.org/abs/2010.00980v1
- Date: Fri, 2 Oct 2020 13:22:12 GMT
- Title: MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on
a Massive Scale
- Authors: Andreas Rücklé, Jonas Pfeiffer, Iryna Gurevych
- Abstract summary: We study the zero-shot transfer capabilities of text matching models on a massive scale, by self-supervised training on 140 source domains.
We show that all 140 models transfer surprisingly well, with the large majority of models substantially outperforming common IR baselines.
- Score: 64.11709427403008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the zero-shot transfer capabilities of text matching models on a
massive scale, by self-supervised training on 140 source domains from community
question answering forums in English. We investigate the model performances on
nine benchmarks of answer selection and question similarity tasks, and show
that all 140 models transfer surprisingly well, with the large majority of models
substantially outperforming common IR baselines. We also demonstrate that
considering a broad selection of source domains is crucial for obtaining the
best zero-shot transfer performances, which contrasts with the standard procedure
that merely relies on the largest and most similar domains. In addition, we
extensively study how to best combine multiple source domains. We propose to
incorporate self-supervised with supervised multi-task learning on all
available source domains. Our best zero-shot transfer model considerably
outperforms in-domain BERT and the previous state of the art on six benchmarks.
Fine-tuning of our model with in-domain data results in additional large gains
and achieves the new state of the art on all nine benchmarks.
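The abstract describes training text-matching models with weak self-supervision mined from community question answering forums and then transferring them zero-shot. As a rough illustration only (not the authors' released code), the sketch below shows one plausible form of such training: a BERT bi-encoder trained on (question title, question body) positive pairs with in-batch negatives. The pooling choice, temperature, pair construction, and the toy `cqa_pairs` data are assumptions made for this sketch.

```python
# A minimal, hypothetical sketch -- NOT the authors' released code -- of
# self-supervised text-matching training in the spirit of MultiCQA:
# positive pairs are mined from a community-QA forum (here, question title
# paired with question body) and a BERT bi-encoder is trained with
# in-batch negatives. Pooling, temperature, and the toy `cqa_pairs` data
# are assumptions made for this illustration.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").to(device)
optimizer = torch.optim.AdamW(encoder.parameters(), lr=2e-5)


def embed(texts):
    """Mean-pooled BERT embeddings (one plausible pooling choice)."""
    batch = tokenizer(list(texts), padding=True, truncation=True,
                      max_length=256, return_tensors="pt").to(device)
    hidden = encoder(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)           # (B, H)


def train_step(titles, bodies):
    """One step of in-batch-negative training on (title, body) positive pairs."""
    q = F.normalize(embed(titles), dim=-1)
    d = F.normalize(embed(bodies), dim=-1)
    scores = q @ d.T / 0.05                            # temperature-scaled similarities
    labels = torch.arange(len(titles), device=device)  # i-th title matches i-th body
    loss = F.cross_entropy(scores, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Hypothetical mini-batch of self-supervised pairs from one source forum.
cqa_pairs = [
    ("How do I undo the last git commit?",
     "You can run `git reset --soft HEAD~1` to move the branch pointer back ..."),
    ("Why is my Python loop so slow?",
     "List comprehensions and vectorised operations avoid repeated attribute ..."),
]
print(train_step(*zip(*cqa_pairs)))
```

Scaling this kind of training up in the spirit of the paper would mean drawing such pairs from all 140 source forums and, as the abstract proposes, combining the self-supervised signal with supervised multi-task learning over all available source domains.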
Related papers
- Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification [71.08024880298613]
We study multi-source domain generalization for text classification.
We propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain.
arXiv Detail & Related papers (2024-09-20T07:46:21Z)
- Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent [2.3967405016776384]
Jack of All Trades (JAT) is a transformer-based model with a unique design optimized for handling sequential decision-making tasks.
JAT is the first model of its kind to be fully open-sourced at https://huggingface.co/jat-project/jat, including a pioneering general-purpose dataset.
arXiv Detail & Related papers (2024-02-15T10:01:55Z)
- DG-TTA: Out-of-domain medical image segmentation through Domain Generalization and Test-Time Adaptation [43.842694540544194]
We propose to combine domain generalization and test-time adaptation to create a highly effective approach for reusing pre-trained models in unseen target domains.
We demonstrate that our method, combined with pre-trained whole-body CT models, can effectively segment MR images with high accuracy.
arXiv Detail & Related papers (2023-12-11T10:26:21Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization [29.443550756161667]
CheXOFA is a new pre-trained vision-language model (VLM) for the chest X-ray domain.
We unify various domain-specific tasks into a simple sequence-to-sequence schema.
Our system achieves first place on the RadSum23 leaderboard for the hidden test set.
arXiv Detail & Related papers (2023-07-10T21:18:01Z)
- A Novel Mix-normalization Method for Generalizable Multi-source Person Re-identification [49.548815417844786]
Person re-identification (Re-ID) has achieved great success in the supervised scenario.
However, such supervised models are difficult to transfer directly to arbitrary unseen domains because they overfit the seen source domains.
We propose MixNorm, which consists of domain-aware mix-normalization (DMN) and domain-aware center regularization (DCR).
arXiv Detail & Related papers (2022-01-24T18:09:38Z)
- Efficient Domain Adaptation of Language Models via Adaptive Tokenization [5.058301279065432]
We show that domain-specific subword sequences can be efficiently determined directly from divergences in the conditional token distributions of the base and domain-specific corpora (a simplified sketch of this divergence-based selection follows after this list).
Our approach produces smaller models and shorter training and inference times than other approaches that use tokenizer augmentation.
arXiv Detail & Related papers (2021-09-15T17:51:27Z)
- Learning to Generate Novel Domains for Domain Generalization [115.21519842245752]
This paper focuses on the task of learning from multiple source domains a model that generalizes well to unseen domains.
We employ a data generator to synthesize data from pseudo-novel domains to augment the source domains.
Our method, L2A-OT, outperforms current state-of-the-art DG methods on four benchmark datasets.
arXiv Detail & Related papers (2020-07-07T09:34:17Z)
- Zero-Resource Cross-Domain Named Entity Recognition [68.83177074227598]
Existing models for cross-domain named entity recognition rely on large unlabeled corpora or labeled NER training data in the target domains.
We propose a cross-domain NER model that does not use any external resources.
arXiv Detail & Related papers (2020-02-14T09:04:18Z)
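As referenced in the adaptive-tokenization entry above, that paper determines domain-specific subword sequences from divergences between the token distributions of the base and domain corpora. The sketch below is a deliberately simplified, unigram-level illustration of that idea (the paper itself works with conditional token distributions); the toy corpora, smoothing, and scoring function are assumptions made for this illustration, not the paper's released method.

```python
# Hypothetical simplification: score each subword token by how much its
# frequency diverges between a domain corpus and the base corpus, then keep
# the highest-scoring tokens as candidate domain-specific vocabulary additions.
# A unigram pointwise-KL contribution is used here purely for illustration.
from collections import Counter
import math


def token_scores(base_tokens, domain_tokens, smoothing=1.0):
    """Per-token divergence contribution: p_domain * log(p_domain / p_base)."""
    base_counts, dom_counts = Counter(base_tokens), Counter(domain_tokens)
    vocab = set(base_counts) | set(dom_counts)
    base_total = sum(base_counts.values()) + smoothing * len(vocab)
    dom_total = sum(dom_counts.values()) + smoothing * len(vocab)
    scores = {}
    for tok in vocab:
        p_base = (base_counts[tok] + smoothing) / base_total
        p_dom = (dom_counts[tok] + smoothing) / dom_total
        scores[tok] = p_dom * math.log(p_dom / p_base)
    return scores


# Toy corpora (assumed to be already subword-tokenized elsewhere).
base = "the model was trained on general text the model was evaluated".split()
domain = "the ##omab antibody inhibits the ##kinase pathway in tumour cells".split()
top = sorted(token_scores(base, domain).items(), key=lambda kv: -kv[1])[:5]
print(top)  # tokens most characteristic of the domain corpus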