Are we pretraining it right? Digging deeper into visio-linguistic pretraining
- URL: http://arxiv.org/abs/2004.08744v1
- Date: Sun, 19 Apr 2020 01:55:19 GMT
- Title: Are we pretraining it right? Digging deeper into visio-linguistic pretraining
- Authors: Amanpreet Singh, Vedanuj Goswami, Devi Parikh
- Abstract summary: We study how varying similarity between the pretraining dataset domain (textual and visual) and the downstream domain affects performance.
Surprisingly, we show that automatically generated data in a domain closer to the downstream task is a better choice for pretraining than "natural" data.
This suggests that despite the numerous recent efforts, vision & language pretraining does not quite work "out of the box" yet.
- Score: 61.80511482405592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerous recent works have proposed pretraining generic visio-linguistic
representations and then finetuning them for downstream vision and language
tasks. While architecture and objective function design choices have received
attention, the choice of pretraining datasets has received little attention. In
this work, we question some of the default choices made in literature. For
instance, we systematically study how varying similarity between the
pretraining dataset domain (textual and visual) and the downstream domain
affects performance. Surprisingly, we show that automatically generated data in
a domain closer to the downstream task (e.g., VQA v2) is a better choice for
pretraining than "natural" data but of a slightly different domain (e.g.,
Conceptual Captions). On the other hand, some seemingly reasonable choices of
pretraining datasets were found to be entirely ineffective for some downstream
tasks. This suggests that despite the numerous recent efforts, vision &
language pretraining does not quite work "out of the box" yet. Overall, as a
by-product of our study, we find that simple design choices in pretraining can
help us achieve close to state-of-the-art results on downstream tasks without any
architectural changes.
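
To make the abstract's experimental setup concrete, here is a minimal, hypothetical sketch in PyTorch of the pretrain-then-finetune comparison it describes: the same model is pretrained on different candidate corpora (differing in domain proximity to the downstream task), finetuned on downstream data, and scored on a held-out split. The toy model, synthetic tensors, and corpus names are illustrative stand-ins, not the paper's actual architecture or datasets.

```python
# Hedged sketch of the pretraining-dataset comparison described in the abstract.
# Synthetic tensors stand in for (image feature, caption feature, label) data.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def make_synthetic_pairs(n, img_dim=512, txt_dim=300, n_classes=10, seed=0):
    """Stand-in for image-text pairs; real corpora would replace this."""
    g = torch.Generator().manual_seed(seed)
    imgs = torch.randn(n, img_dim, generator=g)
    txts = torch.randn(n, txt_dim, generator=g)
    labels = torch.randint(0, n_classes, (n,), generator=g)
    return TensorDataset(imgs, txts, labels)

class TwoStreamModel(nn.Module):
    """Minimal two-stream encoder plus classifier head (illustrative only)."""
    def __init__(self, img_dim=512, txt_dim=300, hidden=256, n_classes=10):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, hidden)
        self.txt_enc = nn.Linear(txt_dim, hidden)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, img, txt):
        h = torch.cat([self.img_enc(img), self.txt_enc(txt)], dim=-1)
        return self.head(torch.relu(h))

def run_epochs(model, loader, epochs=1, lr=1e-3):
    """One training phase: used for both pretraining and finetuning here."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for img, txt, y in loader:
            opt.zero_grad()
            loss_fn(model(img, txt), y).backward()
            opt.step()

def evaluate(model, loader):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for img, txt, y in loader:
            correct += (model(img, txt).argmax(dim=-1) == y).sum().item()
            total += y.numel()
    return correct / total

# Candidate "pretraining corpora": different seeds stand in for different domains.
pretrain_corpora = {
    "generated_in_domain": make_synthetic_pairs(2000, seed=1),
    "natural_out_of_domain": make_synthetic_pairs(2000, seed=2),
}
downstream_train = make_synthetic_pairs(500, seed=3)
downstream_val = make_synthetic_pairs(200, seed=4)

for name, corpus in pretrain_corpora.items():
    model = TwoStreamModel()
    run_epochs(model, DataLoader(corpus, batch_size=64, shuffle=True), epochs=2)  # pretrain
    run_epochs(model, DataLoader(downstream_train, batch_size=64, shuffle=True))  # finetune
    acc = evaluate(model, DataLoader(downstream_val, batch_size=64))
    print(f"pretrained on {name}: downstream accuracy = {acc:.3f}")
```

In the paper's setting, the candidate corpora would be datasets such as Conceptual Captions versus automatically generated data closer to the downstream task (e.g., VQA v2), and the comparison metric would be downstream task performance after finetuning.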
Related papers
- An Unbiased Look at Datasets for Visuo-Motor Pre-Training [20.094244564603184]
We show that dataset choice is just as important to this paradigm's success.
We observe that traditional vision datasets are surprisingly competitive options for visuo-motor representation learning.
We show that common simulation benchmarks are not a reliable proxy for real world performance.
arXiv Detail & Related papers (2023-10-13T17:59:02Z) - On Efficient Transformer and Image Pre-training for Low-level Vision [74.22436001426517]
Pre-training has produced numerous state-of-the-art results in high-level computer vision.
We present an in-depth study of image pre-training.
We find pre-training plays strikingly different roles in low-level tasks.
arXiv Detail & Related papers (2021-12-19T15:50:48Z) - Meta-learning for downstream aware and agnostic pretraining [7.2051162210119495]
We propose using meta-learning to select tasks that provide the most informative learning signals in each episode of pretraining.
We discuss the algorithm of the method and its two variants, downstream-aware and downstream-agnostic pretraining.
arXiv Detail & Related papers (2021-06-06T23:08:09Z) - CUPID: Adaptive Curation of Pre-training Data for Video-and-Language
Representation Learning [49.18591896085498]
We propose CUPID to bridge the domain gap between source and target data.
CUPID yields new state-of-the-art performance across multiple video-language and video tasks.
arXiv Detail & Related papers (2021-04-01T06:42:16Z) - PointContrast: Unsupervised Pre-training for 3D Point Cloud
Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z) - Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining. A sketch of this two-phase recipe appears after this list.
arXiv Detail & Related papers (2020-04-23T04:21:19Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.