Plex: Towards Reliability using Pretrained Large Model Extensions
- URL: http://arxiv.org/abs/2207.07411v1
- Date: Fri, 15 Jul 2022 11:39:37 GMT
- Title: Plex: Towards Reliability using Pretrained Large Model Extensions
- Authors: Dustin Tran, Jeremiah Liu, Michael W. Dusenberry, Du Phan, Mark
Collier, Jie Ren, Kehang Han, Zi Wang, Zelda Mariet, Huiyi Hu, Neil Band, Tim
G. J. Rudner, Karan Singhal, Zachary Nado, Joost van Amersfoort, Andreas
Kirsch, Rodolphe Jenatton, Nithum Thain, Honglin Yuan, Kelly Buchanan, Kevin
Murphy, D. Sculley, Yarin Gal, Zoubin Ghahramani, Jasper Snoek, Balaji
Lakshminarayanan
- Abstract summary: We develop ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively.
Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol.
We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples.
- Score: 69.13326436826227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A recent trend in artificial intelligence is the use of pretrained models for
language and vision tasks, which have achieved extraordinary performance but
also puzzling failures. Probing these models' abilities in diverse ways is
therefore critical to the field. In this paper, we explore the reliability of
models, where we define a reliable model as one that not only achieves strong
predictive performance but also performs well consistently over many
decision-making tasks involving uncertainty (e.g., selective prediction, open
set recognition), robust generalization (e.g., accuracy and proper scoring
rules such as log-likelihood on in- and out-of-distribution datasets), and
adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of
tasks over 40 datasets in order to evaluate different aspects of reliability on
both vision and language domains. To improve reliability, we develop ViT-Plex
and T5-Plex, pretrained large model extensions for vision and language
modalities, respectively. Plex greatly improves the state-of-the-art across
reliability tasks, and simplifies the traditional protocol as it improves the
out-of-the-box performance and does not require designing scores or tuning the
model for each task. We demonstrate scaling effects over model sizes up to 1B
parameters and pretraining dataset sizes up to 4B examples. We also demonstrate
Plex's capabilities on challenging tasks including zero-shot open set
recognition, active learning, and uncertainty in conversational language
understanding.
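To make the task families above concrete, here is a minimal sketch of selective prediction, a log-likelihood proper scoring rule, and a simple open set recognition score. This is not code from the paper; the max-softmax confidence heuristic, the coverage level, and the toy data are illustrative assumptions.

```python
import numpy as np

def negative_log_likelihood(probs, labels):
    """Proper scoring rule: mean NLL of the true class under the predicted
    probabilities; comparing it on in- vs. out-of-distribution test sets
    probes robust generalization."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def selective_accuracy(probs, labels, coverage=0.8):
    """Selective prediction: abstain on the least confident examples and
    report accuracy on the `coverage` fraction the model keeps."""
    confidence = probs.max(axis=1)  # max-softmax confidence (an assumption)
    kept = np.argsort(-confidence)[: int(coverage * len(labels))]
    return np.mean(probs[kept].argmax(axis=1) == labels[kept])

def open_set_scores(probs):
    """Open set recognition: a low maximum probability flags an input as a
    possibly unseen class (one simple baseline score among many)."""
    return 1.0 - probs.max(axis=1)

# Toy usage with random logits standing in for a real model's outputs.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
labels = rng.integers(0, 10, size=1000)
print(negative_log_likelihood(probs, labels))  # lower is better
print(selective_accuracy(probs, labels, 0.8))  # higher is better
```

A reliable model, in the paper's sense, should perform well on such metrics out of the box, without per-task score design or tuning.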
Related papers
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of
General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
We further refine the robustness metric: a model is judged robust only if its performance is consistently accurate across the cliques as a whole (a toy sketch of this criterion appears after this list).
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models [62.23255433487586]
We propose an unsupervised fine-tuning framework to fine-tune the model or prompt on the unlabeled target data.
We demonstrate how to apply our method to both language-augmented vision and masked-language models by aligning the discrete distributions extracted from the prompts and target data (one plausible shape for such an alignment objective is sketched after this list).
arXiv Detail & Related papers (2023-04-29T22:05:22Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies? [0.06299766708197882]
We create a new task targeted at evaluating understanding of predicate-noun dependencies in a controlled setup.
We evaluate a range of state-of-the-art models and find that their performance on the task varies considerably.
This study highlights that targeted and controlled evaluations are a crucial step for a precise and rigorous test of the multimodal knowledge of vision-and-language models.
arXiv Detail & Related papers (2022-10-21T16:07:00Z)
- Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models [12.759281077118567]
Massively Multilingual Transformer-based Language Models have been observed to be surprisingly effective on zero-shot transfer across languages.
We build upon some of the existing techniques for predicting the zero-shot performance on a task, by modeling it as a multi-task learning problem.
arXiv Detail & Related papers (2022-05-12T14:47:03Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results on in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
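As referenced in the Open Information Extraction entry above, here is a toy sketch of judging robustness over knowledge-invariant cliques. The pass threshold, the scores, and the all-or-nothing criterion are illustrative assumptions, not the paper's exact metric.

```python
def clique_robustness(clique_scores, threshold=0.9):
    """A model passes a knowledge-invariant clique only if it scores well on
    every paraphrased example in it; overall robustness is the fraction of
    cliques passed (an illustrative criterion, not the paper's formula)."""
    passed = [all(score >= threshold for score in clique)
              for clique in clique_scores]
    return sum(passed) / len(passed)

# Toy usage: three cliques of per-example scores for one model.
cliques = [[1.0, 0.95, 0.97], [1.0, 0.20, 1.0], [0.99, 0.92, 0.96]]
print(clique_robustness(cliques))  # 2 of 3 cliques pass -> 0.666...
```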
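For the POUF entry's prompt/target alignment, the sketch below shows one plausible shape for such an unsupervised objective: make each unlabeled target example commit to a prompt-defined class while keeping the batch-level class marginal diverse. The cosine-similarity scoring, the temperature, and the entropy-based loss are assumptions, not POUF's published loss.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def alignment_objective(target_feats, prompt_feats, tau=0.07):
    """Hypothetical prompt/target alignment loss: encourage each unlabeled
    target example to commit to one prompt-defined class (low per-example
    entropy) while keeping the batch marginal spread out (high marginal
    entropy), so predictions do not collapse onto a single class."""
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    p = prompt_feats / np.linalg.norm(prompt_feats, axis=1, keepdims=True)
    probs = softmax(t @ p.T / tau)  # (n_examples, n_classes)
    per_example = -np.mean(np.sum(probs * np.log(probs + 1e-12), axis=1))
    marginal = probs.mean(axis=0)
    marginal_entropy = -np.sum(marginal * np.log(marginal + 1e-12))
    return per_example - marginal_entropy  # minimize this

# Toy usage: 32 target features vs. 5 prompt (class) embeddings.
rng = np.random.default_rng(1)
print(alignment_objective(rng.normal(size=(32, 64)), rng.normal(size=(5, 64))))
```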