A Universal Discriminator for Zero-Shot Generalization
- URL: http://arxiv.org/abs/2211.08099v2
- Date: Tue, 6 Jun 2023 03:01:43 GMT
- Title: A Universal Discriminator for Zero-Shot Generalization
- Authors: Haike Xu, Zongyu Lin, Jing Zhou, Yanan Zheng, Zhilin Yang
- Abstract summary: Generative modeling has been the dominant approach for large-scale pretraining and zero-shot generalization.
We show that discriminative approaches perform substantially better than generative ones on a large number of NLP tasks.
- Score: 23.48188042332283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative modeling has been the dominant approach for large-scale
pretraining and zero-shot generalization. In this work, we challenge this
convention by showing that discriminative approaches perform substantially
better than generative ones on a large number of NLP tasks. Technically, we
train a single discriminator to predict whether a text sample comes from the
true data distribution, similar to GANs. Since many NLP tasks can be formulated
as selecting from a few options, we use this discriminator to predict which
concatenation of input and option has the highest probability of coming from
the true data distribution. This simple formulation achieves
state-of-the-art zero-shot results on the T0 benchmark, outperforming T0 by
16.0%, 7.8%, and 11.5% respectively at three different model scales. In the finetuning
setting, our approach also achieves new state-of-the-art results on a wide
range of NLP tasks, with only 1/4 of the parameters of previous methods. Meanwhile,
our approach requires minimal prompting effort, which largely improves
robustness and is essential for real-world applications. Furthermore, we also
jointly train a generalized universal discriminator (UD) in combination with generative tasks, which
maintains its advantage on discriminative tasks and simultaneously works on
generative tasks.
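To make the selection rule concrete, here is a minimal sketch of the scoring step, assuming a binary discriminator head on top of a pretrained encoder. The checkpoint name and helper function are illustrative, not the paper's released code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical setup: a binary classifier whose positive class means
# "this text looks like it came from the true data distribution".
# (In practice this head would be trained as the universal discriminator.)
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
model.eval()

def pick_option(context: str, options: list[str]) -> str:
    """Concatenate the input with each option, score every concatenation
    with the discriminator, and return the highest-scoring option."""
    texts = [f"{context} {opt}" for opt in options]
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        logits = model(**batch).logits            # shape: (num_options, 2)
    scores = logits.softmax(dim=-1)[:, 1]         # P(text is from true data)
    return options[int(scores.argmax())]

# Zero-shot example: a sentiment task recast as option selection.
print(pick_option("The film was a complete delight. It was", ["great.", "awful."]))
```

Because a single discriminator scores every task the same way, no task-specific verbalizers or elaborate prompts are needed, which is the source of the minimal-prompting property claimed above.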
Related papers
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
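As a rough illustration of the adaptive label-smoothing idea above, the sketch below scales a per-sample smoothing value with predictive entropy; the entropy proxy and the max_smooth cap are assumptions for illustration, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def ual_loss(logits: torch.Tensor, targets: torch.Tensor, max_smooth: float = 0.2) -> torch.Tensor:
    """Cross-entropy with a per-sample label-smoothing value that grows with
    the model's predictive entropy (a simple stand-in for sample uncertainty)."""
    num_classes = logits.size(-1)
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
    max_entropy = torch.log(torch.tensor(float(num_classes)))
    smooth = max_smooth * (entropy / max_entropy).unsqueeze(-1)  # in [0, max_smooth]
    soft_targets = F.one_hot(targets, num_classes).float() * (1.0 - smooth) + smooth / num_classes
    return -(soft_targets * logits.log_softmax(dim=-1)).sum(dim=-1).mean()
```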
- Towards a Generalist and Blind RGB-X Tracker [91.36268768952755]
We develop a single model tracker that can remain blind to any modality X during inference time.
Our training process is extremely simple, integrating multi-label classification loss with a routing function.
Our generalist and blind tracker can achieve competitive performance compared to well-established modal-specific models.
arXiv Detail & Related papers (2024-05-28T03:00:58Z)
- Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs [18.242110417706]
This work focuses on leveraging and selecting from vast, unlabeled, open data to pre-fine-tune a pre-trained language model.
We show the optimality of this approach for fine-tuning tasks under certain conditions.
Our proposed method is significantly faster than existing techniques, scaling to millions of samples within a single GPU hour.
arXiv Detail & Related papers (2024-05-05T00:08:00Z)
- Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z)
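As a toy, non-federated sketch of the Gaussian-mixture component described above: fit a mixture to one client's inputs and flag low-likelihood queries as novel. The percentile threshold is an assumption for illustration; FedGMM's federated EM aggregation is not shown:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
client_features = rng.normal(size=(500, 2))    # one client's input features

# Fit a Gaussian mixture to the client's input distribution.
gmm = GaussianMixture(n_components=3, random_state=0).fit(client_features)

# Uncertainty quantification / novel-sample detection via log-likelihood:
# anything below the 1st percentile of training likelihood is flagged as novel.
threshold = np.quantile(gmm.score_samples(client_features), 0.01)
queries = np.array([[0.1, -0.2], [8.0, 8.0]])
print(gmm.score_samples(queries) < threshold)  # expected: [False, True]
```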
- Learning New Tasks from a Few Examples with Soft-Label Prototypes [18.363177410917597]
We propose a novel few-shot learning approach based on soft-label prototypes (SLPs).
We focus on learning previously unseen NLP tasks from very few examples (4, 8, 16) per class.
We experimentally demonstrate that our approach achieves superior performance on the majority of tested tasks in this data-lean setting.
arXiv Detail & Related papers (2022-10-31T16:06:48Z)
- Prompt Consistency for Zero-Shot Task Generalization [118.81196556175797]
In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance.
Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency.
Our approach outperforms the state-of-the-art zero-shot learner, T0, on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy.
arXiv Detail & Related papers (2022-04-29T19:18:37Z)
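A generic sketch of the consistency regularizer described above: penalize disagreement between the answer distributions a model produces under different prompts for the same input. KL divergence to the mean distribution is an assumed instantiation, not necessarily the paper's exact loss:

```python
import torch

def prompt_consistency_loss(logits_per_prompt: list[torch.Tensor]) -> torch.Tensor:
    """Average KL divergence from each prompt's answer distribution to the
    mean distribution across prompts; zero when all prompts agree."""
    probs = [l.softmax(dim=-1) for l in logits_per_prompt]  # each: (num_options,)
    mean = torch.stack(probs).mean(dim=0)
    kls = [torch.sum(p * (p.clamp_min(1e-9).log() - mean.clamp_min(1e-9).log()))
           for p in probs]
    return torch.stack(kls).mean()

# Two paraphrased prompts for one example: the loss is 0 iff they agree.
print(prompt_consistency_loss([torch.tensor([2.0, 0.5]), torch.tensor([0.3, 1.8])]))
```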
- Data Dependent Randomized Smoothing [127.34833801660233]
We show that our data dependent framework can be seamlessly incorporated into 3 randomized smoothing approaches.
We get 9% and 6% improvement over the certified accuracy of the strongest baseline for a radius of 0.5 on CIFAR10 and ImageNet.
arXiv Detail & Related papers (2020-12-08T10:53:11Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
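A minimal sketch of full-text plausibility scoring in the spirit of the entry above: rank each candidate by the language model's average token log-likelihood over the entire sentence. GPT-2 here is a stand-in; the paper's exact scoring setup differs:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def plausibility(text: str) -> float:
    """Average token log-likelihood of the full text under the LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood
    return -loss.item()

candidates = [
    "He plugged in the kettle, so the water boiled.",
    "He plugged in the kettle, so the water froze.",
]
print(max(candidates, key=plausibility))
```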