ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves
Zero-Shot Generalization
- URL: http://arxiv.org/abs/2201.06910v1
- Date: Tue, 18 Jan 2022 12:30:17 GMT
- Title: ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves
Zero-Shot Generalization
- Authors: Hanwei Xu, Yujun Chen, Yulun Du, Nan Shao, Yanggang Wang, Haiyu Li,
Zhilin Yang
- Abstract summary: We propose ZeroPrompt for zero-shot generalization, focusing on task scaling and zero-shot prompting.
We show that task scaling can substantially improve training efficiency by 30 times in FLOPs.
We also present a prompting method that incorporates a genetic algorithm to automatically search for the best prompt for unseen tasks.
- Score: 15.28478657477945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a multitask pretraining approach ZeroPrompt for zero-shot
generalization, focusing on task scaling and zero-shot prompting. While
previous models are trained on only a few dozen tasks, we scale to 1,000 tasks
for the first time using real-world data. This leads to a crucial discovery
that task scaling can be an efficient alternative to model scaling; i.e., the
model size has little impact on performance with an extremely large number of
tasks. Our results show that task scaling can substantially improve training
efficiency by 30 times in FLOPs. Moreover, we present a prompting method that
incorporates a genetic algorithm to automatically search for the best prompt
for unseen tasks, along with a few other improvements. Empirically, ZeroPrompt
substantially improves both the efficiency and the performance of zero-shot
learning across a variety of academic and production datasets.
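The genetic-algorithm prompt search described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the paper's exact method: the fitness function, mutation operator, and selection scheme here are all assumptions; in ZeroPrompt, fitness would be the pretrained model's zero-shot score for a candidate prompt on a small dev set.

```python
import random

def fitness(prompt, dev_set, model_score):
    """Average score of a prompt over a small dev set.
    `model_score` is a stand-in for the model's zero-shot scoring (assumed)."""
    return sum(model_score(prompt, x, y) for x, y in dev_set) / len(dev_set)

def mutate(prompt, vocab, rng):
    """Replace one random token of the prompt template (illustrative operator)."""
    tokens = prompt.split()
    i = rng.randrange(len(tokens))
    tokens[i] = rng.choice(vocab)
    return " ".join(tokens)

def crossover(p1, p2, rng):
    """Single-point crossover between two prompt templates."""
    t1, t2 = p1.split(), p2.split()
    cut = rng.randrange(1, min(len(t1), len(t2)))
    return " ".join(t1[:cut] + t2[cut:])

def genetic_prompt_search(seed_prompts, dev_set, model_score, vocab,
                          generations=10, pop_size=20, seed=0):
    """Evolve a population of prompts; return the fittest one found."""
    rng = random.Random(seed)
    pop = list(seed_prompts)
    for _ in range(generations):
        scored = sorted(pop, key=lambda p: fitness(p, dev_set, model_score),
                        reverse=True)
        parents = scored[: max(2, pop_size // 4)]  # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            child = crossover(p1, p2, rng)
            if rng.random() < 0.3:  # mutation rate chosen arbitrarily
                child = mutate(child, vocab, rng)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda p: fitness(p, dev_set, model_score))
```

Elitism guarantees the best prompt found so far is never lost between generations, which keeps the sketch stable even with aggressive mutation.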
Related papers
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation
Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then order them into easy-to-difficult mini-batches for training.
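The instance-level step above amounts to sorting examples by difficulty and chunking them into mini-batches. A minimal sketch, assuming some per-instance difficulty score is available (Data-CUBE derives its own measure; here `difficulty` is any user-supplied function):

```python
def curriculum_minibatches(instances, difficulty, batch_size):
    """Order instances from easy to difficult, then split into mini-batches.
    `difficulty` maps an instance to a score (lower = easier); this scoring
    function is an assumption standing in for the paper's measure."""
    ordered = sorted(instances, key=difficulty)
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]
```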
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Map [4.263847576433289]
Multi-Task Learning (MTL) is a powerful technique that has gained popularity due to its performance improvement over traditional Single-Task Learning (STL).
However, MTL is often challenging because there is an exponential number of possible task groupings.
We propose a new data-driven method that addresses these challenges and provides a scalable and modular solution for classification task grouping.
arXiv Detail & Related papers (2023-07-07T03:54:26Z)
- Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provide the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
arXiv Detail & Related papers (2022-12-30T12:32:43Z)
- An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale
Multitask Learning Systems [4.675744559395732]
Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer.
State-of-the-art ML models rely on high customization for each task and leverage size and data scale rather than scaling the number of tasks.
We propose an evolutionary method that can generate a large-scale multitask model and can support the dynamic and continuous addition of new tasks.
arXiv Detail & Related papers (2022-05-25T13:10:47Z)
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark
for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z)
- SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer [7.2462572989580405]
We propose a novel prompt-based transfer learning approach called SPoT: Soft Prompt Transfer.
We show SPoT significantly boosts the performance of PromptTuning across many tasks.
We also conduct a large-scale study on task transferability with 26 NLP tasks and 160 combinations of source-target tasks.
arXiv Detail & Related papers (2021-10-15T07:35:58Z)
- Efficiently Identifying Task Groupings for Multi-Task Learning [55.80489920205404]
Multi-task learning can leverage information learned by one task to benefit the training of other tasks.
We suggest an approach to select which tasks should train together in multi-task learning models.
Our method determines task groupings in a single training run by co-training all tasks together and quantifying the extent to which one task's gradient update would affect another task's loss.
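The gradient-affinity idea above can be sketched as follows: take one gradient step on task i and measure the relative change in every other task's loss. This is a simplified illustration on flat parameter vectors; the paper's exact affinity measure and lookahead scheme may differ, and the learning rate here is arbitrary.

```python
import numpy as np

def task_affinity(params, loss_fns, grad_fns, lr=0.1):
    """For each ordered task pair (i, j), apply one gradient step on task i
    and record the relative reduction in task j's loss.
    Positive affinity[i, j] means task i's update also helps task j.
    Assumes all base losses are nonzero."""
    n = len(loss_fns)
    affinity = np.zeros((n, n))
    base = [loss_fns[j](params) for j in range(n)]
    for i in range(n):
        stepped = params - lr * grad_fns[i](params)  # one step on task i
        for j in range(n):
            affinity[i, j] = 1.0 - loss_fns[j](stepped) / base[j]
    return affinity
```

Tasks with mutually high affinity would then be grouped into the same multi-task model.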
arXiv Detail & Related papers (2021-09-10T02:01:43Z)
- Rectification-based Knowledge Retention for Continual Learning [49.1447478254131]
Deep learning models suffer from catastrophic forgetting when trained in an incremental learning setting.
We propose a novel approach to the task-incremental learning problem, in which a model is trained on new tasks that arrive sequentially.
Our approach can be used in both the zero-shot and non zero-shot task incremental learning settings.
arXiv Detail & Related papers (2021-03-30T18:11:30Z)
- Efficient Feature Transformations for Discriminative and Generative
Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These transformations provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.