Toward Efficient Language Model Pretraining and Downstream Adaptation
via Self-Evolution: A Case Study on SuperGLUE
- URL: http://arxiv.org/abs/2212.01853v1
- Date: Sun, 4 Dec 2022 15:36:18 GMT
- Title: Toward Efficient Language Model Pretraining and Downstream Adaptation
via Self-Evolution: A Case Study on SuperGLUE
- Authors: Qihuang Zhong, Liang Ding, Yibing Zhan, Yu Qiao, Yonggang Wen, Li
Shen, Juhua Liu, Baosheng Yu, Bo Du, Yixin Chen, Xinbo Gao, Chunyan Miao,
Xiaoou Tang and Dacheng Tao
- Abstract summary: This report describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.
SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark, containing eight difficult language understanding tasks.
- Score: 203.65227947509933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report briefly describes our JDExplore d-team's Vega v2
submission on the SuperGLUE leaderboard. SuperGLUE is more challenging than the
widely used general language understanding evaluation (GLUE) benchmark,
containing eight difficult language understanding tasks, including question
answering, natural language inference, word sense disambiguation, coreference
resolution, and reasoning. [Method] Instead of arbitrarily increasing the size
of a pretrained language model (PLM), our aim is to 1) fully extract knowledge
from the input pretraining data given a certain parameter budget, e.g., 6B, and
2) effectively transfer this knowledge to downstream tasks. To achieve goal 1),
we propose self-evolution learning for PLMs to wisely predict the informative
tokens that should be masked, and supervise the masked language modeling (MLM)
process with rectified smooth labels. For goal 2), we leverage the prompt
transfer technique to improve the low-resource tasks by transferring the
knowledge from the foundation model and related downstream tasks to the target
task. [Results] According to our submission record (Oct. 2022), with our
optimized pretraining and fine-tuning strategies, our 6B Vega method achieved
new state-of-the-art performance on 4/8 tasks, sitting atop the SuperGLUE
leaderboard on Oct. 8, 2022, with an average score of 91.3.
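To make goal 1) concrete, below is a minimal PyTorch sketch of what self-evolution-style MLM supervision could look like. It is an assumed illustration, not the published Vega v2 recipe: it treats `model` as any masked language model that returns per-token vocabulary logits, approximates a token's "informativeness" by the model's own per-token loss, and builds "rectified smooth labels" by mixing the one-hot gold label with the model's own prediction using illustrative constants `MASK_ID`, `MASK_RATIO`, and `ALPHA`.

```python
import torch
import torch.nn.functional as F

# Illustrative hyper-parameters (not the published Vega v2 values).
MASK_ID = 103       # assumed [MASK] token id (BERT-style vocabulary)
MASK_RATIO = 0.15   # fraction of tokens masked per sequence
ALPHA = 0.2         # assumed weight of the model's own distribution in the label


def select_informative_tokens(model, input_ids):
    """Score each position by the model's own per-token loss (a simple proxy
    for informativeness) and return the indices of the hardest positions."""
    with torch.no_grad():
        logits = model(input_ids)                                    # (B, T, V)
        token_loss = F.cross_entropy(logits.transpose(1, 2),         # (B, V, T)
                                     input_ids, reduction="none")    # (B, T)
    k = max(1, int(MASK_RATIO * input_ids.size(1)))
    return token_loss.topk(k, dim=1).indices                         # (B, k)


def rectified_smooth_labels(model, masked_ids, targets, mask_pos):
    """Mix the one-hot gold labels with the model's own prediction at the
    masked positions, then renormalise ("rectified" smoothing)."""
    with torch.no_grad():
        probs = F.softmax(model(masked_ids), dim=-1)                 # (B, T, V)
        probs_at_mask = probs.gather(
            1, mask_pos.unsqueeze(-1).expand(-1, -1, probs.size(-1)))  # (B, k, V)
    one_hot = F.one_hot(targets, probs.size(-1)).float()             # (B, k, V)
    smooth = (1.0 - ALPHA) * one_hot + ALPHA * probs_at_mask
    return smooth / smooth.sum(dim=-1, keepdim=True)


def self_evolution_step(model, optimizer, input_ids):
    """One MLM update: self-selected masks plus rectified smooth labels."""
    mask_pos = select_informative_tokens(model, input_ids)           # (B, k)
    targets = input_ids.gather(1, mask_pos)                          # gold tokens
    masked_ids = input_ids.clone()
    masked_ids.scatter_(1, mask_pos, MASK_ID)

    logits = model(masked_ids)                                       # (B, T, V)
    logits_at_mask = logits.gather(
        1, mask_pos.unsqueeze(-1).expand(-1, -1, logits.size(-1)))   # (B, k, V)
    labels = rectified_smooth_labels(model, masked_ids, targets, mask_pos)

    loss = (-labels * F.log_softmax(logits_at_mask, dim=-1)).sum(-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

For goal 2), prompt transfer can likewise be sketched only under assumptions, since the report does not spell out the implementation: the idea is to warm-start the target task's soft prompt from a prompt already tuned on the foundation model and a related source task, instead of from a random initialisation.

```python
import torch

PROMPT_LEN, HIDDEN = 16, 1024   # hypothetical prompt length and model width

# Soft prompt assumed to have been tuned on a related, higher-resource source task.
source_prompt = torch.nn.Parameter(torch.randn(PROMPT_LEN, HIDDEN) * 0.02)

# Prompt transfer: initialise the low-resource target task's prompt from the
# source prompt and continue tuning only these parameters on the target data.
target_prompt = torch.nn.Parameter(source_prompt.detach().clone())
```

The intuition is that a prompt learned on a related task starts in a better region of the embedding space than a random one, which matters most when the target task has little data.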
Related papers
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
- PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning [46.153828074152436]
We propose a pivot language guided generation approach to enhance instruction tuning in lower-resource languages.
It trains the model to first process instructions in the pivot language, and then produce responses in the target language.
Our approach improves the instruction-following abilities of LLMs by 29% on average.
arXiv Detail & Related papers (2023-11-15T05:28:07Z)
- Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents [99.17668730578586]
Pre-trained large language models (LLMs) capture procedural knowledge about the world.
The Plan, Eliminate, and Track (PET) framework translates a task description into a list of high-level sub-tasks.
The PET framework leads to a significant 15% improvement over SOTA in generalization to human goal specifications.
arXiv Detail & Related papers (2023-05-03T20:11:22Z)
- Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE [93.98660272309974]
This report briefly describes our submission Vega v1 on the General Language Understanding Evaluation (GLUE) leaderboard.
GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
With our optimized pretraining and fine-tuning strategies, our 1.3 billion-parameter model sets a new state-of-the-art on 4/9 tasks, achieving the best average score of 91.3.
arXiv Detail & Related papers (2023-02-18T09:26:35Z)
- AdaPrompt: Adaptive Model Training for Prompt-based NLP [77.12071707955889]
We propose AdaPrompt, which adaptively retrieves external data for the continual pretraining of PLMs.
Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings.
In zero-shot settings, our method outperforms standard prompt-based methods by up to 26.35% relative error reduction.
arXiv Detail & Related papers (2022-02-10T04:04:57Z)
- Generating Training Data with Language Models: Towards Zero-Shot Language Understanding [35.92571138322246]
Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks.
We present a simple approach that uses both types of PLMs (unidirectional and bidirectional) for fully zero-shot learning of NLU tasks.
Our approach demonstrates strong performance across seven classification tasks of the GLUE benchmark.
arXiv Detail & Related papers (2022-02-09T16:02:18Z)
- Multilingual Speech Recognition using Knowledge Transfer across Learning Processes [15.927513451432946]
Experimental results reveal that the best pre-training strategy yields a 3.55% relative reduction in overall WER.
A combination of LEAP and SSL yields a 3.51% relative reduction in overall WER when using language ID.
arXiv Detail & Related papers (2021-10-15T07:50:27Z)
- MC-BERT: Efficient Language Pre-Training via a Meta Controller [96.68140474547602]
Large-scale pre-training is computationally expensive.
ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator.
We propose a novel meta-learning framework, MC-BERT, to achieve better efficiency and effectiveness.
arXiv Detail & Related papers (2020-06-10T09:22:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.