Towards Efficient NLP: A Standard Evaluation and A Strong Baseline
- URL: http://arxiv.org/abs/2110.07038v1
- Date: Wed, 13 Oct 2021 21:17:15 GMT
- Title: Towards Efficient NLP: A Standard Evaluation and A Strong Baseline
- Authors: Xiangyang Liu, Tianxiang Sun, Junliang He, Lingling Wu, Xinyu Zhang,
Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu
- Abstract summary: This work presents ELUE (Efficient Language Understanding Evaluation), a standard evaluation, and a public leaderboard for efficient NLP models.
Along with the benchmark, we also pre-train and release a strong baseline, ElasticBERT, whose elasticity is both static and dynamic.
- Score: 55.29756535335831
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supersized pre-trained language models have pushed the accuracy of various
NLP tasks to a new state-of-the-art (SOTA). Rather than chasing the hard-to-reach
SOTA accuracy, many works instead pursue improvements along other dimensions such
as efficiency, aiming at "Pareto SOTA". Unlike accuracy, however, the metric used
for efficiency varies across studies, which makes fair comparison difficult. To
that end, this work presents ELUE (Efficient Language
Understanding Evaluation), a standard evaluation, and a public leaderboard for
efficient NLP models. ELUE is dedicated to depicting the Pareto Front for
various language understanding tasks, such that it can tell whether and how
much a method achieves Pareto improvement. Along with the benchmark, we also
pre-train and release a strong baseline, ElasticBERT, whose elasticity is both
static and dynamic. ElasticBERT is static in that it allows reducing model
layers on demand. ElasticBERT is dynamic in that it selectively executes parts
of model layers conditioned on the input. We demonstrate that ElasticBERT,
despite its simplicity, outperforms or performs on par with SOTA compressed and
early-exiting models. The ELUE benchmark is publicly available at
http://eluebenchmark.fastnlp.top/.
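The two kinds of elasticity described above can be made concrete with a short, self-contained sketch. The PyTorch snippet below is a minimal illustration, not the released ElasticBERT code; the class and argument names (ElasticEncoder, keep_layers, exit_entropy) are assumptions made for this example. Static elasticity simply truncates the encoder to its first k layers, while dynamic elasticity exits as soon as an intermediate classifier is confident enough on the given input.

```python
import torch
import torch.nn as nn


class ElasticEncoder(nn.Module):
    """Toy encoder with an exit head after every layer (illustrative, hypothetical names)."""

    def __init__(self, hidden=256, num_layers=12, num_labels=2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        # One classification head per layer, so any prefix of layers is a usable sub-model.
        self.exits = nn.ModuleList(nn.Linear(hidden, num_labels) for _ in range(num_layers))

    def forward(self, x, keep_layers=None, exit_entropy=None):
        # keep_layers: static elasticity -- use only the first k layers on demand.
        # exit_entropy: dynamic elasticity -- stop once a layer's prediction is
        # confident (low entropy) for this particular input.
        n = keep_layers or len(self.layers)
        logits = None
        for layer, head in zip(self.layers[:n], self.exits[:n]):
            x = layer(x)
            logits = head(x[:, 0])  # pool the first ([CLS]-style) token
            if exit_entropy is not None:
                probs = logits.softmax(-1)
                entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean()
                if entropy < exit_entropy:  # confident enough -> exit early
                    break
        return logits


model = ElasticEncoder()
embedded = torch.randn(1, 16, 256)               # stand-in for already-embedded input
static_out = model(embedded, keep_layers=6)      # 6-layer sub-model on demand
dynamic_out = model(embedded, exit_entropy=0.4)  # input-conditioned early exit
print(static_out.shape, dynamic_out.shape)
```

This sketch only shows the inference-time mechanics; how the backbone is pre-trained so that every prefix of layers remains accurate is a property of ElasticBERT itself and is not reproduced here.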
Related papers
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised
Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- Data-Efficient French Language Modeling with CamemBERTa [0.0]
We introduce CamemBERTa, a French DeBERTa model that builds upon the DeBERTaV3 architecture and training objective.
We evaluate our model's performance on a variety of French downstream tasks and datasets.
arXiv Detail & Related papers (2023-06-02T12:45:34Z)
- Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese [4.4681678689625715]
We analyse the effect of pre-training with monolingual data for a low-resource language.
We present a newly created corpus for Maltese, and determine the effect that the pre-training data size and domain have on the downstream performance.
We compare two models on the new corpus: a monolingual BERT model trained from scratch (BERTu), and a further pre-trained multilingual BERT (mBERTu).
arXiv Detail & Related papers (2022-05-21T06:44:59Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing [117.41016786835452]
This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model.
We show that vanilla embedding sharing in ELECTRA hurts training efficiency and model performance.
We propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics.
arXiv Detail & Related papers (2021-11-18T06:48:00Z)
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
- MC-BERT: Efficient Language Pre-Training via a Meta Controller [96.68140474547602]
Large-scale pre-training is computationally expensive.
ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator (a minimal sketch of this replaced-token-detection objective appears after this list).
We propose a novel meta-learning framework, MC-BERT, to achieve better efficiency and effectiveness.
arXiv Detail & Related papers (2020-06-10T09:22:19Z)
- ParsBERT: Transformer-based Model for Persian Language Understanding [0.7646713951724012]
This paper proposes a monolingual BERT for the Persian language (ParsBERT).
It achieves state-of-the-art performance compared to other architectures and multilingual models.
ParsBERT obtains higher scores on all datasets, including existing ones as well as newly composed ones.
arXiv Detail & Related papers (2020-05-26T05:05:32Z)
- HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing [76.38975568873765]
We introduce HULK, a multi-task energy efficiency benchmarking platform for responsible natural language processing.
We compare pretrained models' energy efficiency from the perspectives of time and cost.
arXiv Detail & Related papers (2020-02-14T01:04:19Z)
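The ELECTRA-style replaced-token-detection (RTD) objective referenced in the MC-BERT summary above can be illustrated in a few lines. The sketch below is a toy illustration under stated assumptions: the generator is a random stand-in (ELECTRA trains a small masked language model for this role, and MC-BERT swaps it for a meta controller), and the tiny embedding-plus-linear discriminator is not the actual model architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, hidden, batch, seq = 1000, 64, 2, 8
tokens = torch.randint(0, vocab, (batch, seq))      # toy input ids
mask = torch.rand(batch, seq) < 0.15                # positions to corrupt

# Generator stand-in: propose replacement tokens for the masked positions.
# (In ELECTRA this is a small masked LM trained jointly; omitted here.)
proposed = torch.randint(0, vocab, (batch, seq))
corrupted = torch.where(mask, proposed, tokens)

# Discriminator: predict, for every position, whether the token was replaced.
discriminator = nn.Sequential(nn.Embedding(vocab, hidden), nn.Linear(hidden, 1))
rtd_logits = discriminator(corrupted).squeeze(-1)   # shape (batch, seq)
labels = (corrupted != tokens).float()              # 1 = token came from the generator
loss = F.binary_cross_entropy_with_logits(rtd_logits, labels)
loss.backward()                                     # gradients flow to the discriminator
print(float(loss))
```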