AutoBERT-Zero: Evolving BERT Backbone from Scratch
- URL: http://arxiv.org/abs/2107.07445v1
- Date: Thu, 15 Jul 2021 16:46:01 GMT
- Title: AutoBERT-Zero: Evolving BERT Backbone from Scratch
- Authors: Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L.H. Yu, Xiaodan
Liang, Xin Jiang, Zhenguo Li
- Abstract summary: We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
- Score: 94.89102524181986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based pre-trained language models like BERT and its variants have
recently achieved promising performance in various natural language processing
(NLP) tasks. However, the conventional paradigm constructs the backbone by
purely stacking manually designed global self-attention layers, which introduces
inductive bias and thus leads to sub-optimal architectures. In this work, we propose an
Operation-Priority Neural Architecture Search (OP-NAS) algorithm to
automatically search for promising hybrid backbone architectures. Our
well-designed search space (i) contains primitive math operations at the
intra-layer level to explore novel attention structures, and (ii) leverages
convolution blocks at the inter-layer level to supplement the attention
structure and better capture local dependencies. We optimize both the search
algorithm and evaluation of candidate models to boost the efficiency of our
proposed OP-NAS. Specifically, we propose an Operation-Priority (OP) evolution
strategy that facilitates model search by balancing exploration and exploitation.
Furthermore, we design a Bi-branch Weight-Sharing (BIWS) training strategy for
fast model evaluation. Extensive experiments show that the searched
architecture (named AutoBERT-Zero) significantly outperforms BERT and its
variants of different model capacities in various downstream tasks, proving the
architecture's transfer and generalization abilities. Remarkably,
AutoBERT-Zero-base outperforms RoBERTa-base (trained on much more data) and
BERT-large (with a much larger model size) by 2.4 and 1.4 points on the GLUE
test set. Code and pre-trained models will be made publicly available.
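As a rough illustration of the hybrid search space described in the abstract, the sketch below assembles one candidate backbone layer from (i) an attention-style cell built out of primitive math operations (matrix multiplication, scaling, softmax) and (ii) a convolution block for local dependencies. The operation set, layer sizes, and layer ordering are assumptions for illustration only, not the architecture actually discovered as AutoBERT-Zero.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical intra-layer cell: an attention-like structure assembled from
# primitive math ops (matmul, scale, softmax). The exact op sequence is an
# illustrative guess, not the formula found by OP-NAS.
class PrimitiveOpAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):  # x: (batch, seq, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale
        return torch.matmul(F.softmax(scores, dim=-1), v)

# Hypothetical inter-layer convolution block capturing local dependencies,
# complementing the global attention cell.
class LocalConvBlock(nn.Module):
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)

    def forward(self, x):  # x: (batch, seq, dim)
        return self.conv(x.transpose(1, 2)).transpose(1, 2)

# A hybrid backbone alternates the two layer types (layout is illustrative).
backbone = nn.Sequential(PrimitiveOpAttention(768), LocalConvBlock(768))
out = backbone(torch.randn(2, 16, 768))  # -> (2, 16, 768)
```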
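The Operation-Priority evolution strategy can be pictured as an evolution loop in which the operation used to mutate a candidate is chosen by a priority score that trades off how well each primitive operation has performed so far against how rarely it has been tried. The UCB-style score, the toy operation vocabulary, and the random proxy evaluation below are assumptions; OP-NAS's actual scoring and mutation rules are not reproduced here.

```python
import math
import random

# Hypothetical primitive-op vocabulary; the real OP-NAS op set differs.
OPS = ["matmul", "softmax", "scale", "add", "sigmoid"]
history = {op: {"trials": 0, "reward": 0.0} for op in OPS}

def op_priority(op, total_trials, c=1.0):
    """UCB-style priority (assumed): exploit ops with a high average reward
    while still exploring rarely tried ops."""
    h = history[op]
    if h["trials"] == 0:
        return float("inf")  # force each op to be tried at least once
    exploit = h["reward"] / h["trials"]
    explore = c * math.sqrt(math.log(total_trials + 1) / h["trials"])
    return exploit + explore

def mutate(parent):
    """Replace one position of the parent op sequence, choosing the new op
    by its current priority."""
    total = sum(h["trials"] for h in history.values())
    new_op = max(OPS, key=lambda op: op_priority(op, total))
    child = list(parent)
    child[random.randrange(len(child))] = new_op
    return child

def evaluate(arch):
    # Placeholder for a proxy evaluation (e.g. a short pre-training run).
    return random.random()

parent = ["matmul", "scale", "softmax"]
parent_score = evaluate(parent)
for _ in range(20):  # toy evolution loop
    child = mutate(parent)
    child_score = evaluate(child)
    for op in set(child):  # update the priorities of the ops just used
        history[op]["trials"] += 1
        history[op]["reward"] += child_score
    if child_score > parent_score:
        parent, parent_score = child, child_score
```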
Related papers
- Structural Pruning of Pre-trained Language Models via Neural Architecture Search [7.833790713816726]
Pre-trained language models (PLMs) mark the state of the art for natural language understanding tasks when fine-tuned on labeled data.
This paper explores neural architecture search (NAS) for structural pruning to find sub-parts of the fine-tuned network that optimally trade off efficiency against performance.
arXiv Detail & Related papers (2024-05-03T17:34:57Z)
- Improving Differentiable Architecture Search with a Generative Model [10.618008515822483]
We introduce a training strategy called Differentiable Architecture Search with a Generative Model (DASGM).
In DASGM, the training set is used to update the classification model's weights, while a synthesized dataset is used to train its architecture.
The generated images have a different distribution from the training set, which can help the classification model learn better features by exposing its weaknesses.
arXiv Detail & Related papers (2021-11-30T23:28:02Z)
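The DASGM entry above alternates two data sources: real training data updates the classifier's weights, while synthesized data updates its architecture parameters. Below is a minimal DARTS-style sketch of that alternation, under the assumption that the model holds separate weight and architecture parameters and that a generator returns labeled synthetic batches; these interfaces are placeholders, not the paper's actual procedure.

```python
import torch
import torch.nn.functional as F

def dasgm_style_step(model, generator, real_batch, weight_opt, arch_opt):
    """One alternating update: weights on real data, architecture on
    synthesized data. `model` is assumed to expose ordinary weights
    (optimized by weight_opt) and relaxed architecture parameters
    (optimized by arch_opt); `generator.sample` is an assumed interface
    returning labeled synthetic images."""
    x, y = real_batch

    # 1) Update the classification model's weights on the real training set.
    weight_opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    weight_opt.step()

    # 2) Update the architecture parameters on a synthesized batch.
    with torch.no_grad():
        x_syn, y_syn = generator.sample(x.size(0))  # assumed interface
    arch_opt.zero_grad()
    F.cross_entropy(model(x_syn), y_syn).backward()
    arch_opt.step()
```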
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
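One way to picture the "sparsity prior in weight updates" from the DSEE entry above is to freeze the pre-trained weight and learn only a delta restricted to a sparse support. The fixed random mask and the density value below are illustrative assumptions rather than DSEE's actual decomposition (which also targets the final weights for resource-efficient inference).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseDeltaLinear(nn.Module):
    """Frozen pre-trained weight plus a learned update restricted to a fixed
    sparse support (an illustrative stand-in for a sparsity prior on the
    weight update)."""
    def __init__(self, pretrained: nn.Linear, density: float = 0.05):
        super().__init__()
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)
        self.bias = nn.Parameter(pretrained.bias.detach(), requires_grad=False)
        # Randomly chosen sparse support for the update (kept fixed here).
        self.register_buffer("mask", (torch.rand_like(self.weight) < density).float())
        self.delta = nn.Parameter(torch.zeros_like(self.weight))

    def forward(self, x):
        return F.linear(x, self.weight + self.delta * self.mask, self.bias)

layer = SparseDeltaLinear(nn.Linear(768, 768))
# Only `delta` requires grad, and its gradient is zero outside the mask.
trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
```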
- AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models [46.69439585453071]
We adopt one-shot Neural Architecture Search (NAS) to automatically search architecture hyper-parameters.
Specifically, we design one-shot learning techniques and a search space that provide an adaptive and efficient way to develop tiny PLMs.
We name our method AutoTinyBERT and evaluate its effectiveness on the GLUE and SQuAD benchmarks.
arXiv Detail & Related papers (2021-07-29T00:47:30Z)
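One-shot NAS over architecture hyper-parameters, as in the AutoTinyBERT entry above, can be sketched as sampling sub-configurations (layer count, hidden size, head number) and scoring them with weights sliced from a single shared super-model instead of training each candidate from scratch. The ranges, the slicing rule, and the random proxy score below are assumptions for illustration.

```python
import random
import torch

# Assumed hyper-parameter ranges for tiny PLMs (illustrative only).
SEARCH_SPACE = {
    "num_layers": [2, 4, 6],
    "hidden_size": [128, 256, 512],
    "num_heads": [2, 4, 8],
}

# A shared "super" weight large enough to cover the biggest hidden size.
super_weight = torch.randn(512, 512)

def sample_config():
    """Draw one sub-architecture from the one-shot search space."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def inherit_weights(config):
    """Sub-models take a slice of the shared super weight, so no candidate
    is trained from scratch (assumed sharing rule)."""
    h = config["hidden_size"]
    return super_weight[:h, :h]

def proxy_score(config):
    # Placeholder for scoring the weight-inheriting sub-model on a proxy task.
    _ = inherit_weights(config)
    return random.random()

best = max((sample_config() for _ in range(50)), key=proxy_score)
```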
- LV-BERT: Exploiting Layer Variety for BERT [85.27287501885807]
We introduce convolution into the layer type set, which is experimentally found to benefit pre-trained models.
We then adopt an evolutionary algorithm guided by pre-training accuracy to find the optimal architecture.
The LV-BERT model obtained by our method outperforms BERT and its variants on various downstream tasks.
arXiv Detail & Related papers (2021-06-22T13:20:14Z)
- AutoRC: Improving BERT Based Relation Classification Models via Architecture Search [50.349407334562045]
BERT-based relation classification (RC) models have achieved significant improvements over traditional deep learning models.
However, no consensus has been reached on the optimal architecture.
We design a comprehensive search space for BERT-based RC models and employ a neural architecture search (NAS) method to automatically discover the design choices.
arXiv Detail & Related papers (2020-09-22T16:55:49Z)
- Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose the use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet.
Inspired by one-shot architecture search frameworks, we adapt the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs).
Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
arXiv Detail & Related papers (2020-05-13T13:25:51Z)
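The 1-bit group-convolution idea from the entry above can be sketched as binarizing the weights of a grouped convolution with a sign function while keeping gradients flowing via a straight-through estimator. The layer shapes and group count are illustrative, and the evolutionary search over group configurations is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryGroupConv2d(nn.Module):
    """Grouped convolution with 1-bit (sign) weights and a straight-through
    estimator for the backward pass. Shapes and groups are illustrative."""
    def __init__(self, in_ch, out_ch, kernel_size=3, groups=4):
        super().__init__()
        self.groups = groups
        self.padding = kernel_size // 2
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch // groups, kernel_size, kernel_size) * 0.01)

    def forward(self, x):
        w = self.weight
        # Forward uses sign(w); the straight-through trick keeps gradients
        # flowing to the latent full-precision weights.
        w_bin = torch.sign(w).detach() + w - w.detach()
        return F.conv2d(x, w_bin, padding=self.padding, groups=self.groups)

layer = BinaryGroupConv2d(32, 64, groups=4)
y = layer(torch.randn(1, 32, 28, 28))  # -> (1, 64, 28, 28)
```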
This list is automatically generated from the titles and abstracts of the papers on this site.