Fugu-MT Paper Translation (Abstract): NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search

Paper overview: NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search

arxiv url: http://arxiv.org/abs/2105.14444v1
Date: Sun, 30 May 2021 07:20:27 GMT
Status: translation complete
Last updated in system: 2021-06-03 12:15:07.718380
Title: NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search
Authors: Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu
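Abstract summary: We propose NAS-BERT, an efficient method for BERT compression. NAS-BERT trains a big supernet on a search space and outputs multiple compressed models with adaptive sizes and latency. Experiments on the GLUE and SQuAD benchmark datasets show that NAS-BERT can find lightweight models with better accuracy than previous approaches.
Reference score (independently computed attention score): 100.71365025972258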
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While pre-trained language models (e.g., BERT) have achieved impressive results on different natural language processing tasks, they have large numbers of parameters and suffer from big computational and memory costs, which make them difficult for real-world deployment. Therefore, model compression is necessary to reduce the computation and memory cost of pre-trained models. In this work, we aim to compress BERT and address the following two challenging practical issues: (1) The compression algorithm should be able to output multiple compressed models with different sizes and latencies, in order to support devices with different memory and latency limitations; (2) The algorithm should be downstream task agnostic, so that the compressed models are generally applicable for different downstream tasks. We leverage techniques in neural architecture search (NAS) and propose NAS-BERT, an efficient method for BERT compression. NAS-BERT trains a big supernet on a search space containing a variety of architectures and outputs multiple compressed models with adaptive sizes and latency. Furthermore, the training of NAS-BERT is conducted on standard self-supervised pre-training tasks (e.g., masked language model) and does not depend on specific downstream tasks. Thus, the compressed models can be used across various downstream tasks. The technical challenge of NAS-BERT is that training a big supernet on the pre-training task is extremely costly. We employ several techniques including block-wise search, search space pruning, and performance approximation to improve search efficiency and accuracy. Extensive experiments on GLUE and SQuAD benchmark datasets demonstrate that NAS-BERT can find lightweight models with better accuracy than previous approaches, and can be directly applied to different downstream tasks with adaptive model sizes for different requirements of memory or latency.
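
The abstract's goal of producing compressed models with adaptive sizes and latency can be illustrated with a minimal Python sketch. It assumes a supernet has already been trained and that each candidate sub-architecture has an estimated parameter count, latency, and pre-training loss. The names `Candidate` and `select_under_budget` and all numbers are hypothetical and do not come from the paper, and the sketch does not model the authors' block-wise search, search space pruning, or performance approximation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical sub-architecture sampled from a trained supernet."""
    name: str
    params_m: float    # parameter count, in millions
    latency_ms: float  # measured or estimated inference latency
    proxy_loss: float  # pre-training loss estimated with shared supernet weights


def select_under_budget(
    candidates: Sequence[Candidate],
    max_params_m: float,
    max_latency_ms: float,
) -> Optional[Candidate]:
    """Return the lowest-proxy-loss candidate that fits both budgets, or None."""
    feasible = [
        c for c in candidates
        if c.params_m <= max_params_m and c.latency_ms <= max_latency_ms
    ]
    return min(feasible, key=lambda c: c.proxy_loss, default=None)


# Illustrative, made-up numbers; they are NOT results from the paper.
CANDIDATES = [
    Candidate("arch_a", params_m=5.0, latency_ms=10.0, proxy_loss=3.1),
    Candidate("arch_b", params_m=10.0, latency_ms=18.0, proxy_loss=2.8),
    Candidate("arch_c", params_m=30.0, latency_ms=45.0, proxy_loss=2.4),
    Candidate("arch_d", params_m=9.0, latency_ms=20.0, proxy_loss=2.9),
]

if __name__ == "__main__":
    # One selection per (size, latency) budget, mimicking different devices.
    for max_params, max_latency in [(6.0, 15.0), (12.0, 25.0), (60.0, 100.0)]:
        best = select_under_budget(CANDIDATES, max_params, max_latency)
        print(max_params, max_latency, best.name if best else None)
```

Because the proxy loss here stands in for a pre-training objective rather than a downstream metric, the same selection could in principle be reused across downstream tasks, which is the sense of "task-agnostic" in the abstract.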
