Fugu-MT Paper Translation (Abstract): RefBERT: Compressing BERT by Referencing to Pre-computed Representations

Paper abstract: RefBERT: Compressing BERT by Referencing to Pre-computed Representations

arxiv url: http://arxiv.org/abs/2106.08898v1
Date: Fri, 11 Jun 2021 01:22:08 GMT
Status: Translation complete
Last updated in system: 2021-06-20 16:14:32.136060
Title: RefBERT: Compressing BERT by Referencing to Pre-computed Representations
Title (reference translation): RefBERT: Compressing BERT by Referencing Pre-computed Representations
Authors: Xinyi Wang, Haiqin Yang, Liang Zhao, Yang Mo, Jianping Shen
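Abstract summary: RefBERT outperforms vanilla TinyBERT by more than 8.1% and achieves more than 94% of the performance of BERT$_{\rm BASE}$ on the GLUE benchmark. RefBERT is 7.4x smaller than BERT$_{\rm BASE}$ and 9.5x faster at inference.
Reference score (independently computed attention score): 19.807272592342148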
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recently developed large pre-trained language models, e.g., BERT, have achieved remarkable performance in many downstream natural language processing applications. These pre-trained language models often contain hundreds of millions of parameters and suffer from high computation and latency in real-world applications. It is desirable to reduce the computation overhead of the models for fast training and inference while keeping the model performance in downstream applications. Several lines of work utilize knowledge distillation to compress the teacher model to a smaller student model. However, they usually discard the teacher's knowledge at inference time. In contrast, in this paper we propose RefBERT, which leverages the knowledge learned from the teacher by using the pre-computed BERT representation on the reference sample while compressing BERT into a smaller student model. To support our proposal, we provide theoretical justification for the loss function and the usage of reference samples. Significantly, the theoretical result shows that including the pre-computed teacher's representations on the reference samples indeed increases the mutual information in learning the student model. Finally, we conduct an empirical evaluation and show that RefBERT outperforms vanilla TinyBERT by over 8.1% and achieves more than 94% of the performance of BERT$_{\rm BASE}$ on the GLUE benchmark. Meanwhile, RefBERT is 7.4x smaller and 9.5x faster on inference than BERT$_{\rm BASE}$.
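Abstract (reference translation): Recently developed large pre-trained language models such as BERT have achieved remarkable performance in many downstream natural language processing applications. These pre-trained language models often contain hundreds of millions of parameters and suffer from high computation and latency in real-world applications. It is desirable to reduce the computational overhead of the models for fast training and inference while maintaining model performance in downstream applications. Several lines of work use knowledge distillation to compress a teacher model into a smaller student model. However, they usually discard the teacher's knowledge at inference. In this paper, we therefore propose RefBERT, which leverages the knowledge learned from the teacher by drawing on pre-computed BERT representations of reference samples while compressing BERT into a smaller student model. To support this proposal, we provide theoretical justification for the loss function and the use of reference samples. The theoretical results indicate that including the teacher's representations of the reference samples increases the mutual information in learning the student model. Finally, we conduct an empirical evaluation and show that RefBERT outperforms vanilla TinyBERT by more than 8.1% and achieves more than 94% of the performance of BERT$_{\rm BASE}$ on the GLUE benchmark. Meanwhile, RefBERT is 7.4x smaller than BERT$_{\rm BASE}$ and 9.5x faster at inference.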
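The abstract builds on knowledge distillation, in which a smaller student model is trained to match a larger teacher. As a minimal sketch of that general idea only, the Python below computes the classic temperature-softened distillation loss (KL divergence between teacher and student output distributions). It is not RefBERT's loss function, which the abstract does not spell out; the function names, the temperature value, and the example logits are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2.

    Generic soft-label distillation (an assumption for illustration),
    not the loss proposed in the RefBERT paper.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a 3-class task.
teacher = [2.0, 0.5, -1.0]
matching_student = [2.0, 0.5, -1.0]
uninformed_student = [0.0, 0.0, 0.0]

print(distillation_loss(matching_student, teacher))    # zero: distributions agree
print(distillation_loss(uninformed_student, teacher))  # positive: student is off
```

The loss vanishes when the student reproduces the teacher's distribution and grows as the two diverge, which is the signal a distilled student is trained to minimize.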

Related papers

Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting [1.9461727843485295]
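This work proposes a new response-priming prompting method to improve the performance of the student model. Knowledge is distilled from a Llama 3.1 405B teacher model to fine-tune a smaller Llama 3.1 8B student model. As a result, the GSM8K performance of the distilled Llama 3.1 8B Instruct improved by 55%.
Paper reference translation (metadata) (2024-12-18T20:41:44Z)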
Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation [0.6793286055326242]
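We create a lightweight yet powerful BERT-based model for natural language processing applications. We apply the resulting model, LastBERT, to a real-world task: classifying severity levels of attention deficit hyperactivity disorder (ADHD) from social media text data.
Paper reference translation (metadata) (2024-10-30T17:57:44Z)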
GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model [20.620589404103644]
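This paper introduces GenDistiller, a new knowledge distillation framework in which a smaller student network generates the hidden representations of the teacher model. The proposed method treats the previous hidden layers as history and predicts the teacher model's layers autoregressively, layer by layer. Experiments show GenDistiller's advantages over a baseline distillation method that does not use an autoregressive framework.
Paper reference translation (metadata) (2024-06-12T01:25:00Z)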
ReFT: Representation Finetuning for Language Models [74.51093640257892]
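We develop a family of Representation Finetuning (ReFT) methods. ReFT operates on a frozen base model and learns task-specific interventions on hidden representations. We showcase LoReFT on eight commonsense reasoning tasks, four arithmetic reasoning tasks, instruction tuning, and GLUE.
Paper reference translation (metadata) (2024-04-04T17:00:37Z)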
oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes [82.99830498937729]
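oBERTa is an easy-to-use set of language models for natural language processing. It lets NLP practitioners obtain models that are 3.8x to 24.3x faster without expertise in model compression. We examine the use of oBERTa on seven representative NLP tasks.
Paper reference translation (metadata) (2023-03-30T01:37:19Z)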
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
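We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed. We validate the effectiveness of MoEBERT on natural language understanding and question answering tasks.
Paper reference translation (metadata) (2022-04-15T23:19:37Z)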
Sparse Distillation: Speeding Up Text Classification by Using Bigger Models [49.8019791766848]
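Distilling state-of-the-art transformer models into lightweight student models is an effective way to reduce computation cost at inference time. This paper aims to push the limits of inference speed further by exploring a new region of the student-model design space. Experiments show that the approach retains 97% of the RoBERTa-Large teacher's performance across a collection of six text classification tasks.
Paper reference translation (metadata) (2021-10-16T10:04:14Z)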
Distilling Dense Representations for Ranking using Tightly-Coupled Teachers [52.85472936277762]
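We apply knowledge distillation to improve the recently proposed late-interaction ColBERT model. We distill knowledge from ColBERT's expressive MaxSim operator for computing relevance scores into a simple dot product. The proposed approach improves query latency and greatly reduces ColBERT's onerous storage requirements.
Paper reference translation (metadata) (2020-10-22T02:26:01Z)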
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
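Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications. This paper proposes DeeBERT, a simple but effective method for accelerating BERT inference. Experiments show that DeeBERT can save up to 40% of inference time with minimal degradation in model quality.
Paper reference translation (metadata) (2020-04-27T17:58:05Z)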

The related-papers list is generated automatically from the titles and abstracts of papers on this site.