Fugu-MT Paper Translation (Abstract): CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

Paper summary: CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

arxiv url: http://arxiv.org/abs/2204.06625v1
Date: Wed, 13 Apr 2022 19:54:51 GMT
Status: Translation complete
System update date: 2022-04-15 12:19:16.357561
Title: CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing
Title (reference translation): CAMERO: Consistency Regularization of Perturbed Language Models with Weight Sharing
Authors: Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao
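Abstract summary: This paper proposes CAMERO, a consistency-regularized ensemble learning approach based on perturbed models. Specifically, the weights of the bottom layers are shared across all models, and different perturbations are applied to the hidden representations of different models, which effectively promotes model diversity. Experiments with large language models show that CAMERO significantly improves the generalization performance of the ensemble model.
Reference score (independently computed attention metric): 83.63107444454938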
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Model ensemble is a popular approach to produce a low-variance and well-generalized model. However, it induces large memory and inference costs, which are often not affordable for real-world deployment. Existing work has resorted to sharing weights among models. However, when increasing the proportion of the shared weights, the resulting models tend to be similar, and the benefits of using model ensemble diminish. To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO. Specifically, we share the weights of bottom layers across all models and apply different perturbations to the hidden representations for different models, which can effectively promote the model diversity. Meanwhile, we apply a prediction consistency regularizer across the perturbed models to control the variance due to the model diversity. Our experiments using large language models demonstrate that CAMERO significantly improves the generalization performance of the ensemble model. Specifically, CAMERO outperforms the standard ensemble of 8 BERT-base models on the GLUE benchmark by 0.7 with a significantly smaller model size (114.2M vs. 880.6M).
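Abstract (reference translation): Model ensembling is a common approach for producing a low-variance, well-generalized model. However, it incurs large memory and inference costs, which are often unaffordable for real-world deployment. Existing work has resorted to sharing weights among models. However, as the proportion of shared weights increases, the resulting models tend to become similar, and the benefit of using a model ensemble diminishes. To retain the benefits of ensembling while keeping memory cost low, we propose CAMERO, a consistency-regularized ensemble learning approach based on perturbed models. Specifically, we share the bottom-layer weights across all models and apply different perturbations to the hidden representations of different models, which effectively promotes model diversity. Meanwhile, we apply a prediction consistency regularizer across the perturbed models to control the variance caused by model diversity. Experiments with large language models show that CAMERO significantly improves the generalization performance of the ensemble model. Specifically, CAMERO outperforms a standard ensemble of 8 BERT-base models on the GLUE benchmark by 0.7 while using a considerably smaller model (114.2M vs. 880.6M).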
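
Illustrative sketch (not the authors' code): the abstract names three ingredients, namely shared bottom-layer weights, a different perturbation of the hidden representation for each model, and a prediction consistency regularizer. The toy Python below shows one way these could fit together in a single loss. Everything specific here is an assumption: the function name `camero_style_loss`, the Gaussian noise as the perturbation, the separate per-model output heads, the KL divergence to the mean prediction as the regularizer, and the parameters `noise_std` and `alpha`. The abstract does not specify any of these choices.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def camero_style_loss(x, label, shared_W, heads, rng, noise_std=0.1, alpha=1.0):
    """Toy loss: mean cross-entropy plus alpha times a consistency term.

    shared_W : bottom-layer weights used by every ensemble member.
    heads    : one output weight matrix per member (assumed, not from the paper).
    """
    # Shared bottom layer: computed once and reused by every member.
    hidden = [math.tanh(h) for h in matvec(shared_W, x)]

    probs = []
    for head_W in heads:
        # Member-specific perturbation of the shared hidden representation
        # (Gaussian noise is an assumption made for this sketch).
        perturbed = [h + rng.gauss(0.0, noise_std) for h in hidden]
        probs.append(softmax(matvec(head_W, perturbed)))

    k = len(probs)
    n_classes = len(probs[0])
    # Ensemble prediction: average of the members' class probabilities.
    mean_p = [sum(p[c] for p in probs) / k for c in range(n_classes)]

    # Task loss: average cross-entropy of the members on the true label.
    task = sum(-math.log(p[label] + 1e-12) for p in probs) / k
    # Consistency term: average divergence of each member from the mean.
    consistency = sum(kl(mean_p, p) for p in probs) / k
    return task + alpha * consistency, task, consistency, mean_p


if __name__ == "__main__":
    rng = random.Random(0)
    shared_W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]  # 2 inputs -> 3 hidden units
    heads = [
        [[0.2, -0.1, 0.3], [-0.3, 0.4, 0.1]],
        [[0.1, 0.2, -0.2], [0.3, -0.1, 0.2]],
    ]  # two members, 3 hidden units -> 2 classes
    total, task, cons, mean_p = camero_style_loss([1.0, 2.0], 1, shared_W, heads, rng)
    print(total, task, cons, mean_p)
```

In this sketch, raising `noise_std` makes the members disagree more, and raising `alpha` penalizes that disagreement more heavily, which mirrors the diversity versus variance trade-off the abstract describes.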

Related papers