Fugu-MT Paper Translation (Summary): An Automatic and Efficient BERT Pruning for Edge AI Systems

Paper summary: An Automatic and Efficient BERT Pruning for Edge AI Systems

arxiv url: http://arxiv.org/abs/2206.10461v1
Date: Tue, 21 Jun 2022 15:10:29 GMT
Status: Translation complete
Last updated in system: 2022-06-22 13:27:28.975458
Title: An Automatic and Efficient BERT Pruning for Edge AI Systems
Title (reference translation): An automatic and efficient BERT pruning for edge AI systems
Authors: Shaoyi Huang, Ning Liu, Yueying Liang, Hongwu Peng, Hongjia Li, Dongkuan Xu, Mimi Xie, Caiwen Ding
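Abstract summary: AE-BERT is an automatic and efficient BERT pruning framework that performs efficient evaluation to select a "good" sub-network candidate. The proposed method requires no human expertise and improves accuracy on many NLP tasks. After model compression, the inference time of a single BERT$_{\mathrm{BASE}}$ encoder on a Xilinx Alveo U200 FPGA board shows a 1.83$\times$ speedup over an Intel(R) Xeon(R) Gold 5218 (2.30GHz) CPU.
Reference score (site-computed attention score): 16.649807141741004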
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the yearning for deep learning democratization, there are increasing demands to implement Transformer-based natural language processing (NLP) models on resource-constrained devices for low latency and high accuracy. Existing BERT pruning methods require domain experts to heuristically handcraft hyperparameters to strike a balance among model size, latency, and accuracy. In this work, we propose AE-BERT, an automatic and efficient BERT pruning framework with efficient evaluation to select a "good" sub-network candidate (with high accuracy) given the overall pruning ratio constraints. Our proposed method requires no human expert experience and achieves better accuracy on many NLP tasks. Our experimental results on the General Language Understanding Evaluation (GLUE) benchmark show that AE-BERT outperforms the state-of-the-art (SOTA) hand-crafted pruning methods on BERT$_{\mathrm{BASE}}$. On QNLI and RTE, we obtain 75\% and 42.8\% higher overall pruning ratios while achieving higher accuracy. On MRPC, we obtain a score 4.6 points higher than the SOTA at the same overall pruning ratio of 0.5. On STS-B, we achieve a 40\% higher pruning ratio with a very small loss in Spearman correlation compared to SOTA hand-crafted pruning methods. Experimental results also show that after model compression, the inference time of a single BERT$_{\mathrm{BASE}}$ encoder on a Xilinx Alveo U200 FPGA board has a 1.83$\times$ speedup compared to an Intel(R) Xeon(R) Gold 5218 (2.30GHz) CPU, which shows the reasonableness of deploying the BERT$_{\mathrm{BASE}}$ subnets generated by the proposed method on computation-restricted devices.
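To make "selecting a sub-network candidate given the overall pruning ratio constraints" concrete, here is a minimal Python/NumPy sketch. It is not AE-BERT's algorithm. It assumes unstructured magnitude pruning per layer, defines the overall pruning ratio as the parameter-weighted average $\sum_i r_i n_i / \sum_i n_i$ of per-layer ratios $r_i$ over layers with $n_i$ weights, draws candidate ratio vectors by plain random search, and ranks them with a hypothetical `proxy_score` (fraction of total weight magnitude retained) that stands in for the paper's efficient evaluation. All function names are illustrative.

```python
import numpy as np


def magnitude_prune(weight: np.ndarray, ratio: float) -> np.ndarray:
    """Zero out the `ratio` fraction of entries with the smallest |value|."""
    k = int(round(ratio * weight.size))
    if k == 0:
        return weight.copy()
    flat = np.abs(weight).ravel()
    # Indices of the k smallest-magnitude entries (order among them irrelevant).
    drop = np.argpartition(flat, k - 1)[:k]
    mask = np.ones(weight.size, dtype=bool)
    mask[drop] = False
    return weight * mask.reshape(weight.shape)


def overall_ratio(ratios: np.ndarray, sizes: np.ndarray) -> float:
    """Parameter-weighted overall pruning ratio (assumed definition)."""
    return float(np.dot(ratios, sizes) / sizes.sum())


def proxy_score(weights, pruned) -> float:
    """Hypothetical cheap evaluator: fraction of total |W| kept after pruning.

    Stand-in only; a real framework would score candidates on task data.
    """
    kept = sum(np.abs(p).sum() for p in pruned)
    total = sum(np.abs(w).sum() for w in weights)
    return float(kept / total)


def sample_candidate(rng, sizes, target, max_ratio=0.95):
    """Draw per-layer ratios whose weighted average equals `target`."""
    while True:
        r = rng.uniform(0.0, 1.0, size=len(sizes))
        # overall_ratio is linear in r, so one rescale hits the target exactly.
        r *= target / overall_ratio(r, sizes)
        if np.all(r <= max_ratio):
            return r


def search(weights, target, n_candidates=32, seed=0):
    """Random search over per-layer ratios under an overall-ratio constraint."""
    rng = np.random.default_rng(seed)
    sizes = np.array([w.size for w in weights], dtype=float)
    best_r, best_s = None, -np.inf
    for _ in range(n_candidates):
        r = sample_candidate(rng, sizes, target)
        pruned = [magnitude_prune(w, ri) for w, ri in zip(weights, r)]
        s = proxy_score(weights, pruned)
        if s > best_s:  # keep the highest-scoring candidate seen so far
            best_r, best_s = r, s
    return best_r, best_s


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Three toy "layers" of different sizes (illustrative shapes only).
    layers = [rng.standard_normal(s) for s in [(64, 64), (64, 256), (256, 64)]]
    ratios, score = search(layers, target=0.5)
    sizes = np.array([w.size for w in layers], dtype=float)
    print("per-layer ratios:", ratios.round(3))
    print("overall ratio:", round(overall_ratio(ratios, sizes), 3))
    print("proxy score:", round(score, 3))
```

Because the overall ratio is linear in the per-layer ratios, rescaling a random draw lands exactly on the target, so every candidate satisfies the constraint by construction. In practice `proxy_score` would be replaced by a quick accuracy check on a held-out split of the GLUE task.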
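Abstract (reference translation): As deep learning becomes more democratized, there is a growing need to implement Transformer-based natural language processing (NLP) models on resource-constrained devices with low latency and high accuracy. Existing BERT pruning methods require domain experts to heuristically handcraft hyperparameters to balance model size, latency, and accuracy. In this work, we propose AE-BERT, an automatic and efficient BERT pruning framework that, given an overall pruning ratio constraint, performs efficient evaluation to select a "good" (high-accuracy) sub-network candidate. The proposed method requires no human expertise and improves accuracy on many NLP tasks. Experimental results on the General Language Understanding Evaluation (GLUE) benchmark show that AE-BERT outperforms state-of-the-art (SOTA) hand-crafted pruning methods on BERT$_{\mathrm{BASE}}$. On QNLI and RTE, it achieves 75% and 42.8% higher pruning ratios with higher accuracy. On MRPC, it scores 4.6 points higher than the SOTA at the same overall pruning ratio of 0.5. On STS-B, it achieves a 40% higher pruning ratio with a very small loss in Spearman correlation compared to SOTA hand-crafted pruning methods. After model compression, the inference time of a single BERT$_{\mathrm{BASE}}$ encoder on a Xilinx Alveo U200 FPGA board shows a 1.83$\times$ speedup over an Intel(R) Xeon(R) Gold 5218 (2.30GHz) CPU, demonstrating the feasibility of deploying BERT$_{\mathrm{BASE}}$ subnets generated by the proposed method on computation-restricted devices.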

Related papers

Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence [38.30075427255948]
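Test-Time Scaling (TTS) methods for enhancing large language model (LLM) reasoning often incur substantial computational cost. This paper introduces Guided by Gut (GG), an efficient self-guided TTS framework that achieves PRM-level performance without relying on external verifier models.
Paper reference translation (metadata) (2025-05-23T18:19:09Z)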
Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint [7.757464614718271]
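Existing pruning methods are limited to channel pruning and struggle with aggressive parameter reduction. We propose a novel multi-dimensional pruning framework that jointly optimizes pruning across channels, layers, and blocks. In 3D object detection, we establish a new state of the art by pruning StreamPETR at a 45% pruning ratio while achieving higher FPS (37.3 vs. 31.7) and mAP (0.451 vs. 0.449) than the dense baseline.
Paper reference translation (metadata) (2024-06-17T20:40:09Z)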
From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers [52.199303258423306]
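We propose a novel density loss that encourages high activation sparsity in pretrained models. The proposed method, DEFT, consistently reduces activation density by 44.94% on RoBERTa$_{\mathrm{Large}}$, and by 53.19% (encoder density) and 90.60% (decoder density) on Flan-T5$_{\mathrm{XXL}}$.
Paper reference translation (metadata) (2024-02-02T21:25:46Z)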
A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport [92.96250725599958]
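Kernel-based optimal transport (OT) estimators offer an alternative functional estimation procedure for addressing OT problems from samples. We show that the SSN method achieves a global convergence rate of $O(1/\sqrt{k})$ and a local quadratic convergence rate under standard regularity conditions.
Paper reference translation (metadata) (2023-10-21T18:48:45Z)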
CARE: Confidence-rich Autonomous Robot Exploration using Bayesian Kernel Inference and Optimization [12.32946442160165]
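We study efficient information-based autonomous robot exploration in unknown and complex environments. We propose a novel lightweight information gain estimation method based on Bayesian kernel inference and optimization (BKIO). We demonstrate the desired efficiency of the proposed method without compromising exploration performance in various unstructured, cluttered environments.
Paper reference translation (metadata) (2023-09-11T02:30:06Z)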
Gradient-Free Structured Pruning with Unlabeled Data [57.999191898036706]
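We propose a gradient-free structured pruning framework that uses only unlabeled data. Up to 40% of the original FLOP count can be removed with less than 4% accuracy loss across all tasks considered.
Paper reference translation (metadata) (2023-03-07T19:12:31Z)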
BiBERT: Accurate Fully Binarized BERT [69.35727280997617]
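BiBERT is an accurate fully binarized BERT designed to eliminate performance bottlenecks. The proposed method yields 56.3$\times$ and 31.2$\times$ savings in FLOPs and model size, respectively.
Paper reference translation (metadata) (2022-03-12T09:46:13Z)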
Non-Parametric Adaptive Network Pruning [125.4414216272874]
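We introduce non-parametric modeling to simplify algorithm design. Inspired by the face recognition community, we use a message-passing algorithm to obtain an adaptive number of exemplars. EPruner removes the dependence on training data when determining "important" filters.
Paper reference translation (metadata) (2021-01-20T06:18:38Z)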
TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
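We propose TernaryBERT, which ternarizes the weights of a fine-tuned BERT model. Experiments on the GLUE benchmark and SQuAD show that the proposed TernaryBERT outperforms other BERT quantization methods.
Paper reference translation (metadata) (2020-09-27T10:17:28Z)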
Second-Order Provable Defenses against Adversarial Attacks [63.34032156196848]
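We show that if the eigenvalues of the network are bounded, robustness certificates in the $l$-norm can be computed efficiently using convex optimization. The certified accuracies obtained were 5.78%, 44.96%, and 43.19%.
Paper reference translation (metadata) (2020-06-01T05:55:18Z)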

The related papers list is generated automatically from the titles and abstracts of papers on this site.

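This is the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from its use (including all information and translations).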