Fugu-MT Paper Translation (Abstract): Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation

Paper abstract: Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation

arxiv url: http://arxiv.org/abs/2009.05647v2
Date: Sat, 26 Sep 2020 12:24:06 GMT
Status: Translation complete
Last updated in system: 2022-10-19 20:41:38.079411
Title: Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation
Title (reference translation): Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation
Authors: Murad Tukan and Alaa Maalouf and Matan Weksler and Dan Feldman
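Abstract summary: A common technique for compressing a neural network is to compute the $k$-rank $\ell_2$ approximation $A_{k,2}$ of the matrix $A\in\mathbb{R}^{n\times d}$ that corresponds to a fully connected layer (or embedding layer). Here, $d$ is the number of neurons in the layer, $n$ is the number of neurons in the next layer, and $A_{k,2}$ can be stored in $O((n+d)k)$ memory.
Reference score (independently computed attention score): 23.06440095688755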
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A common technique for compressing a neural network is to compute the $k$-rank $\ell_2$ approximation $A_{k,2}$ of the matrix $A\in\mathbb{R}^{n\times d}$ that corresponds to a fully connected layer (or embedding layer). Here, $d$ is the number of neurons in the layer, $n$ is the number of neurons in the next layer, and $A_{k,2}$ can be stored in $O((n+d)k)$ memory instead of $O(nd)$. Among all matrices $A_{k,2}\in\mathbb{R}^{n\times d}$ whose rank is $k$, this $\ell_2$-approximation minimizes the sum, over every entry of the matrix $A - A_{k,2}$, of that entry raised to the power $p=2$. While it can be computed efficiently via SVD, the $\ell_2$-approximation is known to be very sensitive to outliers ("far-away" rows). Hence, machine learning uses methods based on the $\ell_1$-norm, e.g. Lasso Regression, $\ell_1$-regularization, and $\ell_1$-SVM. This paper suggests replacing the $k$-rank $\ell_2$ approximation by an $\ell_p$ approximation, for $p\in [1,2]$. We then provide practical and provable approximation algorithms to compute it for any $p\geq1$, based on modern techniques in computational geometry. Extensive experimental results on the GLUE benchmark for compressing BERT, DistilBERT, XLNet, and RoBERTa confirm this theoretical advantage. For example, our approach achieves $28\%$ compression of RoBERTa's embedding layer with only a $0.63\%$ additive drop in accuracy (without fine-tuning) on average over all tasks in GLUE, compared to an $11\%$ drop using the existing $\ell_2$-approximation. Open code is provided for reproducing and extending our results.
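Abstract (reference translation): A common technique for compressing a neural network is to compute the $k$-rank $\ell_2$ approximation $A_{k,2}$ of the matrix $A\in\mathbb{R}^{n\times d}$ corresponding to a fully connected layer (or embedding layer). Here, $d$ is the number of neurons in the layer, $n$ is the number of neurons in the next layer, and $A_{k,2}$ can be stored in $O((n+d)k)$ memory rather than $O(nd)$. This $\ell_2$-approximation minimizes the sum of the $p=2$ powers of all entries of the matrix $A - A_{k,2}$, over all matrices $A_{k,2}\in\mathbb{R}^{n\times d}$ of rank $k$. Although it can be computed efficiently with SVD, the $\ell_2$-approximation is known to be very sensitive to outliers ("far-away" rows). For this reason, machine learning uses methods built on the $\ell_1$-norm, such as Lasso Regression, $\ell_1$-regularization, and $\ell_1$-SVM. This paper proposes replacing the $k$-rank $\ell_2$ approximation with an $\ell_p$ approximation, for $p\in [1,2]$. It then provides practical and provable approximation algorithms for computing it for any $p\geq1$, based on modern techniques in computational geometry. Extensive experiments on the GLUE benchmark for compressing BERT, DistilBERT, XLNet, and RoBERTa confirm this theoretical advantage. For example, the approach achieves $28\%$ compression of RoBERTa's embedding layer with only a $0.63\%$ additive drop in accuracy (without fine-tuning) on average over all GLUE tasks, compared to an $11\%$ drop with the existing $\ell_2$-approximation. Open code is provided for reproducing and extending the results.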
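
The $\ell_2$ baseline that the abstract argues against can be sketched in a few lines of NumPy. This is only an illustration of the standard truncated-SVD compression and of the entrywise $\ell_p$ error the abstract refers to; it is not the paper's $\ell_p$ algorithm. The function names, matrix sizes, and random test matrix are assumptions made for the example.

```python
import numpy as np

def rank_k_l2_factors(A, k):
    """Factor the best rank-k l2 (Frobenius) approximation of A as L @ R.

    Hypothetical helper: L is n x k and R is k x d, so together they
    hold (n + d) * k numbers instead of the n * d entries of A.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    L = U[:, :k] * s[:k]   # scale the top-k left singular vectors
    R = Vt[:k, :]          # top-k right singular vectors
    return L, R

def entrywise_lp_error(A, B, p):
    """Sum over all entries of |A - B| raised to the power p."""
    return float(np.sum(np.abs(A - B) ** p))

# Illustrative sizes only: n next-layer neurons, d neurons, target rank k.
rng = np.random.default_rng(0)
n, d, k = 200, 64, 8
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
A = A + 0.01 * rng.standard_normal((n, d))  # nearly rank-k matrix

L, R = rank_k_l2_factors(A, k)
A_k2 = L @ R

stored = L.size + R.size   # (n + d) * k numbers
full = A.size              # n * d numbers
print(f"stored {stored} numbers instead of {full}")
print("l2 error (p=2):", entrywise_lp_error(A, A_k2, 2))
print("l1 error (p=1):", entrywise_lp_error(A, A_k2, 1))
```

With $p=2$ the error above is exactly the quantity the SVD minimizes. For other values of $p\in[1,2]$ the same function only measures the error of the SVD solution; finding the rank-$k$ matrix that minimizes it is the problem the paper addresses.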

List of related papers