Fugu-MT Paper Translation (Abstract): To Filter Prune, or to Layer Prune, That Is The Question

Paper summary: To Filter Prune, or to Layer Prune, That Is The Question

arxiv url: http://arxiv.org/abs/2007.05667v3
Date: Sun, 8 Nov 2020 17:48:23 GMT
Status: Translation complete
Last updated in system: 2022-11-11 13:37:35.999730
Title: To Filter Prune, or to Layer Prune, That Is The Question
Title (reference translation): To Filter Prune, or to Layer Prune
Authors: Sara Elkerdawy, Mostafa Elhoushi, Abhineet Singh, Hong Zhang and Nilanjan Ray
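Abstract summary: We show the limitation of filter pruning methods in terms of latency reduction. We propose layer pruning methods based on different criteria that achieve higher latency reduction than filter pruning methods at similar accuracy. LayerPrune also outperforms handcrafted architectures such as Shufflenet, MobileNet, MNASNet and ResNet18 by 7.3%, 4.6%, 2.8% and 0.5% respectively on a similar latency budget on the ImageNet dataset.
Reference score (independently calculated attention score): 13.450136532402226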
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in pruning of neural networks have made it possible to remove a large number of filters or weights without any perceptible drop in accuracy. The number of parameters and the number of FLOPs are usually the reported metrics to measure the quality of the pruned models. However, the gain in speed for these pruned models is often overlooked in the literature due to the complex nature of latency measurements. In this paper, we show the limitation of filter pruning methods in terms of latency reduction and propose the LayerPrune framework. LayerPrune presents a set of layer pruning methods based on different criteria that achieve higher latency reduction than filter pruning methods at similar accuracy. The advantage of layer pruning over filter pruning in terms of latency reduction results from the fact that the former is not constrained by the original model's depth and thus allows for a larger range of latency reduction. For each filter pruning method we examined, we use the same filter importance criterion to calculate a per-layer importance score in one shot. We then prune the least important layers and fine-tune the shallower model, which obtains comparable or better accuracy than its filter-based pruning counterpart. This one-shot process makes it possible to remove layers from single-path networks like VGG before fine-tuning, unlike iterative filter pruning, where a minimum number of filters per layer is required to allow for data flow, which constrains the search space. To the best of our knowledge, we are the first to examine the effect of pruning methods on the latency metric instead of FLOPs for multiple networks, datasets and hardware targets. LayerPrune also outperforms handcrafted architectures such as Shufflenet, MobileNet, MNASNet and ResNet18 by 7.3%, 4.6%, 2.8% and 0.5% respectively on a similar latency budget on the ImageNet dataset.
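Abstract (reference translation): Recent advances in neural network pruning have made it possible to remove a large number of filters or weights without a perceptible drop in accuracy. The number of parameters and FLOPs are the metrics usually reported to measure the quality of pruned models. However, the speed gain of these pruned models is often overlooked in the literature because of the complex nature of latency measurement. This paper shows the limitation of filter pruning methods in terms of latency reduction and proposes the LayerPrune framework. LayerPrune presents a set of layer pruning methods, based on different criteria, that achieve higher latency reduction than filter pruning methods at similar accuracy. The advantage of layer pruning over filter pruning in terms of latency reduction comes from the fact that layer pruning is not constrained by the depth of the original model and therefore allows a wider range of latency reduction. For each filter pruning method examined, the same filter importance criterion is used to compute a per-layer importance score in one shot. The least important layers are then pruned, and the shallower model is fine-tuned, obtaining accuracy comparable to or better than its filter-based pruning counterpart. This one-shot process can remove layers from single-path networks such as VGG before fine-tuning, whereas iterative filter pruning requires a minimum number of filters per layer to allow data flow, which constrains the search space. To the best of the authors' knowledge, this is the first work to examine the effect of pruning methods on the latency metric rather than FLOPs for multiple networks, datasets and hardware targets. LayerPrune also outperforms handcrafted architectures such as Shufflenet, MobileNet, MNASNet and ResNet18 by 7.3%, 4.6%, 2.8% and 0.5% respectively on a similar latency budget on the ImageNet dataset.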
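
The one-shot layer ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name a specific filter criterion or say how filter scores are combined into a layer score, so the sketch assumes mean absolute weight as the filter importance and a plain average over filters as the per-layer score. The function names and the toy layer dictionary are hypothetical. Actually removing the selected layers, repairing channel mismatches between the remaining layers, and fine-tuning are not shown.

```python
import numpy as np


def filter_importance(weights):
    """Score each filter by its mean absolute weight (an assumed criterion).

    weights has shape (out_channels, in_channels, kernel_h, kernel_w).
    Returns one score per output filter.
    """
    return np.abs(weights).reshape(weights.shape[0], -1).mean(axis=1)


def layer_scores(layers):
    """Collapse filter scores into one score per layer.

    Averaging over filters is an assumption made for this sketch.
    """
    return {name: float(filter_importance(w).mean()) for name, w in layers.items()}


def layers_to_prune(layers, num_to_prune):
    """Return the names of the lowest-scoring layers, least important first."""
    scores = layer_scores(layers)
    ranked = sorted(scores, key=scores.get)
    return ranked[:num_to_prune]


# Toy convolution weights with constant magnitudes so the ranking is obvious.
toy_layers = {
    "conv1": np.full((8, 3, 3, 3), 1.0),
    "conv2": np.full((8, 8, 3, 3), 0.1),
    "conv3": np.full((8, 8, 3, 3), 0.5),
    "conv4": np.full((8, 8, 3, 3), 0.01),
}

pruned = layers_to_prune(toy_layers, num_to_prune=2)
print(pruned)
```

Because whole layers are ranked once and removed before fine-tuning, the selection step needs a single pass over the weights, in contrast to iterative filter pruning, which must keep at least some filters in every layer.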

Related papers