Fugu-MT Paper Translation (Summary): Pruning by Active Attention Manipulation

Paper Summary: Pruning by Active Attention Manipulation

arXiv URL: http://arxiv.org/abs/2210.11114v1
Date: Thu, 20 Oct 2022 09:17:02 GMT
Status: Translation complete
Updated in system: 2022-10-21 13:14:20.293034
Title: Pruning by Active Attention Manipulation
Authors: Zahra Babaiee, Lucas Liebenwein, Ramin Hasani, Daniela Rus, Radu Grosu
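Abstract summary: Filter pruning of a CNN is typically achieved by applying discrete masks to the CNN's filter weights or activation maps. Here, we propose a new filter-importance-scoring concept called pruning by active attention manipulation (PAAM). PAAM learns analog filter scores from the filter weights and optimizes a cost function regularized by an additive term in those scores.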
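To make the contrast between discrete masks and learned analog scores concrete, here is a minimal sketch in plain Python. It assumes, for illustration only, that each analog score is a sigmoid of an unconstrained logit and that the "additive term in the scores" is `lam * sum(scores)`; the abstract does not give the exact regularizer, and the names `regularized_cost`, `soft_mask`, `discrete_mask`, and `lam` are hypothetical, not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Map an unconstrained logit to an analog score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def regularized_cost(task_loss: float, score_logits: list[float], lam: float) -> float:
    # Assumption (hypothetical): the additive term is lam * sum of the analog scores,
    # so minimizing the cost pushes unneeded filters' scores toward zero.
    scores = [sigmoid(z) for z in score_logits]
    return task_loss + lam * sum(scores)


def soft_mask(activation_maps: list[list[float]], scores: list[float]) -> list[list[float]]:
    # Analog scores scale each filter's activation map instead of hard-zeroing it.
    return [[s * v for v in fmap] for fmap, s in zip(activation_maps, scores)]


def discrete_mask(scores: list[float], threshold: float = 0.5) -> list[int]:
    # Conventional pruning: a hard 0/1 keep decision per filter (threshold is hypothetical).
    return [1 if s >= threshold else 0 for s in scores]
```

For example, `regularized_cost(2.0, [0.0, 0.0], 0.1)` gives 2.1, because both scores are 0.5 and the penalty adds 0.1 × 1.0. The soft mask keeps the score differentiable during training, while `discrete_mask` shows the kind of hard decision that would be applied once scores are final.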
Reference score (in-house calculated attention level): 49.61707925611295
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Filter pruning of a CNN is typically achieved by applying discrete masks to the CNN's filter weights or activation maps, post-training. Here, we present a new filter-importance-scoring concept named pruning by active attention manipulation (PAAM), which sparsifies the CNN's set of filters through a particular attention mechanism during training. PAAM learns analog filter scores from the filter weights by optimizing a cost function regularized by an additive term in the scores. As the filters are not independent, we use attention to dynamically learn their correlations. Moreover, by training the pruning scores of all layers simultaneously, PAAM can account for layer inter-dependencies, which is essential to finding a performant sparse sub-network. PAAM can also train and generate a pruned network from scratch in a straightforward, one-stage training process without requiring a pre-trained network. Finally, PAAM does not need layer-specific hyperparameters or pre-defined layer budgets, since it can implicitly determine the appropriate number of filters in each layer. Our experimental results on different network architectures suggest that PAAM outperforms state-of-the-art (SOTA) structured-pruning methods. On the CIFAR-10 dataset, without requiring a pre-trained baseline network, we obtain 1.02% and 1.19% accuracy gains and 52.3% and 54% parameter reductions on ResNet56 and ResNet110, respectively. Similarly, on the ImageNet dataset, PAAM achieves a 1.06% accuracy gain while pruning 51.1% of the parameters on ResNet50. On CIFAR-10, this is better than the SOTA by margins of 9.5% and 6.6%, respectively, and on ImageNet by a margin of 11%.
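The abstract says PAAM uses attention to learn correlations among filters and trains all layers' scores jointly, so each layer's filter count emerges without a pre-defined budget. The toy sketch below illustrates that idea under heavy assumptions: `attention_scores` applies parameter-free dot-product self-attention over one layer's flattened filter weights and passes the mean of each attended vector through a sigmoid, and `kept_per_layer` applies one hypothetical global threshold to every layer. Both function names and the threshold value are illustrative; PAAM's actual attention module has learned parameters and is not reproduced here.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(filters: list[list[float]]) -> list[float]:
    """Score each filter in one layer by attending over all filters in that layer.

    filters: flattened filter weight vectors of equal length.
    Assumption (hypothetical): no learned query/key/value projections; the score is
    sigmoid(mean of the attended vector), so each score depends on the other filters.
    """
    d = len(filters[0])
    scores = []
    for q in filters:
        # Scaled dot-product similarity between this filter and every filter in the layer.
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in filters]
        weights = softmax(logits)
        # Attention-weighted combination of all filters in the layer.
        attended = [sum(w * f[j] for w, f in zip(weights, filters)) for j in range(d)]
        scores.append(1.0 / (1.0 + math.exp(-sum(attended) / d)))
    return scores


def kept_per_layer(layer_scores: list[list[float]], threshold: float = 0.5) -> list[int]:
    # One global threshold (hypothetical value) shared by all layers; the number of
    # filters kept in each layer is read off the scores, not fixed as a budget.
    return [sum(1 for s in scores if s >= threshold) for scores in layer_scores]
```

Sharing a single threshold across layers means per-layer filter counts follow from the scores rather than from per-layer hyperparameters, which loosely mirrors the abstract's claim that no pre-defined layer budgets are needed.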
