Automated neural network design has received ever-increasing attention with
the evolution of deep convolutional neural networks (CNNs), especially
involving their deployment on embedded and mobile platforms. One of the biggest
problems that neural architecture search (NAS) confronts is that a large number
of candidate neural architectures need to be trained, using, for instance,
reinforcement learning and evolutionary optimisation algorithms, at a vast
computational cost. Even recent differentiable neural architecture search (DNAS)
samples a small number of candidate neural architectures based on the
probability distribution of learned architecture parameters to select the final
neural architecture. To address this computational complexity issue, we
introduce a novel \emph{architecture parameterisation} based on the scaled sigmoid
function, and propose a general \emph{Differentiable Neural Architecture
Learning} (DNAL) method to optimize the neural architecture without the need to
evaluate candidate neural networks. Specifically, for stochastic supernets as
well as conventional CNNs, we build a new channel-wise module layer with the
architecture components controlled by a scaled sigmoid function. We train these
neural network models from scratch. The network optimization is decoupled into
the weight optimization and the architecture optimization, which avoids the
interaction between the two types of parameters and alleviates the vanishing
gradient problem. We address the non-convex optimization problem of neural
architecture by the continuous scaled sigmoid method with convergence
guarantees. Extensive experiments demonstrate that our DNAL method delivers
superior performance in terms of neural architecture search cost, and that it
adapts to conventional CNNs (e.g., VGG16 and ResNet50), lightweight CNNs (e.g.,
MobileNetV2) and stochastic supernets (e.g., ProxylessNAS). The optimal networks
learned by DNAL surpass those produced by the state-of-the-art methods on the
benchmark CIFAR-10 and ImageNet-1K datasets in accuracy, model size and
computational complexity.
‡Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, UK
Index Terms - Deep Neural Network, Convolutional Neural Network, Neural Architecture Search, Automated Machine Learning
I. INTRODUCTION

Although convolutional neural networks have made great progress in various computer vision tasks, such as image classification [1]–[4], object detection [5]–[7] and semantic segmentation [8]–[11], their deployment in many embedded applications, including robotics, self-driving cars and mobile apps, is constrained by the limited computational and memory resources of such platforms. Numerous approaches have therefore been proposed to obtain efficient neural networks.
These approaches can be divided into three categories: conventional model compression [12]–[14], lightweight network design [15]–[17] and automatic neural architecture search [18]–[20].
Thanks to the over-parameterisation of deep neural networks, the conventional methods compress neural network models by different compression techniques, such as pruning [21], [22], network quantization [23], [24], tensor factorization [25], [26], and knowledge distillation [27].
The lightweight network is heuristically constructed by designing efficient modules, including group convolutions, depthwise separable convolutions, shuffle operations, etc.
Recently, in order to automatically explore the large design space, the NAS methods leverage reinforcement learning [18], [28], [29], evolutionary optimisation algorithm [19], [30], [31] and gradient-based method [20], [32], [33] for efficient neural network search, achieving the state-of-the-art recognition performance.
However, the design space is so large that such hand-crafted methods cannot afford the architecture search cost.
Due to the limitations imposed on the search space, the resulting neural networks are usually sub-optimal.
Moreover, these methods have to take the constraint of hardware resources into account.
Unfortunately, the computational complexity makes it prohibitive to produce application and hardware specific models.
(2) Previous NAS methods exploit reinforcement learning and evolutionary optimisation algorithms to automatically explore the discrete search space, thus achieving the state-of-the-art recognition performance.
However, such methods generate a large number of candidate neural architectures, more than 20,000 candidate neural networks across 500 GPUs over 4 days in [34].
It is time-consuming to train and evaluate them so as to guide the neural net architecture search.
(3) The existing DNAS methods relax the problem to search discrete neural architectures to optimize the probability of stochastic supernets, and allow us to explore continuous search spaces by using gradient-based methods.
However, these methods still require a few candidate neural architectures to identify the best candidate by sampling based on the probability distribution of learned architecture parameters [20].
To address these problems, we introduce a novel approach which converts the discrete optimisation problem into a continuous one.
This is achieved by proposing a differentiable neural architecture learning method to automatically search for the optimal neural network parameterised in terms of a continuous scaled sigmoid function.
Specifically, for both conventional CNNs and stochastic supernets, we build a new channel-wise module layer controlled by the scaled sigmoid function, which can be inserted into any existing neural architectures without any special design.
This module relaxes the discrete space of neural architecture search by continuous architecture representation.
By progressively reducing the smoothness of the scaled sigmoid function, the continuous optimization problem is gradually turned into the original architecture optimization problem.
No additional candidate neural networks are produced, significantly improving the efficiency of neural architecture search.
To avoid the interaction between the weight optimization and the architecture optimization, the network optimization is decoupled into two sequential stages.
This also alleviates the vanishing gradient problem.
After optimizing the neural architecture, we achieve its potential representation ability by finetuning.
Extensive experiments demonstrate that our DNAL method is applicable to conventional CNNs (e.g., VGG16 [2] and ResNet50 [3]), lightweight CNNs (e.g., MobileNetV2 [35]) and stochastic supernets (e.g., ProxylessNAS [36]), and achieves the state-of-the-art performance on the classification task on CIFAR-10 [37] and ImageNet-1K [38] in terms of model size, FLOPs, accuracy, and more importantly, search cost.
The main contributions of this work are summarized as follows:
• We build a new standalone control module based on the scaled sigmoid function to enrich the neural network module family to enable the neural architecture optimization.
• We relax the discrete architecture optimization problem into a continuous one and learn the optimal neural architecture by using gradient-based methods.
• Extensive experiments confirm that our DNAL method achieves the state-of-the-art performance on various CNN architectures, including VGG16, ResNet50, MobileNetV2, and ProxylessNAS, over the task of CIFAR-10 and ImageNet-1K classification.
The rest of this paper is organized as follows: We first investigate the related work in Section II.
We then present the differentiable neural architecture learning method in Section III.
Subsequently, we demonstrate that our proposed DNAL method delivers superior performance through extensive experiments on various popular network models and datasets in Section V. We present an ablation study, which enhances the understanding of DNAL in Section VI.
II. RELATED WORK

Pruning is one of the most promising compression methods, which removes the redundant parts, including weights, channels and even layers, to compress neural networks based on heuristic rules.
It is orthogonal to other methods to design more efficient neural networks.
As structured pruning is more efficient in reducing the parameters and computations, channel-wise pruning methods have attracted more attention.
Huang et al. introduced a scaling factor to scale the output of specific structures, such as channels, groups, and residual blocks, and added sparsity regularizations on the scaling parameters to force them to zero [12].
Similarly, the scaling parameters in Batch Normalization (BN) are used to control the output of the corresponding channels without introducing any extra parameters [13].
Unfortunately, these methods achieve only sub-optimal solutions, because it is prohibitive to explore the whole search space using their human-based heuristics.
This kind of method aims to directly construct efficient neural networks by designing cheap but effective modules, rather than pruning redundant structures.
In practice, we usually reuse the neural networks for different devices to save the computation time, and adjust them to achieve a trade-off between accuracy and efficiency.
Recently, the methodology of neural architecture search has made a significant progress.
The NAS methods automatically explore the search space to find the optimal neural architectures by different optimization methods, such as reinforcement learning, evolutionary optimization algorithms and gradient-based methods.
The early works automatically searched the optimal neural architecture based on reinforcement learning [18], [28], [29] and evolutionary optimization algorithms [19], [30], [31] in a discrete search space.
These methods generate thousands of candidate neural network architectures, and their validation set performance is treated as the reward or fitness to guide the search process.
However, it is time-consuming to train and evaluate those candidate architectures.
Various proxy techniques have been adopted to reduce the search cost, including the performance evaluation on a small dataset, training for few epochs and searching few blocks [36].
However, they do not fundamentally solve the problem of search cost.
To solve the problem effectively, the idea of differentiable neural architecture search was proposed in [20], [32], [33] to optimize the network architecture by gradient-based methods.
These DNAS methods utilize the softmax function over parallel operation blocks to convert the discrete space into a continuous space, and formulate the neural architecture learning in a differentiable manner.
The optimal neural architecture problem is then solved based on gradient search methods, which avoids enumerating individual network architectures and training/evaluating them separately.
Nevertheless, some DNAS methods still sample multiple candidate architectures based on the learned distribution of architecture parameters, thus resulting in extra search costs [20].
We will discuss the difference between our DNAL method and the existing DNAS methods in the supplementary material.
III. METHODOLOGY

A. Problem Definition

A neural network can be parameterized with two kinds of parameters, i.e., the architecture parameters, which represent the neural architecture, and the weights to generate the feature maps. A is a discrete space of the architecture parameters s. We aim to find an optimal architecture a(s) ∈ A by optimizing s. The neural network with the optimal architecture a(s) is trained to achieve the minimal loss L(a(s), w_a) by optimizing the weights w_a:

min_{s ∈ R^n} L(a(s), w_a).   (1)
B. The Search Space In this paper, we build a channel-wise search space, which includes variants of conventional CNN, lightweight CNN and stochastic supernet as instances.
By choosing different channels we are able to configure different architectures.
The stochastic supernet has multiple parallel blocks performing different operations at each layer, providing a greater flexibility in the choice of architecture, but creating a larger search space.
The existing DNAS methods choose only one different block from multiple candidate blocks at each layer to construct a layer-wise search space [20], [32], [33].
It diversifies the structure of neural networks, which helps to improve their representation capacity [44].
The conventional CNN can be viewed as a special stochastic supernet which has a single operation block at each layer.
Taking a stochastic supernet as an example, suppose that an L-layer stochastic supernet N contains M^l parallel operation blocks at the l-th layer and each block has N^l channels.
The state of each channel is a binary sample space, i.e., {0, 1}.
The zero value means the corresponding channel does not contribute to the process of inference, and vice versa.
The architecture search space will thus include 2^(Σ_{l=1}^{L} M^l N^l) possible architectures. For instance, in the case of VGG16, as the total number of its channels is 4224, it contains 2^4224 ≈ 10^1272 possible architectures.
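As a quick sanity check of the quoted figure (a short Python sketch of our own, not part of the original paper), the exponent follows directly from the total channel count:

```python
import math

# VGG16 has 4224 channels in total, so the channel-wise search space holds 2**4224 architectures.
total_channels = 4224
print(total_channels * math.log10(2))  # ~1271.5, i.e. 2**4224 is roughly 10**1272
```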
Here, s = [s^1, ..., s^L], and for the l-th layer, s^l = [s^l_{11}, ..., s^l_{1N^l}, ..., s^l_{M^l N^l}]. DNAL learns an efficient neural architecture by converting the architecture parameters s to a binary vector b.
The binarization process can be implemented by taking the binary function b = binary(s) as the activation function:

b = binary(s) = 1 if s > 0, and 0 otherwise,   (2)

where s ∈ s and b ∈ b.
However, it is infeasible to train the deep neural network with standard back-propagation (e.g., SGD), as the binary function is non-smooth and non-convex.
The binary function is ill-posed at zero, where it is non-differentiable, and its gradient is zero for all nonzero values, which causes the vanishing gradient problem in the neural network optimization.
Motivated by the continuation methods [45], we convert the optimization problem with ill-posed gradients to a manageable problem by smoothing the binary activation function.
We find that there is a relationship between the binary function and the scaled sigmoid function, which becomes binary when the scale factor δ tends to infinity:

lim_{δ→∞} sigmoid(δs) = binary(s),   (3)

where p = sigmoid(δs) = 1/(1 + e^(−δs)) is the scaled sigmoid function with hyper-parameter δ to control its transition from zero to one, as shown in Fig. 1.

Figure 1. Green, blue and red curves show the function p = sigmoid(δs) with hyper-parameters δ1 < δ2 < δ3. The key property is lim_{δ→∞} sigmoid(δs) = binary(s).
If δ = 1, then it is the standard sigmoid function, which is smooth.
The transition region becomes sharper as the scale factor δ increases.
As δ approaches +∞, the function is transformed into the original non-smoothed binary function.
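To make the limiting behaviour of Eq. (3) concrete, here is a minimal PyTorch sketch (our own illustration, not the authors' released code) showing how sigmoid(δs) approaches binary(s) as δ grows:

```python
import torch

def scaled_sigmoid(s, delta):
    # p = sigmoid(delta * s); approaches binary(s) as delta -> +infinity
    return torch.sigmoid(delta * s)

s = torch.tensor([-0.5, -0.01, 0.01, 0.5])
for delta in (1.0, 10.0, 1e4):
    print(delta, scaled_sigmoid(s, delta))
# With delta = 1e4 the outputs are numerically 0 or 1, matching binary(s) = 1 if s > 0 else 0.
```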
Thanks to the key property of the scaled sigmoid function, we relax the problem of optimizing the neural architecture to the problem of optimizing the architecture parameters s by progressively sharpening the scaled sigmoid function transition region.
Specifically, we begin to optimize the neural architecture with the smoothed sigmoid activation function, where δ0 = 1.
By progressively increasing the scale factor δ, the neural architecture will gradually converge to a solution corresponding to an optimal architecture defined by the resulting binary function.
In this paper, max(δ) = 10^4, which is sufficient to guarantee the convergence.
Therefore, our DNAL method can optimize the neural architecture by using gradient-based methods, while producing no additional candidate architectures.
We build a new channel-wise module layer to incorporate the Scaled Sigmoid activation function, named SS, and add the SS transformation after the batch normalization layer, as shown in Fig. 2.
The order of SS layer is analyzed in the ablation study presented later in the paper.
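The following PyTorch sketch illustrates one possible implementation of such a channel-wise SS layer and its placement after batch normalization; the class, attribute and helper names are ours, not the authors' released code:

```python
import torch
import torch.nn as nn

class SS(nn.Module):
    """Channel-wise scaled sigmoid gate (illustrative sketch)."""
    def __init__(self, num_channels):
        super().__init__()
        # one architecture parameter s per channel, initialised to 0 (gate = 0.5 at delta = 1)
        self.s = nn.Parameter(torch.zeros(num_channels))
        self.register_buffer("delta", torch.tensor(1.0))
        self.enabled = True  # set to False (identity) during the weight-optimization stage

    def forward(self, x):
        if not self.enabled:
            return x
        gate = torch.sigmoid(self.delta * self.s)  # shape (C,)
        return x * gate.view(1, -1, 1, 1)          # scale each channel's feature map

def conv_bn_ss_relu(c_in, c_out):
    # Conv-BN-SS-ReLU ordering, the placement analysed in the ablation study
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        SS(c_out),
        nn.ReLU(inplace=True),
    )
```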
To achieve an efficient neural architecture, we define the following loss function:

L = L_0 + λ_a Σ_{l=1}^{L} Σ_{i=1}^{M^l} Σ_{j=1}^{N^l} sigmoid(δ s^l_{ij}).   (4)
Here, the first term is the cross-entropy loss function.
The second term drives the scaled sigmoid activation of each channel to zero, which tends to remove the less important channels.
The hyper-parameter λ_a is a coefficient to achieve an appropriate balance between accuracy and efficiency.
In the forward step, we calculate the output x̂^l of the l-th layer as follows:

x̂^l = ReLU( Σ_{i=1}^{M^l} [sigmoid(δ s^l_{i1}), ..., sigmoid(δ s^l_{iN^l})] ⊙ [x^l_{i1}, ..., x^l_{iN^l}] ),   (5)

where ⊙ denotes the Hadamard product, and x^l_{ij} is the original output of the j-th channel of the i-th block in the l-th layer.
The sigmoid function serves as a weight coefficient to scale the output of the corresponding channel.
The outputs of the blocks at each layer are aggregated as the layer output.
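For the stochastic supernet case, a layer implementing Eq. (5) could look like the sketch below (block aggregation and channel-wise gating as described; the class name is ours):

```python
import torch
import torch.nn as nn

class GatedParallelLayer(nn.Module):
    """One supernet layer with M parallel blocks, each gated channel-wise as in Eq. (5)."""
    def __init__(self, blocks, num_channels):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)  # each block maps x -> (N, C, H, W)
        self.s = nn.Parameter(torch.zeros(len(blocks), num_channels))
        self.register_buffer("delta", torch.tensor(1.0))

    def forward(self, x):
        out = 0
        for i, block in enumerate(self.blocks):
            gate = torch.sigmoid(self.delta * self.s[i]).view(1, -1, 1, 1)
            out = out + gate * block(x)  # Hadamard product, summed over blocks
        return torch.relu(out)
```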
The scale factor δ exponentially increases as the training process progresses.
In the backward step, the gradient w.r.t. the architecture parameters can be calculated as follows:

∂L/∂s^l_{ij} = (∂L/∂p^l_{ij}) · (∂p^l_{ij}/∂s^l_{ij}) = (∂L/∂p^l_{ij}) · δ p^l_{ij} (1 − p^l_{ij}),   (6)

where p^l_{ij} = sigmoid(δ s^l_{ij}).
Our approach has a number of significant advantages.
We update the architecture parameters s by directly using a gradient descent method.
Due to the relative small number of architecture parameters, compared to the number of weights, our DNAL method exhibits a fast convergence for the architecture optimization.
In this work, the architecture optimization consumes about one-tenth of the resource needed for the parameter optimization, specifically 20 epochs for CIFAR-10 and 10 epochs for ImageNet, as described in Section V in detail.
On the contrary, if sigmoid(δs^l_{ij}) = 0, the corresponding channel is decisively pruned out as its output has no contribution to the subsequent computation.
However, as this pruning process can result in channel dimension incompatibility, we follow the technique in [33], and zero-pad the missing channels for the sake of channel dimension alignment.
Moreover, after pruning, we disable all the SS layers, so no special operations or structures are introduced.
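Once the scaled sigmoid activations have saturated, the surviving channels can be read off directly; a minimal sketch follows (the threshold and helper name are our assumptions):

```python
import torch

def kept_channels(ss_layer):
    # channels whose gate saturated to 1 are kept; those saturated to 0 are pruned out
    return torch.sigmoid(ss_layer.delta * ss_layer.s) > 0.5

# Example: slice the convolution and BN weights of a Conv-BN-SS-ReLU block with this boolean mask.
```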
Finally, we finetune the optimal sub-network to restore its representative ability.
Sequential Optimization Strategy.
In the approach described so far, there are two challenging problems.
With the increasing value of the hyper-parameter δ, the scaled sigmoid function has a larger and larger saturation zone where the gradient is zero, and this leads to the vanishing gradient problem.
We found empirically that if the architectures are optimized from the start without a suitable initialisation, the architecture search will tend to fall into bad local optima.
At the weight optimization stage, we disable the SS layer, which means that the SS layer does not change the original channel output, i.e., sigmoid(δs) = 1 for each channel, and we use SGD to learn only the weights.
When optimizing the neural architecture, we freeze the trainable layers, including convolutional layers and BN layers, and focus only on the architecture parameters also by SGD.
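A minimal sketch of this decoupled scheme, assuming the SS module defined earlier and mirroring the procedure listed next, toggles which parameters are trainable in each stage (the helper name is ours):

```python
import torch
import torch.nn as nn

def set_stage(model, stage):
    # stage == "weights": SS layers disabled, only the weights are trained.
    # stage == "architecture": convolution and BN parameters frozen, only the parameters s are trained.
    for m in model.modules():
        if isinstance(m, SS):
            m.enabled = (stage == "architecture")
            m.s.requires_grad_(stage == "architecture")
        elif isinstance(m, (nn.Conv2d, nn.BatchNorm2d, nn.Linear)):
            for p in m.parameters():
                p.requires_grad_(stage == "weights")

# Each stage then uses plain SGD over the parameters that remain trainable, e.g.:
# opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)
```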
Require: The stochastic supernet N with SS layers, a sequence 1 = δ_1 < ··· < δ_n = +∞.
Ensure: the efficient neural network N̂.
1: randomly initializing the weights W, and disabling the SS layers
2: for each epoch i = 1 to m do
3:    optimizing W by SGD with respect to L = L_0 + λ_w ||W||^2
4: end for
5: enabling the SS layers, initializing s = 0, and freezing the weights W
6: for each epoch i = 1 to n do
7:    optimizing s by SGD with respect to L = L_0 + λ_a Σ sigmoid(δ_i s) (sum over all channels, cf. Eq. (4))
8: end for
9: pruning the channels with sigmoid(δs) = 0
10: disabling the SS layers
11: finetuning the searched network N̂ by SGD with respect to L = L_0 + λ_w ||Ŵ||^2

IV. ANALYSIS

In this section, we investigate the differences between DNAL and other related methods by comparison.
Comparison with other DNAS methods.
Although our DNAL method adopts the same gradient-based approach as the existing DNAS methods to optimize the neural architecture, there are major differences between them in the following three respects.
First, the existing DNAS methods learn the relative probability of each candidate operation block, and then retain the block with the maximum probability to construct the optimal architecture, while abandoning the other components [20], [32], [33].
DNAL utilizes the scaled sigmoid function to learn the absolute probability of each channel.
After converging to the original binary problem, we preserve the channels with probability 1, while removing the channels with probability 0.
Second, the existing DNAS methods choose the most likely operation block from multiple candidate operation blocks for each layer to construct the optimal neural architecture, which means each layer contains only one operation block [20], [32], [33].
Thus, DNAL increases the search space size by orders of magnitude.
This helps to improve the accuracy of neural networks, which is experimentally confirmed in the following section.
Third, after finishing the architecture search, some DNAS methods still sample several candidate architectures based on the distribution of architecture parameters, and select the best one by training them from scratch [20].
Our DNAL method yields directly the optimal architecture by the proposed method, producing no additional candidate architectures.
This significantly reduces the computational cost.
Comparison with other scaling methods.
In DNAL, we introduce the scaled sigmoid function as a mechanism to scale the output of each channel.
The proposed method is significantly different from other scaling methods.
First, the existing scaling methods consider the scale factor as a coefficient to scale the output of some specific structures [12], including channels, groups and blocks.
We define an architecture parameter s ∈ R, and use its scaled sigmoid function sigmoid(δs) ∈ [0, 1], as the scale factor in a probabilistic way.
Second, these scaling methods impose a sparsity constraint on the scaling parameters to push them infinitely close to zero, and then prune the structures corresponding to zero or near-zero out.
However, such pruning may degrade the performance.
By contrast, our DNAL method forces the sparsity constraint on the scaled sigmoid function rather than the architecture parameters, driving them into the negative saturation zone, i.e., sigmoid(δs) = 0.
V. EXPERIMENTS

In this section, we empirically evaluate the proposed DNAL method on CIFAR-10 [37] and ImageNet-1K [38] for classification by using state-of-the-art CNN architectures, which include conventional CNNs (e.g., VGG [2] and ResNet [3]), lightweight CNNs (e.g., MobileNetV2 [35]) and stochastic supernets (e.g., ProxylessNAS [36]).
B. Classification on CIFAR-10

We evaluate the recognition performance on CIFAR-10, comparing against several popular convolutional neural networks, such as VGG16, ResNet56 and MobileNetV2.
Implementation. We use a variation of VGG16, as in [47].
In the first weight optimization stage, the initial model is trained for 100 epochs, and the learning rate is fixed to 0.1.
In the architecture optimization stage, we learn the optimal neural architecture for 20 epochs with a constant learning rate 0.1.
Compared with HRank, DNAL is significantly better in all respects (61.23 vs. 73.70 in FLOPs, 0.60 vs. 1.78 in Params and 92.33% vs. 91.23% in top-1 accuracy).
In addition, we tried for a higher compression and acceleration rate of up to about 90× and 20×, respectively, for the micro-net, achieving 89.27% top-1 accuracy and 99.51% top-5 accuracy.
With a similar model size and computation complexity, we achieve better accuracy than NISP (93.75% vs. 93.01%).
DNAL yields 1.3% higher top-1 accuracy and about 2× faster speedup than AMC, and also yields 0.32% and 1.62% higher top-1 accuracy than KSE and GAL-0.8 with a smaller model size and faster speedup, respectively.
To explore more efficient neural models, we further compress the neural network, up to more than 70× for model size and more than 120× for computation complexity, achieving 83.48% and 99.19% in top-1 and top-5 accuracy, respectively.
Although it is much harder to further compress the lightweight model, DNAL still manages to obtain 87.85% top-1 accuracy and 99.62% top-5 accuracy with an acceleration rate of about 20× and compression rate of roughly 30×.
We finetune the derived neural networks for 110 epochs by SGD with a mini-batch size of 96.
The other hyper-parameters are the same as for MobileNetV2.
VGG16. Tab. IV shows the performance of different methods.
Compared with GDP, our DNAL method achieves a faster acceleration rate (3.23× vs. 2.42×) and 1% higher top-1 accuracy (69.80% vs. 68.80%).
Compared with both ThiNet and SSR, DNAL provides significantly better parameter reduction (77.05 vs. 131.44 and 77.05 vs. 126.7), while maintaining a comparable performance in top-1 accuracy.
ResNet50. For ResNet50, we summarize the performance comparison with various methods in Tab. V. We observe that DNAL outperforms SSR by a significant margin in all respects.
Similarly, DNAL achieves better performance than both GDP and GAL.
Compared with ThiNet-50, DNAL achieves 1.62% higher top-1 accuracy with similar FLOPs and parameter reductions (1.75 vs. 1.71 in FLOPs and 12.75 vs. 12.38 in Params).
MobileNetV2. The results are summarized in Tab. VI. Here we compare DNAL with the state-of-the-art AutoML model compression method, i.e., AMC. DNAL outperforms AMC by more than 0.2% with approximate FLOPs (217.24 vs. 211), and even beats it by 0.11% at smaller computation complexity (207.25 vs. 211).
Its ability to compress the lightweight neural networks further is surprising.
ProxylessNAS. We show the compression performance of different NAS methods in Tab. VII. These searched models are divided into three categories according to the underlying NAS methods used.
Our DNAL achieves 75.0% top-1 and 92.5% top5 accuracy on ImageNet with only 3.6M parameters, which is a new state-of-the-art accuracy among different NAS methods.
Compared with the EA-based NAS methods, DNAL is more than 2% and about 1% higher than CARS in top-1 and top-5 accuracy, respectively, with slightly fewer parameters.
Compared to the RL-based NAS methods, the DNAL model attains a significantly better performance than both NASNet and MnasNet, while requiring fewer parameters.
Our DNAL also surpasses DARTS by 1.7% and 1.2% in top-1 and top-5 accuracy, respectively, with significantly fewer parameters, and achieves almost the same top-1 accuracy as ProxylessNAS, but with 2× fewer parameters.
Efficiency. In this part, we further analyze the efficiency of the proposed DNAL method.
To demonstrate its efficiency, we choose the number of training epochs, which is hardware-independent, as a metric of learning efficiency, for fair comparison.
For the other two neural networks, i.e., MobileNetV2 and ProxylessNAS, our gradient-based DNAL method is empirically more efficient than the NAS methods based on reinforcement learning (e.g., AMC) and evolutionary optimisation algorithms (e.g., CARS), which in any case are much more expensive in terms of the search cost due to the evaluation of many candidate neural architectures.
VI. ABLATION STUDY

In this section, we report the results of an ablation study set up to investigate the impact of different factors on both VGG16 and MobileNetV2, using the classification task of CIFAR-10 as a vehicle.
Fig 3 shows the distribution of the scaled sigmoid activations obtained with different scale factors.
We initialize the scale factor to δ = 1 and the architecture parameter to s = 0 at the beginning of the architecture optimization step, i.e., sigmoid(δs) = 0.5.
We can see that many of the scaled sigmoid activations are induced to zero under the influence of the scaled sigmoid regularization as the scale factor increases.
The number of channels with sigmoid(δs) = 1 defines the efficiency of the resulting neural networks.
Effect of the order of SS layer.
We explore the effect of different placements of the SS layer in conjunction with three network configurations, i.e., the Conv-SS-BN-ReLU, Conv-BN-SS-ReLU and Conv-BN-ReLU-SS configurations.
Fig 4 shows the test accuracy achieved with different network configurations.
We observe that both the Conv-BN-SS-ReLU and Conv-BN-ReLU-SS configurations are close in recognition performance, which is significantly better than Conv-SS-BN-ReLU.
In the weight optimization stage, these three networks are identical because the SS layer is disabled.
Thus, their behaviours are also consistent.
In the architecture optimization stage, their network configurations become different due to the enabling SS layers.
For both the Conv-BN-SS-ReLU and Conv-BN-ReLU-SS configurations, the optimized architectures improve the accuracy at the beginning of the architecture optimization.
However, as the number of pruned channels increases, their performance gradually degrades.
By contrast, the Conv-SS-BN-ReLU’s performance reduces dramatically.
After finetuning, all exhibit improved performance, but both the Conv-BN-SS-ReLU and Conv-BN-ReLU-SS configurations are significantly better than Conv-SS-BN-ReLU.
Figure 4. Classification accuracy with different configurations (%). (a) VGG16 on CIFAR-10. (b) MobileNetV2 on CIFAR-10.

Figure 5. Classification accuracy with different optimization strategies but with similar efficiency (%). (a) VGG16 on CIFAR-10. (b) MobileNetV2 on CIFAR-10.
As we can see, setting the SS layer behind the BN layer helps to improve the recognition accuracy.
We argue that the BN layer normalizes the distribution of feature maps for each layer, which enables the SS layer to operate under the same distribution of feature maps.
Effect of the joint optimization. In this paper, we optimize the weights and architecture parameters in a sequential manner.
However, they can be optimized jointly.
We compare these two optimization strategies in Fig 5.
For a fair comparison, the searched models are similar in the computational efficiency.
At the beginning of the network optimization, the joint strategy has a faster convergence, and exhibits better accuracy.
However, as the network continues to be optimised, the performance gradually degrades in accuracy.
For the sequential strategy, after the architecture optimization, the resulting network rapidly recovers its performance by finetuning, surpassing the joint strategy optimization in accuracy.
VII. CONCLUSION

We have presented a differentiable neural architecture learning method (DNAL). DNAL utilizes the scaled sigmoid function to relax the discrete architecture space into a continuous architecture space, and gradually converts the continuous optimization problem into the binary optimization problem.
The optimal neural architecture is learned by gradient-based methods without the need to evaluate candidate architectures individually, thus significantly improving the search efficiency.
We introduced a new SS module layer to implement the scaled sigmoid activation function, enriching the module family of neural networks for the optimization of neural architectures.
The proposed DNAL method was applied to conventional CNNs, lightweight CNNs and stochastic supernets.
Extensive experiments on CIFAR-10 and ImageNet-1K demonstrated that DNAL delivers state-of-the-art performance in terms of accuracy, model size and computational complexity, especially search cost.
ACKNOWLEDGMENT

This work is supported by the National Key R&D Program of China (Grant No. 2018YFB1004901), by the National Natural Science Foundation of China (Grant No. 61672265, U1836218), by the 111 Project of Ministry of Education of China (Grant No. B12018), and by UK EPSRC Grant EP/N007743/1 and MURI/EPSRC/DSTL Grant EP/R018456/1.
REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[2] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[3] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[4] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[5] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[6] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[7] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[8] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[9] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[10] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, "UNet++: A nested U-Net architecture for medical image segmentation," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, 2018, pp. 3–11.
[11] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961–2969.
[12] Z. Huang and N. Wang, "Data-driven sparse structure selection for deep neural networks," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 304–320.
[13] Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, and C. Zhang, "Learning efficient convolutional networks through network slimming," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2736–2744.
[14] C. Zhao, B. Ni, J. Zhang, Q. Zhao, W. Zhang, and Q. Tian, "Variational convolutional neural network pruning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[15] G. Huang, S. Liu, L. Van der Maaten, and K. Q. Weinberger, "Condensenet: An efficient densenet using learned group convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2752–2761.
[16] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "Mobilenets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[17] X. Zhang, X. Zhou, M. Lin, and J. Sun, "Shufflenet: An extremely efficient convolutional neural network for mobile devices," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6848–6856.
[18] M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le, "Mnasnet: Platform-aware neural architecture search for mobile," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2820–2828.
[19] Z. Yang, Y. Wang, X. Chen, B. Shi, C. Xu, C. Xu, Q. Tian, and C. Xu, "Cars: Continuous evolution for efficient neural architecture search," arXiv preprint arXiv:1909.04977, 2019.
[20] B. Wu, X. Dai, P. Zhang, Y. Wang, F. Sun, Y. Wu, Y. Tian, P. Vajda, Y. Jia, and K. Keutzer, "Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 10734–10742.
[21] S. Lin, R. Ji, Y. Li, C. Deng, and X. Li, "Towards compact convnets via structure-sparsity regularized filter pruning," IEEE Transactions on Neural Networks and Learning Systems, 2019.
[22] Y. He, X. Zhang, and J. Sun, "Channel pruning for accelerating very deep neural networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1389–1397.
[23] M. Courbariaux, Y. Bengio, and J.-P. David, "Binaryconnect: Training deep neural networks with binary weights during propagations," in Advances in Neural Information Processing Systems, 2015, pp. 3123–3131.
[24] F. Li, B. Zhang, and B. Liu, "Ternary weight networks," arXiv preprint arXiv:1605.04711, 2016.
[25] B. Peng, W. Tan, Z. Li, S. Zhang, D. Xie, and S. Pu, "Extreme network compression via filter group approximation," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 300–316.
[26] X. Yu, T. Liu, X. Wang, and D. Tao, "On compressing deep models by low rank and sparse decomposition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7370–7379.
[27] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.
[28] B. Baker, O. Gupta, N. Naik, and R. Raskar, "Designing neural network architectures using reinforcement learning," arXiv preprint arXiv:1611.02167, 2016.
[29] I. Bello, B. Zoph, V. Vasudevan, and Q. V. Le, "Neural optimizer search with reinforcement learning," in Proceedings of the 34th International Conference on Machine Learning - Volume 70, 2017, pp. 459–468.
[30] E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. V. Le, and A. Kurakin, "Large-scale evolution of image classifiers," in Proceedings of the 34th International Conference on Machine Learning - Volume 70, 2017, pp. 2902–2911.
[31] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, "Regularized evolution for image classifier architecture search," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2019, pp. 4780–4789.
[32] H. Liu, K. Simonyan, and Y. Yang, "Darts: Differentiable architecture search," arXiv preprint arXiv:1806.09055, 2018.
[33] A. Wan, X. Dai, P. Zhang, Z. He, Y. Tian, S. Xie, B. Wu, M. Yu, T. Xu, K. Chen et al., "Fbnetv2: Differentiable neural architecture search for spatial and channel dimensions," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 12965–12974.
[34] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning transferable architectures for scalable image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8697–8710.
[35] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen, "Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation."
[36] H. Cai, L. Zhu, and S. Han, "Proxylessnas: Direct neural architecture search on target task and hardware," in Proceedings of the International Conference on Learning Representations (ICLR), 2019.
[37] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Citeseer, Tech. Rep., 2009.
[38] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., "Imagenet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[39] M. Lin, R. Ji, Y. Wang, Y. Zhang, B. Zhang, Y. Tian, and L. Shao, "Hrank: Filter pruning using high-rank feature map," in IEEE International Conference on Computer Vision and Pattern Recognition, 2020.
[40] J.-H. Luo, H. Zhang, H.-Y. Zhou, C.-W. Xie, J. Wu, and W. Lin, "Thinet: Pruning CNN filters for a thinner net," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[41] T. Zhang, G.-J. Qi, B. Xiao, and J. Wang, "Interleaved group convolutions," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4373–4382.
[42] X. Wang, M. Kan, S. Shan, and X. Chen, "Fully learnable group convolution for acceleration of deep neural networks," arXiv preprint arXiv:1904.00346, 2019.
[43] Q. Guo, X.-J. Wu, J. Kittler, and Z. Feng, "Self-grouping convolutional neural networks," Neural Networks, vol. 132, pp. 491–505, 2020.
[44] X. Zhang, Z. Li, C. Change Loy, and D. Lin, "Polynet: A pursuit of structural diversity in very deep networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 718–726.
[45] E. L. Allgower and K. Georg, Numerical Continuation Methods: An Introduction. Springer Science & Business Media, 2012, vol. 13.
[46] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in pytorch," 2017.
[47] S. Lin, R. Ji, C. Yan, B. Zhang, L. Cao, Q. Ye, F. Huang, and D. Doermann, "Towards optimal structured cnn pruning via generative adversarial learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2790–2799.
[48] R. Yu, A. Li, C.-F. Chen, J.-H. Lai, V. I. Morariu, X. Han, M. Gao, C.-Y. Lin, and L. S. Davis, "Nisp: Pruning networks using neuron importance score propagation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9194–9203.
[49] Y. He, J. Lin, Z. Liu, H. Wang, L.-J. Li, and S. Han, "Amc: Automl for model compression and acceleration on mobile devices," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 784–800.
[50] Y. Li, S. Lin, B. Zhang, J. Liu, D. Doermann, Y. Wu, F. Huang, and R. Ji, "Exploiting kernel sparsity and entropy for interpretable cnn compression," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 8847–8856.
[51] S. Lin, R. Ji, Y. Li, Y. Wu, F. Huang, and B. Zhang, "Accelerating convolutional networks via global & dynamic filter pruning," in IJCAI, 2018, pp.
[51] S. Lin, R. Ji, Y. Li, Y. Wu, F. Huang, B. Zhang, “Global and dynamic filter pruning.” in IJCAI, 2018, pp.
0.82
2425–2432.
2425–2432.
0.71
[52] Y. Z.
52) であった。 Z。
0.69
B. Z. Y. W. Y. T. Mingbao Lin, Rongrong Ji*, “Channel pruning via automatic structure search,” in International Joint Conference on Artificial Intelligence, 2020.
B。 Z.Y.W.Y.T.Mingbao Lin, Rongrong Ji*, “Channel pruning via Automatic structure search”. International Joint Conference on Artificial Intelligence, 2020. (英語)
0.84
[53] M. Tan, B. Chen, R. Pang, V. Vasudevan, and Q. V. Le, “Mnasnet: Platform-aware neural architecture search for mobile,” arXiv preprint arXiv:1807.11626, 2018.
M. Tan, B. Chen, R. Pang, V. Vasudevan, Q. V. Le, “Mnasnet: Platform-aware Neural Architecture search for mobile”, arXiv preprint arXiv:1807.11626, 2018.
0.94
learning,” in Proceedings of
学習する」という手順は
0.69
Qingbei Guo received the M.S. degree from the School of Computer Science and Technology, Shandong University, Jinan, China, in 2006. He is a member of the Shandong Provincial Key Laboratory of Network-based Intelligent Computing and a lecturer in the School of Information Science and Engineering, University of Jinan.
Dr. Feng received the M.S. degree in Computer Software from Northwestern Polytechnical University, China, in 1995, and the Ph.D. degree in Computer Science & Engineering from Shandong University, China, in 2006. He is currently a Professor at the University of Jinan, China, and a visiting Professor at Sichuan Mianyang Normal University. As the first author or corresponding author, he has published more than 100 papers in international journals and conference proceedings, 2 books, and 30 patents in the areas of human hand recognition and human-computer interaction. He has served as the Deputy Director of the Shandong Provincial Key Laboratory of Network-based Intelligent Computing, as group leader of Human-Computer Interaction based on the natural hand, and as an editorial board member of Computer Aided Drafting Design and Manufacturing (CADDM) and The Open Virtual Reality Journal. He is a deputy editor of the World Research Journal of Pattern Recognition and a member of the Computer Graphics professional committee. Dr. Feng's research interests are in human hand tracking/recognition/interaction, virtual reality, human-computer interaction, and image processing. His research has been extensively supported by the Key R&D Projects of the Ministry of Science and Technology, the Natural Science Foundation of China, the Key Projects of the Natural Science Foundation of Shandong Province, and the Key R&D Projects of Shandong Province, with total grant funding of over three million RMB. For more information, please refer to http://nbic.ujn.edu.cn/nbic/index.php.
Xiao-jun Wu received his B.S. degree in mathematics from Nanjing Normal University, Nanjing, PR China, in 1991, and his M.S. degree in 1996 and Ph.D. degree in Pattern Recognition and Intelligent Systems in 2002, both from Nanjing University of Science and Technology, Nanjing, PR China. He was a fellow of the United Nations University, International Institute for Software Technology (UNU/IIST), from 1999 to 2000. From 1996 to 2006, he taught in the School of Electronics and Information, Jiangsu University of Science and Technology, where he was an exceptionally promoted professor.
Josef Kittler received his degrees from the University of Cambridge in 1971, 1974, and 1991. He is a distinguished Professor of Machine Intelligence at the Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, U.K. He conducts research in biometrics, video and image database retrieval, medical image analysis, and cognitive vision. He published the textbook Pattern Recognition: A Statistical Approach and over 700 scientific papers, and his publications have been cited more than 66,000 times (Google Scholar). He is series editor of the Springer Lecture Notes on Computer Science. He currently serves on the Editorial Boards of Pattern Recognition Letters, Pattern Recognition and Artificial Intelligence, and Pattern Analysis and Applications, and also served as a member of the Editorial Board of IEEE Transactions on Pattern Analysis and Machine Intelligence during 1982-1985. He served on the Governing Board of the International Association for Pattern Recognition (IAPR) as one of the two British representatives during the period 1982-2005, and as President of the IAPR during 1994-1996.