Fugu-MT Paper Translation (Summary): The Nonlinearity Coefficient -- A Practical Guide to Neural Architecture Design

Paper summary: The Nonlinearity Coefficient -- A Practical Guide to Neural Architecture Design

arxiv url: http://arxiv.org/abs/2105.12210v1
Date: Tue, 25 May 2021 20:47:43 GMT
Status: Translation complete
Last updated in system: 2021-05-28 06:39:00.535947
Title: The Nonlinearity Coefficient -- A Practical Guide to Neural Architecture Design
Authors: George Philipp
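Abstract summary: We develop methods that can predict, without any training, whether an architecture will achieve a relatively high test or training error on a task after training. We then explain the error in terms of the architecture definition itself and develop tools for modifying the architecture. Our first major contribution is to show that the 'degree of nonlinearity' of a neural network architecture is a key causal driver behind its performance.
Reference score (attention score computed by the site): 3.04585143845864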
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In essence, a neural network is an arbitrary differentiable, parametrized function. Choosing a neural network architecture for any task is as complex as searching the space of those functions. For the last few years, 'neural architecture design' has been largely synonymous with 'neural architecture search' (NAS), i.e. brute-force, large-scale search. NAS has yielded significant gains on practical tasks. However, NAS methods end up searching for a local optimum in architecture space in a small neighborhood around architectures that often go back decades, based on CNN or LSTM. In this work, we present a different and complementary approach to architecture design, which we term 'zero-shot architecture design' (ZSAD). We develop methods that can predict, without any training, whether an architecture will achieve a relatively high test or training error on a task after training. We then go on to explain the error in terms of the architecture definition itself and develop tools for modifying the architecture based on this explanation. This confers an unprecedented level of control on the deep learning practitioner. They can make informed design decisions before the first line of code is written, even for tasks for which no prior art exists. Our first major contribution is to show that the 'degree of nonlinearity' of a neural architecture is a key causal driver behind its performance, and a primary aspect of the architecture's model complexity. We introduce the 'nonlinearity coefficient' (NLC), a scalar metric for measuring nonlinearity. Via extensive empirical study, we show that the value of the NLC in the architecture's randomly initialized state before training is a powerful predictor of test error after training and that attaining a right-sized NLC is essential for attaining an optimal test error. 
The NLC is also conceptually simple, well-defined for any feedforward network, easy and cheap to compute, has extensive theoretical, empirical and conceptual grounding, follows instructively from the architecture definition, and can be easily controlled via our 'nonlinearity normalization' algorithm. We argue that the NLC is the most powerful scalar statistic for architecture design specifically and neural network analysis in general. Our analysis is fueled by mean field theory, which we use to uncover the 'meta-distribution' of layers. Beyond the NLC, we uncover and flesh out a range of metrics and properties that have a significant explanatory influence on test and training error. We go on to explain the majority of the error variation across a wide range of randomly generated architectures with these metrics and properties. We compile our insights into a practical guide for architecture designers, which we argue can significantly shorten the trial-and-error phase of deep learning deployment. Our results are grounded in an experimental protocol that exceeds that of the vast majority of other deep learning studies in terms of carefulness and rigor. We study the impact of e.g. dataset, learning rate, floating-point precision, loss function, statistical estimation error and batch inter-dependency on performance and other key properties. We promote research practices that we believe can significantly accelerate progress in architecture design research.
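The abstract calls the NLC cheap to compute from a randomly initialized network but does not state its formula. The sketch below assumes the definition used in the author's NLC work, NLC = sqrt(E_x[Tr(J(x) Cov_x J(x)^T)] / Tr(Cov_f)), where J(x) is the input-output Jacobian, Cov_x the input covariance and Cov_f the output covariance; check this against the paper before relying on it. The bias-free tanh MLP, the N(0, gain^2/fan_in) initialization, the Gaussian stand-in for a dataset, and the helper names `init_mlp`, `forward_with_jacobian` and `nonlinearity_coefficient` are all hypothetical choices for illustration, not the author's code.

```python
import numpy as np

# Assumed NLC definition (check against the paper):
#   NLC = sqrt( E_x[ Tr(J(x) Cov_x J(x)^T) ] / Tr(Cov_f) )


def init_mlp(rng, widths, gain):
    """Hypothetical helper: W ~ N(0, gain^2 / fan_in) for a bias-free MLP."""
    return [rng.normal(0.0, gain / np.sqrt(d_in), size=(d_out, d_in))
            for d_in, d_out in zip(widths[:-1], widths[1:])]


def forward_with_jacobian(x, weights, use_tanh=True):
    """Return f(x) and the Jacobian df/dx for a single input vector x."""
    h, jac = x, np.eye(x.shape[0])
    for i, w in enumerate(weights):
        h, jac = w @ h, w @ jac                  # chain rule: multiply by W
        if use_tanh and i < len(weights) - 1:    # output layer stays linear
            h = np.tanh(h)
            jac = (1.0 - h ** 2)[:, None] * jac  # tanh'(z) = 1 - tanh(z)^2 scales each row
    return h, jac


def nonlinearity_coefficient(inputs, weights, use_tanh=True):
    """Batch estimate of the assumed NLC; inputs has shape (n_samples, d_in)."""
    cov_x = np.atleast_2d(np.cov(inputs, rowvar=False))  # sample covariance, ddof=1
    outputs, numerator = [], 0.0
    for x in inputs:
        y, jac = forward_with_jacobian(x, weights, use_tanh)
        outputs.append(y)
        numerator += np.trace(jac @ cov_x @ jac.T)
    numerator /= len(inputs)
    # Tr(Cov_f) is the sum of per-dimension output variances (same ddof as np.cov)
    trace_cov_f = np.var(np.array(outputs), axis=0, ddof=1).sum()
    return float(np.sqrt(numerator / trace_cov_f))


rng = np.random.default_rng(0)
widths = [10, 50, 50, 50, 50, 5]
inputs = rng.normal(size=(200, widths[0]))  # stand-in for real data

linear = nonlinearity_coefficient(inputs, init_mlp(rng, widths, 1.0), use_tanh=False)
small = nonlinearity_coefficient(inputs, init_mlp(rng, widths, 0.5))
large = nonlinearity_coefficient(inputs, init_mlp(rng, widths, 4.0))
print(f"linear: {linear:.3f}  tanh gain 0.5: {small:.3f}  tanh gain 4.0: {large:.3f}")
```

Under this definition a linear map f(x) = Ax gives Cov_f = A Cov_x A^T, so the NLC is exactly 1, which the linear case reproduces. The small-gain tanh network stays close to linear, while the large-gain one should give a noticeably larger value; no training is involved, which is the sense in which the abstract calls the statistic cheap.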

Related papers