The Nonlinearity Coefficient -- A Practical Guide to Neural Architecture Design
- URL: http://arxiv.org/abs/2105.12210v1
- Date: Tue, 25 May 2021 20:47:43 GMT
- Title: The Nonlinearity Coefficient -- A Practical Guide to Neural Architecture Design
- Authors: George Philipp
- Abstract summary: We develop methods that can predict, without any training, whether an architecture will achieve a relatively high test or training error on a task after training.
We then go on to explain the error in terms of the architecture definition itself and develop tools for modifying the architecture.
Our first major contribution is to show that the 'degree of nonlinearity' of a neural architecture is a key causal driver behind its performance.
- Score: 3.04585143845864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In essence, a neural network is an arbitrary differentiable, parametrized
function. Choosing a neural network architecture for any task is as complex as
searching the space of those functions. For the last few years, 'neural
architecture design' has been largely synonymous with 'neural architecture
search' (NAS), i.e. brute-force, large-scale search. NAS has yielded
significant gains on practical tasks. However, NAS methods end up searching for
a local optimum in architecture space within a small neighborhood around
architectures that often go back decades, such as those based on CNNs or LSTMs.
In this work, we present a different and complementary approach to
architecture design, which we term 'zero-shot architecture design' (ZSAD). We
develop methods that can predict, without any training, whether an architecture
will achieve a relatively high test or training error on a task after training.
We then go on to explain the error in terms of the architecture definition
itself and develop tools for modifying the architecture based on this
explanation. This confers an unprecedented level of control on the deep
learning practitioner. They can make informed design decisions before the first
line of code is written, even for tasks for which no prior art exists.
Our first major contribution is to show that the 'degree of nonlinearity' of
a neural architecture is a key causal driver behind its performance, and a
primary aspect of the architecture's model complexity. We introduce the
'nonlinearity coefficient' (NLC), a scalar metric for measuring nonlinearity.
Via extensive empirical study, we show that the value of the NLC in the
architecture's randomly initialized state before training is a powerful
predictor of test error after training and that attaining a right-sized NLC is
essential for achieving an optimal test error. The NLC is also conceptually
simple, well-defined for any feedforward network, easy and cheap to compute,
has extensive theoretical, empirical and conceptual grounding, follows
instructively from the architecture definition, and can be easily controlled
via our 'nonlinearity normalization' algorithm. We argue that the NLC is the
most powerful scalar statistic for architecture design specifically and neural
network analysis in general. Our analysis is fueled by mean field theory, which
we use to uncover the 'meta-distribution' of layers.
Beyond the NLC, we uncover and flesh out a range of metrics and properties
that have a significant explanatory influence on test and training error. We go
on to explain the majority of the error variation across a wide range of
randomly generated architectures with these metrics and properties. We compile
our insights into a practical guide for architecture designers, which we argue
can significantly shorten the trial-and-error phase of deep learning
deployment.
Our results are grounded in an experimental protocol that exceeds that of the
vast majority of other deep learning studies in terms of carefulness and rigor.
We study the impact of factors such as dataset, learning rate, floating-point
precision, loss function, statistical estimation error and batch inter-dependency on
performance and other key properties. We promote research practices that we
believe can significantly accelerate progress in architecture design research.
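As a concrete illustration of the claim above that the NLC is cheap to compute in the randomly initialized state, the sketch below estimates an NLC-style statistic for a toy fully-connected network in JAX. It assumes the definition from the author's earlier NLC work, NLC = sqrt(E_x[Tr(J(x) Cov_x J(x)^T)] / Tr(Cov_f)), where J(x) is the input-output Jacobian and Cov_x, Cov_f are the input and output covariance matrices; the network, widths, initialization and synthetic data here are arbitrary stand-ins rather than anything used in the paper.

```python
# Minimal sketch (not the authors' code): Monte-Carlo estimate of an
# NLC-style statistic at random initialization, assuming
#   NLC = sqrt( E_x[ Tr(J(x) Cov_x J(x)^T) ] / Tr(Cov_f) )
# with J(x) the input-output Jacobian, Cov_x the input covariance and
# Cov_f the output covariance over the data distribution.
import jax
import jax.numpy as jnp


def init_mlp(key, widths):
    """Randomly initialize a plain tanh MLP (illustrative stand-in architecture)."""
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in)
        params.append((w, jnp.zeros(d_out)))
    return params


def forward(params, x):
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:   # no nonlinearity after the output layer
            x = jnp.tanh(x)
    return x


def nlc_estimate(params, xs):
    """Estimate the nonlinearity coefficient on a batch xs of shape (n, d_in)."""
    cov_x = jnp.cov(xs, rowvar=False)                      # input covariance
    outs = jax.vmap(lambda x: forward(params, x))(xs)
    cov_f = jnp.cov(outs, rowvar=False)                    # output covariance
    jac = jax.jacfwd(lambda x: forward(params, x))         # J(x): (d_out, d_in)

    def quad(x):
        J = jac(x)
        return jnp.trace(J @ cov_x @ J.T)

    numerator = jnp.mean(jax.vmap(quad)(xs))
    return jnp.sqrt(numerator / jnp.trace(cov_f))


data_key, init_key = jax.random.split(jax.random.PRNGKey(0))
xs = jax.random.normal(data_key, (256, 32))                # synthetic "dataset"
params = init_mlp(init_key, [32, 256, 256, 10])
print(nlc_estimate(params, xs))                            # ~1 for a near-linear map
```

A purely linear network gives a value near 1, while strongly nonlinear architectures give much larger values; the paper's claim is that this initialization-time quantity already predicts test error after training.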
Related papers
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
- NAAP-440 Dataset and Baseline for Neural Architecture Accuracy Prediction [1.2183405753834562]
We introduce the NAAP-440 dataset of 440 neural architectures, which were trained on CIFAR10 using a fixed recipe.
Experiments indicate that, by using off-the-shelf regression algorithms and running up to 10% of the training process, it is possible to predict an architecture's accuracy rather precisely.
This approach may serve as a powerful tool for accelerating NAS-based studies and thus dramatically increase their efficiency.
arXiv Detail & Related papers (2022-09-14T13:21:39Z)
- FlowNAS: Neural Architecture Search for Optical Flow Estimation [65.44079917247369]
We propose a neural architecture search method named FlowNAS to automatically find a better encoder architecture for the flow estimation task.
Experimental results show that the discovered architecture with the weights inherited from the super-network achieves 4.67% F1-all error on KITTI.
arXiv Detail & Related papers (2022-07-04T09:05:25Z)
- Demystifying the Neural Tangent Kernel from a Practical Perspective: Can it be trusted for Neural Architecture Search without training? [37.29036906991086]
In this work, we revisit several at-initialization metrics that can be derived from the Neural Tangent Kernel (NTK).
We deduce that modern neural architectures exhibit highly non-linear characteristics, making the NTK-based metrics incapable of reliably estimating the performance of an architecture without some amount of training.
We introduce Label-Gradient Alignment (LGA), a novel NTK-based metric whose inherent formulation allows it to capture the large amount of non-linear advantage present in modern neural architectures.
arXiv Detail & Related papers (2022-03-28T08:43:04Z)
- Learning Interpretable Models Through Multi-Objective Neural Architecture Search [0.9990687944474739]
We propose a framework to optimize for both task performance and "introspectability," a surrogate metric for aspects of interpretability.
We demonstrate that jointly optimizing for task error and introspectability leads to more disentangled and debuggable architectures that perform within error.
arXiv Detail & Related papers (2021-12-16T05:50:55Z)
- Contrastive Neural Architecture Search with Neural Architecture Comparators [46.45102111497492]
One of the key steps in Neural Architecture Search (NAS) is to estimate the performance of candidate architectures.
Existing methods either directly use the validation performance or learn a predictor to estimate the performance.
We propose a novel Contrastive Neural Architecture Search (CTNAS) method which performs architecture search by taking the comparison results between architectures as the reward.
arXiv Detail & Related papers (2021-03-08T11:24:07Z)
- Weak NAS Predictors Are All You Need [91.11570424233709]
Recent predictor-based NAS approaches attempt to solve the problem with two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor.
We shift the paradigm from finding a complicated predictor that covers the whole architecture space to a set of weaker predictors that progressively move towards the high-performance sub-space.
Our method costs fewer samples to find the top-performance architectures on NAS-Bench-101 and NAS-Bench-201, and it achieves the state-of-the-art ImageNet performance on the NASNet search space.
arXiv Detail & Related papers (2021-02-21T01:58:43Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
- Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.