Generalization Guarantees for Neural Architecture Search with
Train-Validation Split
- URL: http://arxiv.org/abs/2104.14132v1
- Date: Thu, 29 Apr 2021 06:11:00 GMT
- Title: Generalization Guarantees for Neural Architecture Search with
Train-Validation Split
- Authors: Samet Oymak, Mingchen Li, Mahdi Soltanolkotabi
- Abstract summary: This paper explores the statistical aspects of such problems with train-validation splits.
We show that refined properties of the validation loss such as risk and hyper-gradients are indicative of those of the true test loss.
We also highlight rigorous connections between NAS, multiple kernel learning, and low-rank matrix learning.
- Score: 48.265305046655996
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neural Architecture Search (NAS) is a popular method for automatically
designing optimized architectures for high-performance deep learning. In this
approach, it is common to use bilevel optimization where one optimizes the
model weights over the training data (lower-level problem) and various
hyperparameters such as the configuration of the architecture over the
validation data (upper-level problem). This paper explores the statistical
aspects of such problems with train-validation splits. In practice, the
lower-level problem is often overparameterized and can easily achieve zero
loss. Thus, a priori it seems impossible to distinguish the right
hyperparameters based on the training loss alone, which motivates a better
understanding of the role of the train-validation split. To this end, this work
establishes the following results. (1) We show that refined properties of the
validation loss such as risk and hyper-gradients are indicative of those of the
true test loss. This reveals that the upper-level problem helps select the most
generalizable model and prevent overfitting with a near-minimal validation
sample size. Importantly, this is established for continuous spaces -- which
are highly relevant for popular differentiable search schemes. (2) We establish
generalization bounds for NAS problems with an emphasis on an activation search
problem. When optimized with gradient-descent, we show that the
train-validation procedure returns the best (model, architecture) pair even if
all architectures can perfectly fit the training data to achieve zero error.
(3) Finally, we highlight rigorous connections between NAS, multiple kernel
learning, and low-rank matrix learning. The latter leads to novel algorithmic
insights where the solution of the upper problem can be accurately learned via
efficient spectral methods to achieve near-minimal risk.
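To make the bilevel setup concrete, below is a minimal, self-contained sketch of the train-validation procedure for a toy activation-search problem in the spirit of point (2). Everything in it is an illustrative assumption rather than the paper's implementation: the random-features model, the noiseless toy data, the two candidate activations, and the first-order alternating updates (DARTS-style) are chosen only to mirror the setting where model weights are fit on the training split (lower level) while a continuous architecture parameter follows the validation hyper-gradient (upper level).

```python
import torch

torch.manual_seed(0)
d, width, n_tr, n_val = 10, 100, 30, 200
X_tr, X_val = torch.randn(n_tr, d), torch.randn(n_val, d)
teacher = torch.randn(d, 1)
y_tr, y_val = torch.relu(X_tr @ teacher), torch.relu(X_val @ teacher)  # toy labels

W1 = torch.randn(d, width) / d ** 0.5          # fixed random first layer (features)
v = torch.zeros(width, 1, requires_grad=True)  # model weights (lower-level variable)
alpha = torch.zeros(2, requires_grad=True)     # architecture logits over {relu, tanh}

def forward(X):
    pre = X @ W1
    mix = torch.softmax(alpha, dim=0)          # continuous architecture parameter
    return (mix[0] * torch.relu(pre) + mix[1] * torch.tanh(pre)) @ v

opt_w = torch.optim.Adam([v], lr=1e-2)
opt_a = torch.optim.Adam([alpha], lr=1e-2)

for step in range(2000):
    # Lower level: fit the (overparameterized) weights on the training split.
    opt_w.zero_grad()
    tr_loss = ((forward(X_tr) - y_tr) ** 2).mean()
    tr_loss.backward()
    opt_w.step()

    # Upper level: first-order hyper-gradient step on the validation split.
    opt_a.zero_grad()
    val_loss = ((forward(X_val) - y_val) ** 2).mean()
    val_loss.backward()
    opt_a.step()

print(f"train loss {tr_loss.item():.4f}  val loss {val_loss.item():.4f}")
print("activation mixture (relu, tanh):", torch.softmax(alpha, 0).detach().numpy())
```

Since the toy data come from a ReLU teacher, one would expect the mixture weight on ReLU to grow even though the training loss can be driven close to zero for any fixed mixture; the point of the sketch is the division of labor between the two splits, not the specific numbers.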
Related papers
- Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches [4.577842191730992]
We study ways toward robust OoD generalization for deep learning.
We first propose a novel and effective approach to disentangle the spurious correlation between features that are not essential for recognition.
We then study the problem of strengthening neural architecture search in OoD scenarios.
arXiv Detail & Related papers (2024-10-25T20:50:32Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions accessible via our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels [141.29156234353133]
State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions.
We show this disparity can largely be attributed to optimization challenges presented by nonconvexity.
We propose a Train-Convexify-Train (TCT) procedure to sidestep this issue.
arXiv Detail & Related papers (2022-07-13T16:58:22Z)
- A Differentiable Approach to Combinatorial Optimization using Dataless Neural Networks [20.170140039052455]
We propose a radically different approach in that no data is required for training the neural networks that produce the solution.
In particular, we reduce the optimization problem to a neural network and employ a dataless training scheme to refine the parameters of the network such that those parameters yield the structure of interest.
arXiv Detail & Related papers (2022-03-15T19:21:31Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become a mainstream approach to neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
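For context, the hypergradient such implicit methods build on is the standard implicit-function-theorem identity for bilevel problems, written below in my own notation (the exact stochastic estimator used by iDARTS may differ). At a lower-level optimum w*(alpha) satisfying grad_w L_tr(w*(alpha), alpha) = 0, the chain rule combined with the implicit function theorem gives:

```latex
% Standard implicit-function-theorem hypergradient for bilevel problems
% (notation mine, not necessarily the estimator used by iDARTS).
\[
\frac{d}{d\alpha} L_{\mathrm{val}}\bigl(w^{*}(\alpha), \alpha\bigr)
  = \nabla_{\alpha} L_{\mathrm{val}}
  - \nabla^{2}_{\alpha w} L_{\mathrm{tr}}
    \bigl[\nabla^{2}_{w w} L_{\mathrm{tr}}\bigr]^{-1}
    \nabla_{w} L_{\mathrm{val}}
\]
```

In practice the inverse Hessian-vector product is approximated (e.g., with truncated iterative solvers) rather than formed explicitly.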
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- The Nonlinearity Coefficient -- A Practical Guide to Neural Architecture Design [3.04585143845864]
We develop methods that can predict, without any training, whether an architecture will achieve a relatively high test or training error on a task after training.
We then go on to explain the error in terms of the architecture definition itself and develop tools for modifying the architecture.
Our first major contribution is to show that the 'degree of nonlinearity' of a neural architecture is a key causal driver behind its performance.
arXiv Detail & Related papers (2021-05-25T20:47:43Z)
- ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding [86.40042104698792]
We formulate neural architecture search as a sparse coding problem.
In experiments, our two-stage method on CIFAR-10 requires only 0.05 GPU-day for search.
Our one-stage method produces state-of-the-art performances on both CIFAR-10 and ImageNet at the cost of only evaluation time.
arXiv Detail & Related papers (2020-10-13T04:34:24Z)
- Inexact Derivative-Free Optimization for Bilevel Learning [0.27074235008521236]
Variational regularization techniques are dominant in the field of mathematical imaging, but they depend on regularization parameters that must be set by the user.
A now common strategy to resolve this issue is to learn these parameters from data.
It is common when solving the upper-level problem to assume access to exact solutions of the lower-level problem, which is practically infeasible.
We propose to solve these problems using inexact derivative-free optimization algorithms which never require exact lower-level problem solutions.
arXiv Detail & Related papers (2020-06-23T00:17:32Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds than a naive parallel approach while still achieving a linear speedup in theory.
Experiments on several benchmark datasets demonstrate the effectiveness of the algorithm and confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- MiLeNAS: Efficient Neural Architecture Search via Mixed-Level Reformulation [25.56562895285528]
MiLeNAS is a mixed-level reformulation of NAS that can be optimized efficiently and reliably.
It is shown that even when using a simple first-order method on the mixed-level formulation, MiLeNAS can achieve a lower validation error for NAS problems.
arXiv Detail & Related papers (2020-03-27T05:06:54Z)
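To illustrate the mixed-level idea in the last entry, here is a hedged sketch under my reading of the abstract (not the authors' code; model, alpha, opt_w, opt_a, and lam are hypothetical placeholders): the architecture parameters follow a weighted combination of training- and validation-loss gradients, while the model weights follow the training loss alone.

```python
import torch
import torch.nn.functional as F

def mixed_level_step(model, alpha, batch_tr, batch_val, opt_w, opt_a, lam=1.0):
    """One alternating update in the spirit of a mixed-level NAS reformulation.

    Hypothetical placeholders: model(x, alpha) returns predictions given
    architecture parameters alpha; opt_w optimizes the model weights and
    opt_a optimizes alpha; lam trades off training vs. validation loss.
    """
    x_tr, y_tr = batch_tr
    x_val, y_val = batch_val

    # Weights: gradient step on the training loss only (lower level).
    opt_w.zero_grad()
    F.mse_loss(model(x_tr, alpha), y_tr).backward()
    opt_w.step()

    # Architecture: gradient step on L_tr + lam * L_val (mixed level),
    # using first-order gradients rather than a full bilevel hyper-gradient.
    opt_a.zero_grad()
    mixed = F.mse_loss(model(x_tr, alpha), y_tr) \
            + lam * F.mse_loss(model(x_val, alpha), y_val)
    mixed.backward()
    opt_a.step()
```

In this sketch, lam = 0 reduces to single-level training on the training split, while a large lam makes the architecture update mostly validation-driven, similar to first-order differentiable search.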