Related papers: Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally

URL: http://arxiv.org/abs/2405.19816v1
Date: Thu, 30 May 2024 08:23:56 GMT
Title: Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally
Authors: Manon Verbockhaven, Sylvain Chevallier, Guillaume Charpiat
Abstract summary: In machine learning tasks, one searches for an optimal function within a certain functional space. This way of proceeding forces the evolution of the function during training to lie within the realm of what is expressible with the chosen architecture. We show that the information about desirable architectural changes, due to expressivity bottlenecks, can be extracted from backpropagation.
Score: 2.645067871482715
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Machine learning tasks are generally formulated as optimization problems, where one searches for an optimal function within a certain functional space. In practice, parameterized functional spaces are considered, in order to be able to perform gradient descent. Typically, a neural network architecture is chosen and fixed, and its parameters (connection weights) are optimized, yielding an architecture-dependent result. This way of proceeding however forces the evolution of the function during training to lie within the realm of what is expressible with the chosen architecture, and prevents any optimization across architectures. Costly architectural hyper-parameter optimization is often performed to compensate for this. Instead, we propose to adapt the architecture on the fly during training. We show that the information about desirable architectural changes, due to expressivity bottlenecks when attempting to follow the functional gradient, can be extracted from backpropagation. To do this, we propose a mathematical definition of expressivity bottlenecks, which enables us to detect, quantify and solve them while training, by adding suitable neurons when and where needed. Thus, while the standard approach requires large networks, in terms of number of neurons per layer, for expressivity and optimization reasons, we are able to start with very small neural networks and let them grow appropriately. As a proof of concept, we show results on the CIFAR dataset, matching large neural network accuracy, with competitive training time, while removing the need for standard architectural hyper-parameter search.
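
Illustration (not from the paper): the abstract describes adding neurons to a small network during training. The sketch below shows only the generic mechanics of widening a hidden layer without changing the network's current output, by giving the new neurons zero outgoing weights. It does not implement the paper's bottleneck criterion or its choice of new weights. The names forward and grow, the one-hidden-layer ReLU model, and the random fan-in initialization are all assumptions made for this example.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def forward(w_in, w_out, x):
    """One-hidden-layer ReLU network: y = sum_j w_out[j] * relu(w_in[j] . x)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return sum(wo * h for wo, h in zip(w_out, hidden))


def grow(w_in, w_out, n_new, rng):
    """Return widened copies of the weights with n_new extra hidden neurons.

    Assumption for this sketch: new fan-in weights are random and new
    fan-out weights are zero, so the output is unchanged at growth time
    and later gradient steps are free to start using the new neurons.
    """
    dim = len(w_in[0])
    new_in = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_new)]
    return w_in + new_in, w_out + [0.0] * n_new


rng = random.Random(0)
w_in = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)]  # 2 hidden neurons
w_out = [rng.gauss(0.0, 1.0) for _ in range(2)]
x = [0.5, -1.0, 2.0]

before = forward(w_in, w_out, x)
g_in, g_out = grow(w_in, w_out, 4, rng)  # widen from 2 to 6 hidden neurons
after = forward(g_in, g_out, x)
```

Here before and after agree because each new neuron contributes zero times its activation. How many neurons to add, where, and with which weights is exactly what the paper's method is meant to decide, and is not covered by this sketch.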

Related papers

Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process. In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture. We demonstrate that network rankings can be easily changed by better training networks in benchmarks.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
Neuroevolution of Recurrent Architectures on Control Tasks [3.04585143845864]
We implement a massively parallel evolutionary algorithm and run experiments on all 19 OpenAI Gym state-based reinforcement learning control tasks. We find that dynamic agents match or exceed the performance of gradient-based agents while utilizing orders of magnitude fewer parameters.
arXiv Detail & Related papers (2023-04-03T16:29:18Z)
Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network. We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint. Generalized from the sample-wise analysis into the real batch setting, NIO is able to automatically look for a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z)
FlowNAS: Neural Architecture Search for Optical Flow Estimation [65.44079917247369]
We propose a neural architecture search method named FlowNAS to automatically find a better encoder architecture for the flow estimation task. Experimental results show that the discovered architecture with the weights inherited from the super-network achieves 4.67% F1-all error on KITTI.
arXiv Detail & Related papers (2022-07-04T09:05:25Z)
GradMax: Growing Neural Networks using Gradient Information [22.986063120002353]
We present a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics. We call this technique Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness in a variety of vision tasks and architectures.
arXiv Detail & Related papers (2022-01-13T18:30:18Z)
iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS). We tackle the hypergradient computation in DARTS based on the implicit function theorem. We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
Convolution Neural Network Hyperparameter Optimization Using Simplified Swarm Optimization [2.322689362836168]
Convolutional Neural Network (CNN) is widely used in computer vision. It is not easy to find a network architecture with better performance.
arXiv Detail & Related papers (2021-03-06T00:23:27Z)
Differentiable Neural Architecture Learning for Efficient Neural Network Design [31.23038136038325]
We introduce a novel architecture parameterisation based on a scaled sigmoid function. We then propose a general Differentiable Neural Architecture Learning (DNAL) method to optimize the neural architecture without the need to evaluate candidate neural networks.
arXiv Detail & Related papers (2021-03-03T02:03:08Z)
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks. It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value. It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
Disentangling Neural Architectures and Weights: A Case Study in Supervised Classification [8.976788958300766]
This work investigates the problem of disentangling the role of the neural structure and its edge weights. We show that well-trained architectures may not need any link-specific fine-tuning of the weights. We use a novel and computationally efficient method that translates the hard architecture-search problem into a feasible optimization problem.
arXiv Detail & Related papers (2020-09-11T11:22:22Z)
A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures. A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources. Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
Bayesian Neural Architecture Search using A Training-Free Performance Metric [7.775212462771685]
Recurrent neural networks (RNNs) are a powerful approach for time series prediction. This paper proposes to tackle the architecture optimization problem with a variant of the Bayesian Optimization (BO) algorithm. Also, we propose three fixed-length encoding schemes to cope with the variable-length architecture representation.
arXiv Detail & Related papers (2020-01-29T08:42:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.