Growth strategies for arbitrary DAG neural architectures
- URL: http://arxiv.org/abs/2501.12690v2
- Date: Fri, 14 Feb 2025 08:28:29 GMT
- Title: Growth strategies for arbitrary DAG neural architectures
- Authors: Stella Douka, Manon Verbockhaven, Théo Rudkiewicz, Stéphane Rivaud, François P. Landes, Sylvain Chevallier, Guillaume Charpiat
- Abstract summary: We focus on Neural Architecture Growth, which can increase the size of a small model when needed.
We explore strategies to reduce excessive computations and steer network growth toward more parameter-efficient architectures.
- Score: 1.944442137907768
- Abstract: Deep learning has shown impressive results, obtained at the cost of training huge neural networks. However, the larger the architecture, the higher the computational, financial, and environmental costs during training and inference. We aim to reduce both training and inference durations. We focus on Neural Architecture Growth, which can increase the size of a small model when needed, directly during training, using information from backpropagation. We expand existing work and freely grow neural networks in the form of any Directed Acyclic Graph by reducing expressivity bottlenecks in the architecture. We explore strategies to reduce excessive computations and steer network growth toward more parameter-efficient architectures.
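For orientation, here is a minimal PyTorch sketch of the generic growth operation the abstract refers to: widening a hidden layer mid-training without changing the network's current function. This is not the paper's DAG-growth method; `widen_linear` and all sizes are illustrative assumptions.
```python
# Minimal sketch of function-preserving growth in PyTorch.
# NOT the paper's DAG-growth method; it only illustrates the generic idea
# of widening a layer mid-training without changing the network's current
# function: new fan-out weights start at zero.
import torch
import torch.nn as nn

def widen_linear(fc_in: nn.Linear, fc_out: nn.Linear, n_new: int):
    """Add n_new hidden units between two consecutive Linear layers."""
    # New fan-in weights: small random init so the new units receive gradients.
    new_in_w = 0.01 * torch.randn(n_new, fc_in.in_features)
    new_in_b = torch.zeros(n_new)
    # New fan-out weights: zero init, so the network's output is unchanged.
    new_out_w = torch.zeros(fc_out.out_features, n_new)

    wider_in = nn.Linear(fc_in.in_features, fc_in.out_features + n_new)
    wider_out = nn.Linear(fc_out.in_features + n_new, fc_out.out_features)
    with torch.no_grad():
        wider_in.weight.copy_(torch.cat([fc_in.weight, new_in_w], dim=0))
        wider_in.bias.copy_(torch.cat([fc_in.bias, new_in_b], dim=0))
        wider_out.weight.copy_(torch.cat([fc_out.weight, new_out_w], dim=1))
        wider_out.bias.copy_(fc_out.bias)
    return wider_in, wider_out
```
Zero fan-out initialization is what makes the step function-preserving: the grown network computes exactly the same outputs until gradient descent starts using the new units.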
Related papers
- Generalized Factor Neural Network Model for High-dimensional Regression [50.554377879576066]
We tackle the challenges of modeling high-dimensional data sets with latent low-dimensional structures hidden within complex, non-linear, and noisy relationships.
Our approach enables a seamless integration of concepts from non-parametric regression, factor models, and neural networks for high-dimensional regression.
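A hedged sketch of the generic "factor model + neural network" pattern described here: a linear map extracts low-dimensional latent factors, then a small MLP handles the non-linear regression. Sizes and names are illustrative, not taken from the paper.
```python
# Hedged sketch of the generic "factor model + neural network" pattern:
# a linear map extracts low-dimensional latent factors, then a small MLP
# handles the non-linear regression. Sizes and names are illustrative.
import torch.nn as nn

class FactorRegressor(nn.Module):
    def __init__(self, d_in: int = 1000, n_factors: int = 10):
        super().__init__()
        self.factors = nn.Linear(d_in, n_factors, bias=False)  # factor loadings
        self.head = nn.Sequential(                              # non-linear part
            nn.Linear(n_factors, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.head(self.factors(x))
```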
arXiv Detail & Related papers (2025-02-16T23:13:55Z)
- Self Expanding Convolutional Neural Networks [1.4330085996657045]
We present a novel method for dynamically expanding Convolutional Neural Networks (CNNs) during training.
We employ a strategy where a single model is dynamically expanded, facilitating the extraction of checkpoints at various complexity levels.
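A rough sketch of such a loop, with `should_expand` and `widen` as hypothetical stand-ins for the paper's expansion criterion and expansion operator: each time the model expands, a checkpoint at the previous complexity level is saved.
```python
# Illustrative loop: train, and when a (user-defined) criterion fires,
# checkpoint the current complexity level and expand the model.
# `should_expand` and `widen` are hypothetical stand-ins for the paper's
# expansion criterion and expansion operator.
import torch

def train_with_expansion(model, widen, should_expand, loader, loss_fn, epochs=10):
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        if should_expand(model):
            n_params = sum(p.numel() for p in model.parameters())
            torch.save(model.state_dict(), f"ckpt_{n_params}.pt")
            model = widen(model)  # returns a wider model
            opt = torch.optim.SGD(model.parameters(), lr=0.01)  # rebuild optimizer
    return model
```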
arXiv Detail & Related papers (2024-01-11T06:22:40Z)
- Neural Architecture Codesign for Fast Bragg Peak Analysis [1.7081438846690533]
We develop an automated pipeline to streamline neural architecture codesign for fast, real-time Bragg peak analysis in microscopy.
Our method employs neural architecture search and AutoML to enhance these models while accounting for hardware costs, leading to the discovery of more hardware-efficient neural architectures.
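The simplest way to fold hardware costs into a search is a penalized fitness; a toy sketch, where the weighting and the latency proxy are assumptions rather than the paper's cost model:
```python
# Toy penalized objective: validation accuracy minus a weighted latency
# estimate. The weight and the latency proxy are assumptions, not the
# paper's cost model.
def fitness(val_accuracy: float, latency_ms: float, lam: float = 0.01) -> float:
    return val_accuracy - lam * latency_ms
```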
arXiv Detail & Related papers (2023-12-10T19:42:18Z)
- Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation.
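Why this can work: a fully convolutional network has no fixed-size layer, so the same weights apply to inputs of any size. A toy 1-D illustration (architecture invented for the example):
```python
# A fully convolutional network has no fixed-size layer, so weights
# trained on small windows apply unchanged to much larger inputs.
# The architecture below is invented for the example.
import torch
import torch.nn as nn

fcn = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(8, 1, kernel_size=5, padding=2),
)
small = torch.randn(1, 1, 64)     # training-sized window
large = torch.randn(1, 1, 4096)   # much larger signal at evaluation time
print(fcn(small).shape, fcn(large).shape)  # same weights handle both sizes
```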
arXiv Detail & Related papers (2023-06-14T01:24:42Z)
- Dimensionality Reduction in Deep Learning via Kronecker Multi-layer Architectures [4.836352379142503]
We propose a new deep learning architecture based on fast matrix multiplication of a Kronecker product decomposition.
We show that this architecture allows a neural network to be trained and implemented with a significant reduction in computational time and resources.
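The core trick behind Kronecker-factored layers is to never materialize the large matrix: by the identity (A ⊗ B) vec(X) = vec(B X Aᵀ), a product with an (mp x nq) matrix reduces to two small matrix multiplications. A NumPy check of the identity:
```python
# Never materialize A ⊗ B: by the identity (A ⊗ B) vec(X) = vec(B X Aᵀ)
# (vec = column stacking), the product with an (mp x nq) matrix reduces
# to two small matrix multiplications.
import numpy as np

m, n, p, q = 16, 16, 32, 32
A, B = np.random.randn(m, n), np.random.randn(p, q)
X = np.random.randn(q, n)

slow = np.kron(A, B) @ X.flatten(order="F")  # builds the (mp x nq) matrix
fast = (B @ X @ A.T).flatten(order="F")      # two small matmuls instead
assert np.allclose(slow, fast)
```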
arXiv Detail & Related papers (2022-04-08T19:54:52Z)
- GradMax: Growing Neural Networks using Gradient Information [22.986063120002353]
We present a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics.
We call this technique Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness in a variety of vision tasks and architectures.
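A heavily hedged sketch of the core recipe: new neurons get zero outgoing weights (so nothing already learned is disturbed), and incoming weights along top singular directions of a gradient-derived matrix, which maximizes the gradient norm of the new weights. The matrix `G`, its shape, and the scale are assumptions here, not the paper's exact construction.
```python
# Heavily hedged sketch: fan-out weights of the new neurons are zero
# (nothing already learned is disturbed); fan-in weights follow the top
# singular directions of a gradient-derived matrix G, which maximizes the
# gradient norm of the new weights. G, its shape, and `scale` are
# assumptions; see the paper for the exact construction.
import torch

def gradmax_fan_in(G: torch.Tensor, n_new: int, scale: float = 0.1):
    """G: (fan_in, d) matrix accumulated from backprop; returns (n_new, fan_in)."""
    U, _, _ = torch.linalg.svd(G, full_matrices=False)
    return scale * U[:, :n_new].T  # top singular directions as fan-in weights
```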
arXiv Detail & Related papers (2022-01-13T18:30:18Z)
- Efficient Neural Architecture Search with Performance Prediction [0.0]
We use neural architecture search to find the best network architecture for the task at hand.
Existing NAS algorithms generally evaluate the fitness of a new architecture by fully training it from scratch.
An end-to-end offline performance predictor is proposed to accelerate the evaluation of sampled architectures.
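A minimal sketch of such an offline predictor: encode each sampled architecture as a feature vector and regress its final accuracy, so most candidates never need full training. The encoding, the regressor choice, and the random placeholder data are all illustrative.
```python
# Offline performance predictor: regress final accuracy from a fixed-size
# architecture encoding, so most sampled candidates are never fully
# trained. Encoding, regressor, and the random placeholder data are all
# illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

archs = np.random.rand(200, 16)  # stand-in: encodings of trained architectures
accs = np.random.rand(200)       # stand-in: their measured accuracies

predictor = GradientBoostingRegressor().fit(archs, accs)
candidate = np.random.rand(1, 16)
print("predicted accuracy:", predictor.predict(candidate)[0])
```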
arXiv Detail & Related papers (2021-08-04T05:44:16Z)
- Dynamically Grown Generative Adversarial Networks [111.43128389995341]
We propose a method to dynamically grow a GAN during training, automatically optimizing the network architecture and its parameters together.
The method embeds architecture search techniques as an interleaving step with gradient-based training to periodically seek the optimal architecture-growing strategy for the generator and discriminator.
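A skeletal sketch of that interleaving, with `train_step`, the candidate growth `actions`, and the `score` function as hypothetical stand-ins for the paper's components:
```python
# Skeletal interleaving: ordinary adversarial updates, and every k steps
# a small set of growth actions is tried and the best-scoring (G, D) pair
# is kept. `train_step`, `actions`, and `score` are hypothetical stand-ins.
def train_dggan(G, D, train_step, actions, score, k=1000, total=10000):
    for step in range(total):
        train_step(G, D)                 # usual GAN update
        if step % k == k - 1:            # periodic architecture-growing step
            G, D = max((grow(G, D) for grow in actions), key=score)
    return G, D
```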
arXiv Detail & Related papers (2021-06-16T01:25:51Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
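As a toy illustration of the evolutionary half of such a method, here is a sketch that mutates a population of memory placements (one tier per tensor) and keeps the cheapest under a user-supplied cost function; the tiers and the cost model are made-up stand-ins for on-chip measurement.
```python
# Toy evolutionary search over memory placements (one tier per tensor).
# The tiers and the externally supplied cost function are made-up
# stand-ins for on-chip measurement.
import random

TIERS = ["SRAM", "LLC", "DRAM"]

def mutate(placement):
    child = dict(placement)
    child[random.choice(list(child))] = random.choice(TIERS)
    return child

def evolve(cost, n_tensors=20, pop=32, gens=50):
    population = [{t: random.choice(TIERS) for t in range(n_tensors)}
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=cost)                 # cheapest placements first
        survivors = population[: pop // 4]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop - len(survivors))]
    return min(population, key=cost)
```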
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
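A minimal sketch of the GCN building block implied here: treat the architecture as a graph and propagate node features with H' = relu(A_hat H W), where A_hat is a normalized adjacency matrix. The readout and the autoencoder half are omitted; shapes are illustrative.
```python
# Minimal GCN layer: propagate node features with H' = relu(A_hat @ H @ W),
# where A_hat is a normalized adjacency matrix of the architecture graph.
# Readout and the autoencoder half are omitted; shapes are illustrative.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)  # the weight matrix W

    def forward(self, A_hat, H):  # A_hat: (n, n), H: (n, d_in)
        return torch.relu(self.lin(A_hat @ H))
```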
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
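A hedged sketch of the general local-learning idea (not the paper's exact alignment rule): each block trains against its own local objective, and `detach()` stops gradients at block boundaries, so no end-to-end backward pass is needed. Local classifier heads against the true labels are a common simple choice, used here as an assumption.
```python
# Hedged sketch of the general local-learning idea (not the paper's exact
# alignment rule): each block trains on its own local loss, and detach()
# stops gradients at block boundaries, so no global backward pass exists.
import torch
import torch.nn as nn

blocks = nn.ModuleList([nn.Linear(32, 32) for _ in range(3)])
heads = nn.ModuleList([nn.Linear(32, 10) for _ in range(3)])  # local targets
opt = torch.optim.SGD(list(blocks.parameters()) + list(heads.parameters()), lr=0.01)

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
h = x
opt.zero_grad()
for block, head in zip(blocks, heads):
    h = torch.relu(block(h.detach()))           # gradient stays in this block
    nn.functional.cross_entropy(head(h), y).backward()
opt.step()
```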
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.