DNArch: Learning Convolutional Neural Architectures by Backpropagation
- URL: http://arxiv.org/abs/2302.05400v2
- Date: Sat, 22 Jul 2023 19:45:46 GMT
- Title: DNArch: Learning Convolutional Neural Architectures by Backpropagation
- Authors: David W. Romero, Neil Zeghidour
- Abstract summary: We present DNArch, a method that jointly learns the weights and the architecture of Convolutional Neural Networks (CNNs) by backpropagation.
In particular, DNArch allows learning (i) the size of convolutional kernels at each layer, (ii) the number of channels at each layer, (iii) the position and values of downsampling layers, and (iv) the depth of the network.
- Score: 19.399535453449488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Differentiable Neural Architectures (DNArch), a method that
jointly learns the weights and the architecture of Convolutional Neural
Networks (CNNs) by backpropagation. In particular, DNArch allows learning (i)
the size of convolutional kernels at each layer, (ii) the number of channels at
each layer, (iii) the position and values of downsampling layers, and (iv) the
depth of the network. To this end, DNArch views neural architectures as
continuous multidimensional entities, and uses learnable differentiable masks
along each dimension to control their size. Unlike existing methods, DNArch is
not limited to a predefined set of possible neural components, but instead it
is able to discover entire CNN architectures across all feasible combinations
of kernel sizes, widths, depths and downsampling. Empirically, DNArch finds
performant CNN architectures for several classification and dense prediction
tasks on sequential and image data. When combined with a loss term that
controls the network complexity, DNArch constrains its search to architectures
that respect a predefined computational budget during training.
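To make the mask idea concrete, here is a minimal sketch (ours, not the authors' code) of a 1D convolution whose effective kernel size is governed by a learnable Gaussian mask; DNArch applies the same masking principle along channels, depth, and resolution as well. The module name and its initialisation are illustrative assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv1d(nn.Module):
    """1D convolution whose effective kernel size is learned via a
    differentiable Gaussian mask over kernel positions (illustrative sketch)."""

    def __init__(self, channels, max_kernel_size=33):
        super().__init__()
        self.weight = nn.Parameter(0.02 * torch.randn(channels, channels, max_kernel_size))
        # Learnable log-width of the mask: a small width shrinks the effective kernel.
        self.log_sigma = nn.Parameter(torch.tensor(0.0))
        # Relative tap positions in [-1, 1], centred on the middle tap.
        self.register_buffer("pos", torch.linspace(-1.0, 1.0, max_kernel_size))

    def forward(self, x):
        sigma = self.log_sigma.exp()
        mask = torch.exp(-0.5 * (self.pos / sigma) ** 2)  # Gaussian window over taps
        return F.conv1d(x, self.weight * mask, padding=self.pos.numel() // 2)

x = torch.randn(2, 8, 64)        # (batch, channels, length)
print(MaskedConv1d(8)(x).shape)  # torch.Size([2, 8, 64])
```
Because the mask width is an ordinary parameter trained by backpropagation, a differentiable complexity penalty expressed through the masks (e.g. their effective support) can keep the discovered architecture within a predefined budget, as the abstract describes.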
Related papers
- Simultaneous Weight and Architecture Optimization for Neural Networks [6.2241272327831485]
We introduce a novel neural network training framework that learns the architecture and the parameters simultaneously with gradient descent.
Central to our approach is a multi-scale encoder-decoder, in which the encoder embeds neural networks with similar functionalities close to each other.
Experiments demonstrate that our framework can discover sparse and compact neural networks that maintain high performance.
arXiv Detail & Related papers (2024-10-10T19:57:36Z)
- Local Kernel Renormalization as a mechanism for feature learning in overparametrized Convolutional Neural Networks [0.0]
Empirical evidence shows that fully-connected neural networks in the infinite-width limit eventually outperform their finite-width counterparts.
State-of-the-art architectures with convolutional layers achieve optimal performance in the finite-width regime.
We show that the generalization performance of a finite-width FC network can be obtained by an infinite-width network, with a suitable choice of the Gaussian priors.
arXiv Detail & Related papers (2023-07-21T17:22:04Z)
- What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
- Dive into Layers: Neural Network Capacity Bounding using Algebraic Geometry [55.57953219617467]
We show that the learnability of a neural network is directly related to its size.
We use Betti numbers to measure the topological geometric complexity of input data and the neural network.
We perform experiments on the real-world MNIST dataset, and the results verify our analysis and conclusions.
arXiv Detail & Related papers (2021-09-03T11:45:51Z)
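A toy illustration of the Betti-number idea in the entry above: the zeroth Betti number counts connected components, which can be estimated for a point cloud from its radius-neighbourhood graph. This is our own minimal sketch, not the paper's procedure (which goes beyond Betti-0); `betti_0` and the radius choice are illustrative assumptions.
```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform

def betti_0(points, radius):
    """Betti-0 (connected-component count) of the radius-neighbourhood graph."""
    adjacency = csr_matrix(squareform(pdist(points)) <= radius)
    n_components, _ = connected_components(adjacency, directed=False)
    return n_components

rng = np.random.default_rng(0)
# Two well-separated clusters: Betti-0 should be 2 at a small radius.
cloud = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
print(betti_0(cloud, radius=1.0))  # -> 2
```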
- Differentiable Neural Architecture Learning for Efficient Neural Network Design [31.23038136038325]
We introduce a novel architecture parameterisation based on a scaled sigmoid function.
We then propose a general Differentiable Neural Architecture Learning (DNAL) method to optimize the neural architecture without the need to evaluate candidate neural networks.
arXiv Detail & Related papers (2021-03-03T02:03:08Z)
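A minimal sketch of a scaled-sigmoid channel gate in the spirit of the DNAL summary above; the class name, the per-channel logits, and the annealing schedule are our illustrative assumptions, not the paper's exact parameterisation.
```python
import torch
import torch.nn as nn

class ScaledSigmoidGate(nn.Module):
    """Per-channel gate sigmoid(beta * alpha): as beta grows during training,
    the gate approaches a hard 0/1 channel selection (illustrative sketch)."""

    def __init__(self, num_channels, beta=1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(num_channels))  # learnable logits
        self.beta = beta  # annealed upward over training, e.g. 1 -> 100

    def forward(self, x):  # x: (batch, channels, ...)
        gate = torch.sigmoid(self.beta * self.alpha)
        return x * gate.view(1, -1, *([1] * (x.dim() - 2)))

gate = ScaledSigmoidGate(num_channels=16, beta=10.0)
print(gate(torch.randn(4, 16, 32, 32)).shape)  # torch.Size([4, 16, 32, 32])
```
As beta grows, the sigmoid saturates, so the learned gates can be thresholded into a discrete channel selection at the end of training without evaluating candidate networks one by one.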
- Hierarchical Neural Architecture Search for Deep Stereo Matching [131.94481111956853]
We propose the first end-to-end hierarchical NAS framework for deep stereo matching.
Our framework incorporates task-specific human knowledge into the neural architecture search framework.
It is ranked first in accuracy on the KITTI stereo 2012, 2015, and Middlebury benchmarks, as well as first on the SceneFlow dataset.
arXiv Detail & Related papers (2020-10-26T11:57:37Z)
- Locality Guided Neural Networks for Explainable Artificial Intelligence [12.435539489388708]
We propose a novel back-propagation algorithm called Locality Guided Neural Network (LGNN).
LGNN preserves locality between neighbouring neurons within each layer of a deep network.
In our experiments, we train various VGG and Wide ResNet (WRN) networks for image classification on CIFAR100.
arXiv Detail & Related papers (2020-07-12T23:45:51Z)
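Below is a deliberately simplified stand-in for the locality idea in the LGNN entry above (our assumption; the paper's actual formulation may differ): a smoothness penalty that keeps adjacent output filters of a layer similar, added to the usual task loss.
```python
import torch

def locality_penalty(weight):
    """Encourage neighbouring filters (adjacent output channels) to stay
    similar; a rough stand-in for LGNN's locality preservation, not its
    exact loss."""
    flat = weight.flatten(1)                     # (out_channels, -1)
    return (flat[1:] - flat[:-1]).pow(2).mean()  # neighbour smoothness

w = torch.randn(64, 3, 3, 3, requires_grad=True)  # conv filters
loss = locality_penalty(w)  # add to the task loss with some weight
loss.backward()
```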
- The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network Architectures [179.66117325866585]
We investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks.
We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance.
Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration.
arXiv Detail & Related papers (2020-06-29T17:59:26Z)
- On the Empirical Neural Tangent Kernel of Standard Finite-Width Convolutional Neural Network Architectures [3.4698840925433765]
It remains an open question how well NTK theory models standard neural network architectures of widths common in practice.
We study this question empirically for two well-known convolutional neural network architectures, namely AlexNet and LeNet.
For wider versions of these networks, in which the number of channels and the widths of the fully-connected layers are increased, the deviation from the theoretical NTK decreases.
arXiv Detail & Related papers (2020-06-24T11:40:36Z)
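For context on the entry above, a minimal sketch (ours, not the paper's code) of a single empirical NTK entry k(x1, x2), computed as the inner product of parameter gradients of a scalar-output convolutional model; the paper studies how such empirical kernels deviate from their infinite-width limits.
```python
import torch

def empirical_ntk_entry(model, x1, x2):
    """k(x1, x2) = <grad_theta f(x1), grad_theta f(x2)> for a scalar-output model."""
    params = [p for p in model.parameters() if p.requires_grad]
    g1 = torch.autograd.grad(model(x1).sum(), params)
    g2 = torch.autograd.grad(model(x2).sum(), params)
    return sum((a * b).sum() for a, b in zip(g1, g2))

# Tiny stand-in model; AlexNet/LeNet from the paper would slot in the same way.
model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 4, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Flatten(), torch.nn.Linear(4 * 8 * 8, 1),
)
x1, x2 = torch.randn(1, 1, 8, 8), torch.randn(1, 1, 8, 8)
print(empirical_ntk_entry(model, x1, x2).item())
```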
- DC-NAS: Divide-and-Conquer Neural Architecture Search [108.57785531758076]
We present a divide-and-conquer (DC) approach to effectively and efficiently search deep neural architectures.
We achieve a 75.1% top-1 accuracy on the ImageNet dataset, which is higher than that of state-of-the-art methods using the same search space.
arXiv Detail & Related papers (2020-05-29T09:02:16Z)
- When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with models based on neural architecture search to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)