Firefly Neural Architecture Descent: a General Approach for Growing
Neural Networks
- URL: http://arxiv.org/abs/2102.08574v1
- Date: Wed, 17 Feb 2021 04:47:18 GMT
- Title: Firefly Neural Architecture Descent: a General Approach for Growing
Neural Networks
- Authors: Lemeng Wu, Bo Liu, Peter Stone, Qiang Liu
- Abstract summary: Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
- Score: 50.684661759340145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose firefly neural architecture descent, a general framework for
progressively and dynamically growing neural networks to jointly optimize the
networks' parameters and architectures. Our method works in a steepest descent
fashion, which iteratively finds the best network within a functional
neighborhood of the original network that includes a diverse set of candidate
network structures. By using Taylor approximation, the optimal network
structure in the neighborhood can be found with a greedy selection procedure.
We show that firefly descent can flexibly grow networks both wider and deeper,
and can be applied to learn accurate but resource-efficient neural
architectures that avoid catastrophic forgetting in continual learning.
Empirically, firefly descent achieves promising results on both neural
architecture search and continual learning. In particular, on a challenging
continual image classification task, it learns networks that are smaller in
size but have higher average accuracy than those learned by the
state-of-the-art methods.
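As a rough illustration of the Taylor-based greedy selection described in the abstract, the sketch below scores randomly initialized candidate branches by the gradient of the training loss with respect to a zero-initialized scaling factor on each candidate, then keeps the highest-scoring ones under a fixed growth budget. This is a minimal sketch under assumed simplifications (a single growth step, parallel one-unit branches, an illustrative budget `k`), not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of Taylor-score-based greedy growing:
# each candidate branch is attached with a zero scaling factor eps, so the
# magnitude of dL/d(eps) is a first-order estimate of how much the loss would
# drop if that branch were switched on. The top-k candidates are kept.

import torch
import torch.nn as nn


def taylor_scores(model, candidates, loss_fn, data, target):
    """Score each candidate branch by |dL/d(eps)| evaluated at eps = 0."""
    scores = []
    for cand in candidates:
        eps = torch.zeros(1, requires_grad=True)
        out = model(data) + eps * cand(data)  # candidate contributes nothing yet
        loss = loss_fn(out, target)
        (grad,) = torch.autograd.grad(loss, eps)
        scores.append(grad.abs().item())
    return scores


def grow(model, candidates, loss_fn, data, target, k=2):
    """Greedily keep the k candidates with the largest predicted loss decrease
    (k is an illustrative stand-in for the paper's growth budget)."""
    scores = taylor_scores(model, candidates, loss_fn, data, target)
    best = sorted(range(len(candidates)), key=lambda i: -scores[i])[:k]
    return [candidates[i] for i in best]


if __name__ == "__main__":
    torch.manual_seed(0)
    base = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 1))
    # Candidate "neurons": small one-unit branches added in parallel to the base net.
    pool = [nn.Sequential(nn.Linear(8, 1), nn.Tanh()) for _ in range(6)]
    x, y = torch.randn(32, 8), torch.randn(32, 1)
    kept = grow(base, pool, nn.MSELoss(), x, y, k=2)
    print(f"kept {len(kept)} of {len(pool)} candidate branches")
```

In the paper the selected candidates are merged into the network, which can grow it wider or deeper, and training then continues; only the selection step is sketched here.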
Related papers
- Simultaneous Weight and Architecture Optimization for Neural Networks [6.2241272327831485]
We introduce a novel neural network training framework that transforms the training process by learning architecture and parameters simultaneously with gradient descent.
Central to our approach is a multi-scale encoder-decoder, in which the encoder embeds pairs of neural networks with similar functionalities close to each other.
Experiments demonstrate that our framework can discover sparse and compact neural networks maintaining a high performance.
arXiv Detail & Related papers (2024-10-10T19:57:36Z)
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change when the networks are trained with these architecture-aware hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- Deep Dependency Networks for Multi-Label Classification [24.24496964886951]
We show that the performance of previous approaches that combine Markov Random Fields with neural networks can be modestly improved.
We propose a new modeling framework called deep dependency networks, which augments a dependency network.
Despite its simplicity, jointly learning this new architecture yields significant improvements in performance.
arXiv Detail & Related papers (2023-02-01T17:52:40Z)
- SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, termed SIRe, to reduce the vanishing-gradient problem in the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing preservation of the input image structure through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
arXiv Detail & Related papers (2021-10-06T13:54:49Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- NAS-Navigator: Visual Steering for Explainable One-Shot Deep Neural Network Synthesis [53.106414896248246]
We present a framework that allows analysts to effectively build the solution sub-graph space and guide the network search by injecting their domain knowledge.
Applying this technique in an iterative manner allows analysts to converge to the best performing neural network architecture for a given application.
arXiv Detail & Related papers (2020-09-28T01:48:45Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Modeling Dynamic Heterogeneous Network for Link Prediction using Hierarchical Attention with Temporal RNN [16.362525151483084]
We propose a novel dynamic heterogeneous network embedding method, termed as DyHATR.
It uses hierarchical attention to learn heterogeneous information and incorporates recurrent neural networks with temporal attention to capture evolutionary patterns.
We benchmark our method on four real-world datasets for the task of link prediction.
arXiv Detail & Related papers (2020-04-01T17:16:47Z)