Inter-layer Transition in Neural Architecture Search
- URL: http://arxiv.org/abs/2011.14525v1
- Date: Mon, 30 Nov 2020 03:33:52 GMT
- Title: Inter-layer Transition in Neural Architecture Search
- Authors: Benteng Ma, Jing Zhang, Yong Xia, Dacheng Tao
- Abstract summary: The dependency between the architecture weights of connected edges is explicitly modeled in this paper.
Experiments on five benchmarks confirm the value of modeling inter-layer dependency and demonstrate that the proposed method outperforms state-of-the-art methods.
- Score: 89.00449751022771
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentiable Neural Architecture Search (NAS) methods represent the network
architecture as a repetitive proxy directed acyclic graph (DAG) and optimize
the network weights and architecture weights alternately in a differentiable
manner. However, existing methods model the architecture weights on each edge
(i.e., a layer in the network) as statistically independent variables, ignoring
the dependency between edges in the DAG induced by their directed topological
connections. In this paper, we make the first attempt to investigate such
dependency by proposing a novel Inter-layer Transition NAS method. It casts the
architecture optimization into a sequential decision process where the
dependency between the architecture weights of connected edges is explicitly
modeled. Specifically, edges are divided into inner and outer groups according
to whether or not their predecessor edges are in the same cell. While the
architecture weights of outer edges are optimized independently, those of inner
edges are derived sequentially based on the architecture weights of their
predecessor edges and the learnable transition matrices in an attentive
probability transition manner. Experiments on five benchmarks confirm the value
of modeling inter-layer dependency and demonstrate that the proposed method
outperforms state-of-the-art methods.
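The sequential derivation described above lends itself to a short illustration. Below is a minimal sketch, assuming softmax-normalized architecture weights, one learnable row-stochastic transition matrix per inner edge, and a scalar attention gate for mixing; all names and the exact blending rule are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterLayerTransition(nn.Module):
    """Sketch: derive an inner edge's architecture weights from its predecessor
    edge's weights via a learnable probability-transition matrix, blended with
    an independent term through an attention-style gate. Illustrative only."""

    def __init__(self, num_ops: int):
        super().__init__()
        self.transition = nn.Parameter(torch.eye(num_ops))    # learnable op-to-op transition logits
        self.own_logits = nn.Parameter(torch.zeros(num_ops))  # edge's independent preference
        self.attn_logit = nn.Parameter(torch.zeros(1))        # how much to trust the predecessor

    def forward(self, pred_weights: torch.Tensor) -> torch.Tensor:
        # pred_weights: softmax-normalized architecture weights of the predecessor edge
        P = F.softmax(self.transition, dim=-1)        # row-stochastic transition matrix
        transited = pred_weights @ P                  # probability transition from the predecessor
        own = F.softmax(self.own_logits, dim=-1)
        gate = torch.sigmoid(self.attn_logit)         # attentive mixing coefficient
        return gate * transited + (1.0 - gate) * own  # still a distribution over operations

# Outer edges (predecessors in a different cell) keep independently optimized weights;
# inner edges derive theirs sequentially from them:
num_ops = 8
outer = F.softmax(torch.randn(num_ops), dim=-1)
inner = InterLayerTransition(num_ops)(outer)
print(inner.sum())  # ~1.0
```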
Related papers
- Detecting and Approximating Redundant Computational Blocks in Neural Networks [25.436785396394804]
Intra-network similarities present new opportunities for designing more efficient neural networks.
We introduce a simple metric, Block Redundancy, to detect redundant blocks, and propose Redundant Blocks Approximation (RBA) to approximate redundant blocks.
RBA reduces model parameters and time complexity while maintaining good performance.
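As a rough illustration of what a block-redundancy test could look like, the sketch below scores a block by the cosine similarity between its input and output features; the actual Block Redundancy metric and the RBA approximation step may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def block_redundancy(block_in: torch.Tensor, block_out: torch.Tensor) -> float:
    """Hypothetical redundancy score: cosine similarity between a block's input and
    output features. A value near 1 means the block barely transforms its input
    and may be a candidate for approximation or removal."""
    return F.cosine_similarity(block_in.flatten(1), block_out.flatten(1), dim=1).mean().item()

x = torch.randn(16, 256)
block = nn.Sequential(nn.Linear(256, 256), nn.ReLU())
print(block_redundancy(x, x + 0.05 * block(x)))  # near-identity residual block -> score close to 1
```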
arXiv Detail & Related papers (2024-10-07T11:35:24Z)
- Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation [54.50526986788175]
Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs.
We present a unified view of these models, formulating such layers as implicit causal self-attention layers.
Our framework compares the underlying mechanisms on similar grounds for different layers and provides a direct means for applying explainability methods.
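A scalar toy version of this view: a gated linear recurrence can be unrolled into a lower-triangular, attention-like matrix that maps inputs to outputs. The sketch below materializes that matrix for a one-dimensional recurrence; it is only meant to convey the idea, not the paper's exact formulation for Mamba or RWKV.

```python
import torch

def implicit_attention(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """For a scalar gated linear recurrence h_t = a_t * h_{t-1} + b_t * x_t, the
    output satisfies h_t = sum_{j<=t} A[t, j] * x_j with
    A[t, j] = b_j * prod_{k=j+1..t} a_k. Materializing A exposes a causal,
    attention-like matrix that standard explainability tools can inspect."""
    T = a.shape[0]
    A = torch.zeros(T, T)
    for t in range(T):
        for j in range(t + 1):
            A[t, j] = b[j] * torch.prod(a[j + 1 : t + 1])  # empty product = 1 when j == t
    return A

a = torch.sigmoid(torch.randn(6))  # forget-style gates in (0, 1)
b = torch.sigmoid(torch.randn(6))  # input gates
print(implicit_attention(a, b))    # lower-triangular "attention" matrix
```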
arXiv Detail & Related papers (2024-05-26T09:57:45Z)
- DepGraph: Towards Any Structural Pruning [68.40343338847664]
We study general structural pruning of arbitrary architectures like CNNs, RNNs, GNNs and Transformers.
We propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependency between layers and comprehensively group parameters for pruning.
In this work, we extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision transformer for images, GAT for graph, DGCNN for 3D point cloud, alongside LSTM for language, and demonstrate its effectiveness.
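To make the notion of inter-layer dependency concrete, the toy sketch below groups a Conv2d with the BatchNorm2d it feeds, since their channels must be pruned together; DepGraph constructs such groups automatically for arbitrary graphs, which this fragment does not attempt.

```python
import torch.nn as nn

def coupled_pruning_groups(model: nn.Sequential):
    """Toy dependency grouping: when a Conv2d feeds a BatchNorm2d, removing an
    output channel of the conv forces removing the matching BN channel, so their
    parameters must be pruned as one group. Handles only the sequential Conv->BN
    case; real dependency graphs cover arbitrary layer couplings."""
    layers = list(model)
    return [(prev, nxt) for prev, nxt in zip(layers, layers[1:])
            if isinstance(prev, nn.Conv2d) and isinstance(nxt, nn.BatchNorm2d)]

net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU(),
                    nn.Conv2d(16, 32, 3), nn.BatchNorm2d(32))
print(len(coupled_pruning_groups(net)))  # 2 coupled (conv, bn) groups
```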
arXiv Detail & Related papers (2023-01-30T14:02:33Z)
- Rethinking Architecture Selection in Differentiable NAS [74.61723678821049]
Differentiable Neural Architecture Search is one of the most popular NAS methods for its search efficiency and simplicity.
We propose an alternative perturbation-based architecture selection that directly measures each operation's influence on the supernet.
We find that several failure modes of DARTS can be greatly alleviated with the proposed selection method.
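A hedged sketch of the idea: instead of reading off the largest architecture weight, drop each candidate operation in turn and keep the one whose removal hurts validation accuracy the most. The `eval_with_mask` callable below is an assumed interface, not the paper's code.

```python
import torch

def select_operation(num_ops: int, eval_with_mask) -> int:
    """Perturbation-style selection sketch: drop each candidate operation on an
    edge (by zeroing its mask entry), re-evaluate the supernet, and return the
    operation whose removal degrades validation accuracy the most."""
    base = eval_with_mask(torch.ones(num_ops))
    drops = []
    for k in range(num_ops):
        mask = torch.ones(num_ops)
        mask[k] = 0.0                              # perturbation: disable operation k
        drops.append(base - eval_with_mask(mask))  # accuracy drop caused by removing it
    return int(torch.tensor(drops).argmax())

# Toy evaluator that "needs" operation 2: removing it costs 0.2 accuracy.
fake_eval = lambda m: 0.9 - 0.2 * (1.0 - m[2]).item()
print(select_operation(4, fake_eval))  # -> 2
```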
arXiv Detail & Related papers (2021-08-10T00:53:39Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
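A toy version of an implicit-function-theorem hypergradient, with the inverse Hessian-vector product approximated by a short Neumann series, is sketched below; step sizes, series length, and the scalar setup are illustrative assumptions rather than the iDARTS algorithm as published.

```python
import torch

def implicit_hypergrad(val_loss, train_loss, w, alpha, K=3, gamma=0.01):
    """Implicit-function-theorem hypergradient sketch:
    dL_val/dalpha ~= (direct term) - d2L_train/(dalpha dw) . H^-1 . dL_val/dw,
    where the inverse Hessian-vector product H^-1 v is approximated by a K-term
    Neumann series with step size gamma. Scalar toy, not a drop-in training routine."""
    v = torch.autograd.grad(val_loss, w, retain_graph=True)[0]       # dL_val/dw
    gw = torch.autograd.grad(train_loss, w, create_graph=True)[0]    # dL_train/dw (keeps graph)
    p, acc = v.clone(), v.clone()
    for _ in range(K):                                               # Neumann series terms
        hvp = torch.autograd.grad(gw, w, grad_outputs=p, retain_graph=True)[0]
        p = p - gamma * hvp
        acc = acc + p
    mixed = torch.autograd.grad(gw, alpha, grad_outputs=acc)[0]      # second-order mixed term
    direct = torch.autograd.grad(val_loss, alpha, allow_unused=True)[0]
    direct = torch.zeros_like(alpha) if direct is None else direct
    return direct - gamma * mixed

w = torch.tensor(2.0, requires_grad=True)       # "network weight"
alpha = torch.tensor(0.5, requires_grad=True)   # "architecture parameter"
print(implicit_hypergrad((w - 1.0) ** 2, (w - alpha) ** 2, w, alpha))
```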
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- Adversarially Robust Neural Architectures [43.74185132684662]
This paper aims to improve the adversarial robustness of the network from the architecture perspective within a NAS framework.
We explore the relationship among adversarial robustness, Lipschitz constant, and architecture parameters.
Our algorithm empirically achieves the best performance among all the models under various attacks on different datasets.
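One common handle on robustness at the layer level is the Lipschitz constant; for a linear layer (in the L2 norm) it equals the largest singular value of the weight matrix, which power iteration estimates cheaply, as sketched below. The paper's actual constraint ties such constants to the architecture parameters, which this fragment does not model.

```python
import torch

def spectral_norm_estimate(W: torch.Tensor, iters: int = 100) -> float:
    """Power-iteration estimate of the largest singular value of W, which is the
    Lipschitz constant (w.r.t. the L2 norm) of the linear map x -> W x."""
    v = torch.randn(W.shape[1])
    for _ in range(iters):
        u = W @ v
        u = u / (u.norm() + 1e-12)
        v = W.t() @ u
        v = v / (v.norm() + 1e-12)
    return (u @ W @ v).item()

W = torch.randn(64, 128)
print(spectral_norm_estimate(W), torch.linalg.matrix_norm(W, ord=2).item())  # should agree closely
```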
arXiv Detail & Related papers (2020-09-02T08:52:15Z)
- DC-NAS: Divide-and-Conquer Neural Architecture Search [108.57785531758076]
We present a divide-and-conquer (DC) approach to effectively and efficiently search deep neural architectures.
We achieve a 75.1% top-1 accuracy on the ImageNet dataset, which is higher than that of state-of-the-art methods using the same search space.
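The divide-and-conquer control flow, stripped of everything specific to DC-NAS, can be sketched as: split the candidate pool, score each part by one sampled representative, and recurse into the most promising part. The evaluator, partitioning scheme, and recursion depth below are placeholders, not the paper's procedure.

```python
import random

def dc_search(candidates, evaluate, num_parts=4, depth=2):
    """Generic divide-and-conquer search: split the candidate pool into parts,
    score each part by one randomly sampled representative, and recurse into the
    most promising part. Purely a control-flow illustration."""
    if depth == 0 or len(candidates) <= num_parts:
        return max(candidates, key=evaluate)           # exhaustive base case
    size = (len(candidates) + num_parts - 1) // num_parts
    parts = [candidates[i:i + size] for i in range(0, len(candidates), size)]
    best_part = max(parts, key=lambda p: evaluate(random.choice(p)))
    return dc_search(best_part, evaluate, num_parts, depth - 1)

# Toy usage: "architectures" are integers scored by closeness to 50.
print(dc_search(list(range(100)), evaluate=lambda c: -abs(c - 50)))
```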
arXiv Detail & Related papers (2020-05-29T09:02:16Z)