Architecture-Aware Minimization (A$^2$M): How to Find Flat Minima in Neural Architecture Search
- URL: http://arxiv.org/abs/2503.10404v1
- Date: Thu, 13 Mar 2025 14:30:17 GMT
- Title: Architecture-Aware Minimization (A$^2$M): How to Find Flat Minima in Neural Architecture Search
- Authors: Matteo Gambella, Fabrizio Pittorino, Manuel Roveri,
- Abstract summary: We investigate the geometric properties of neural architecture spaces commonly used in differentiable NAS methods.<n>Building on these insights, we propose Architecture-Aware Minimization (A$2$M), a novel analytically derived algorithmic framework.<n>A$2$M consistently improves generalization over state-of-the-art DARTS-based algorithms on benchmark datasets.
- Score: 3.724847012963521
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Architecture Search (NAS) has become an essential tool for designing effective and efficient neural networks. In this paper, we investigate the geometric properties of neural architecture spaces commonly used in differentiable NAS methods, specifically NAS-Bench-201 and DARTS. By defining flatness metrics such as neighborhoods and loss barriers along paths in architecture space, we reveal locality and flatness characteristics analogous to the well-known properties of neural network loss landscapes in weight space. In particular, we find that highly accurate architectures cluster together in flat regions, while suboptimal architectures remain isolated, unveiling the detailed geometrical structure of the architecture search landscape. Building on these insights, we propose Architecture-Aware Minimization (A$^2$M), a novel analytically derived algorithmic framework that explicitly biases, for the first time, the gradient of differentiable NAS methods towards flat minima in architecture space. A$^2$M consistently improves generalization over state-of-the-art DARTS-based algorithms on benchmark datasets including CIFAR-10, CIFAR-100, and ImageNet16-120, across both NAS-Bench-201 and DARTS search spaces. Notably, A$^2$M is able to increase the test accuracy, on average across different differentiable NAS methods, by +3.60\% on CIFAR-10, +4.60\% on CIFAR-100, and +3.64\% on ImageNet16-120, demonstrating its superior effectiveness in practice. A$^2$M can be easily integrated into existing differentiable NAS frameworks, offering a versatile tool for future research and applications in automated machine learning. We open-source our code at https://github.com/AI-Tech-Research-Lab/AsquaredM.
Related papers
- Arch-LLM: Taming LLMs for Neural Architecture Generation via Unsupervised Discrete Representation Learning [2.981775461282335]
A common approach involves the use of Variational Autoencoders (VAEs) to map discrete architectures onto a continuous representation space.
We introduce a Vector Quantized Variational Autoencoder (VQ-VAE) to learn a discrete latent space more naturally aligned with the discrete neural architectures.
Compared to VAE-based methods, our approach improves the generation of valid and unique architectures by over 80% on NASBench-101 and over 8% on NASBench-201.
arXiv Detail & Related papers (2025-03-28T00:56:56Z) - DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models with the distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub- search space using algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
arXiv Detail & Related papers (2024-03-02T22:16:47Z) - Search-time Efficient Device Constraints-Aware Neural Architecture
Search [6.527454079441765]
Deep learning techniques like computer vision and natural language processing can be computationally expensive and memory-intensive.
We automate the construction of task-specific deep learning architectures optimized for device constraints through Neural Architecture Search (NAS)
We present DCA-NAS, a principled method of fast neural network architecture search that incorporates edge-device constraints.
arXiv Detail & Related papers (2023-07-10T09:52:28Z) - GeNAS: Neural Architecture Search with Better Generalization [14.92869716323226]
Recent neural architecture search (NAS) approaches rely on validation loss or accuracy to find the superior network for the target data.
In this paper, we investigate a new neural architecture search measure for excavating architectures with better generalization.
arXiv Detail & Related papers (2023-05-15T12:44:54Z) - BaLeNAS: Differentiable Architecture Search via the Bayesian Learning
Rule [95.56873042777316]
Differentiable Architecture Search (DARTS) has received massive attention in recent years, mainly because it significantly reduces the computational cost.
This paper formulates the neural architecture search as a distribution learning problem through relaxing the architecture weights into Gaussian distributions.
We demonstrate how the differentiable NAS benefits from Bayesian principles, enhancing exploration and improving stability.
arXiv Detail & Related papers (2021-11-25T18:13:42Z) - iDARTS: Differentiable Architecture Search with Stochastic Implicit
Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS)
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z) - Neighborhood-Aware Neural Architecture Search [43.87465987957761]
We propose a novel neural architecture search (NAS) method to identify flat-minima architectures in the search space.
Our formulation takes the "flatness" of an architecture into account by aggregating the performance over the neighborhood of this architecture.
Based on our formulation, we propose neighborhood-aware random search (NA-RS) and neighborhood-aware differentiable architecture search (NA-DARTS)
arXiv Detail & Related papers (2021-05-13T15:56:52Z) - Weak NAS Predictors Are All You Need [91.11570424233709]
Recent predictor-based NAS approaches attempt to solve the problem with two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor.
We shift the paradigm from finding a complicated predictor that covers the whole architecture space to a set of weaker predictors that progressively move towards the high-performance sub-space.
Our method costs fewer samples to find the top-performance architectures on NAS-Bench-101 and NAS-Bench-201, and it achieves the state-of-the-art ImageNet performance on the NASNet search space.
arXiv Detail & Related papers (2021-02-21T01:58:43Z) - DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weight as random variables, modeled by Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based generalization.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z) - DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution
Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures on given constraints.
arXiv Detail & Related papers (2019-05-28T06:35:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.