Related papers: MIDAS: Mosaic Input-Specific Differentiable Architecture Search

URL: http://arxiv.org/abs/2602.17700v1
Date: Fri, 06 Feb 2026 23:16:41 GMT
Title: MIDAS: Mosaic Input-Specific Differentiable Architecture Search
Authors: Konstanty Subbotko
Abstract summary: MIDAS is a novel approach that modernizes DARTS by replacing static architecture parameters with dynamic, input-specific parameters computed via self-attention. We evaluate MIDAS on the DARTS, NAS-Bench-201, and RDARTS search spaces.
Score: 1.6921396880325779
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Differentiable Neural Architecture Search (NAS) provides efficient, gradient-based methods for automatically designing neural networks, yet its adoption remains limited in practice. We present MIDAS, a novel approach that modernizes DARTS by replacing static architecture parameters with dynamic, input-specific parameters computed via self-attention. To improve robustness, MIDAS (i) localizes the architecture selection by computing it separately for each spatial patch of the activation map, and (ii) introduces a parameter-free, topology-aware search space that models node connectivity and simplifies selecting the two incoming edges per node. We evaluate MIDAS on the DARTS, NAS-Bench-201, and RDARTS search spaces. In DARTS, it reaches 97.42% top-1 on CIFAR-10 and 83.38% on CIFAR-100. In NAS-Bench-201, it consistently finds globally optimal architectures. In RDARTS, it sets the state of the art on two of four search spaces on CIFAR-10. We further analyze why MIDAS works, showing that patchwise attention improves discrimination among candidate operations, and the resulting input-specific parameter distributions are class-aware and predominantly unimodal, providing reliable guidance for decoding.
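
The contrast the abstract draws between static and input-specific architecture parameters can be illustrated with a small sketch. The static part follows the standard DARTS mixed operation (a softmax over one shared parameter vector). The patchwise part is only an assumption about what "computed via self-attention" could look like: the abstract does not give the formulation, so the use of the patch features as the query, the candidate-operation outputs as keys, and all function names and toy operations below are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mix(weights, op_outputs):
    # Weighted sum of candidate-operation outputs (each a vector).
    dim = len(op_outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, op_outputs))
            for i in range(dim)]

def static_mixed_op(alpha, op_outputs):
    # DARTS-style mixing: one softmax(alpha) shared by every input and patch.
    return mix(softmax(alpha), op_outputs)

def patchwise_attention_weights(patch, op_outputs):
    # ASSUMPTION (not from the paper): scaled dot-product scores between the
    # patch features (query) and each candidate operation's output (key).
    scale = math.sqrt(len(patch))
    return softmax([dot(patch, key) / scale for key in op_outputs])

def patchwise_mixed_op(patches, ops):
    # Compute separate mixing weights for every spatial patch.
    outputs, weights = [], []
    for patch in patches:
        op_outputs = [op(patch) for op in ops]
        w = patchwise_attention_weights(patch, op_outputs)
        outputs.append(mix(w, op_outputs))
        weights.append(w)
    return outputs, weights

# Hypothetical toy candidate operations acting on a patch feature vector.
def identity(x):
    return list(x)

def zero(x):
    return [0.0 for _ in x]

def negate(x):
    return [-v for v in x]

patches = [[1.0, 0.0], [0.0, 3.0]]
ops = [identity, zero, negate]
outputs, weights = patchwise_mixed_op(patches, ops)
for patch, w in zip(patches, weights):
    print(patch, [round(v, 3) for v in w])
```

In this toy setup the two patches receive different mixing weights, whereas `static_mixed_op` would apply the same `softmax(alpha)` to both. How MIDAS actually parameterizes queries and keys, and how it decodes a final architecture from the resulting distributions, is not specified in the abstract.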

Related papers

ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics. Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
arXiv Detail & Related papers (2025-03-24T13:11:22Z)
Architecture-Aware Minimization (A$^2$M): How to Find Flat Minima in Neural Architecture Search [3.724847012963521]
We investigate the geometric properties of neural architecture spaces commonly used in differentiable NAS methods. Building on these insights, we propose Architecture-Aware Minimization (A$^2$M), a novel analytically derived algorithmic framework. A$^2$M consistently improves generalization over state-of-the-art DARTS-based algorithms on benchmark datasets.
arXiv Detail & Related papers (2025-03-13T14:30:17Z)
Flexible Channel Dimensions for Differentiable Architecture Search [50.33956216274694]
We propose a novel differentiable neural architecture search method with an efficient dynamic channel allocation algorithm. We show that the proposed framework is able to find DNN architectures that are equivalent to previous methods in task accuracy and inference latency.
arXiv Detail & Related papers (2023-06-13T15:21:38Z)
BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule [95.56873042777316]
Differentiable Architecture Search (DARTS) has received massive attention in recent years, mainly because it significantly reduces the computational cost. This paper formulates the neural architecture search as a distribution learning problem through relaxing the architecture weights into Gaussian distributions. We demonstrate how the differentiable NAS benefits from Bayesian principles, enhancing exploration and improving stability.
arXiv Detail & Related papers (2021-11-25T18:13:42Z)
ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency. This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation. In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS). We tackle the hypergradient computation in DARTS based on the implicit function theorem. We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weight as random variables, modeled by Dirichlet distribution. With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with a gradient-based optimizer. To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
Fine-Grained Stochastic Architecture Search [6.277767522867666]
Fine-Grained Architecture Search (FiGS) is a differentiable search method that searches over a much larger set of candidate architectures. FiGS simultaneously selects and modifies operators in the search space by applying a structured sparse regularization penalty. We show results across 3 existing search spaces, matching or outperforming the original search algorithms.
arXiv Detail & Related papers (2020-06-17T01:04:14Z)
ADWPNAS: Architecture-Driven Weight Prediction for Neural Architecture Search [6.458169480971417]
We propose an Architecture-Driven Weight Prediction (ADWP) approach for neural architecture search (NAS). In our approach, we first design an architecture-intensive search space and then train a HyperNetwork by inputting encoding architecture parameters. Results show that one search procedure can be completed in 4.0 GPU hours on CIFAR-10.
arXiv Detail & Related papers (2020-03-03T05:06:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.