Variation Matters: from Mitigating to Embracing Zero-Shot NAS Ranking Function Variation
- URL: http://arxiv.org/abs/2502.19657v1
- Date: Thu, 27 Feb 2025 01:01:22 GMT
- Title: Variation Matters: from Mitigating to Embracing Zero-Shot NAS Ranking Function Variation
- Authors: Pavel Rumiantsev, Mark Coates
- Abstract summary: We propose taking the variation in the ranking function output into account by viewing it as a random variable representing a proxy performance metric. During the search process, we strive to construct a stochastic ordering of the performance metrics to determine the best architecture. Our experiments show that the proposed stochastic ordering can effectively boost the performance of a search on standard benchmark search spaces.
- Score: 18.672184596814077
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Architecture Search (NAS) is a powerful automatic alternative to manual design of a neural network. In the zero-shot version, a fast ranking function is used to compare architectures without training them. The outputs of the ranking functions often vary significantly due to different sources of randomness, including the initialization of the evaluated architecture's weights or the batch of data used for calculations. A common approach to addressing the variation is to average a ranking function output over several evaluations. We propose taking the variation into account in a different manner, by viewing the ranking function output as a random variable representing a proxy performance metric. During the search process, we strive to construct a stochastic ordering of the performance metrics to determine the best architecture. Our experiments show that the proposed stochastic ordering can effectively boost the performance of a search on standard benchmark search spaces.
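To make the idea concrete, below is a minimal sketch (not the authors' implementation) of selecting among architectures by treating repeated proxy evaluations as samples of a random variable. A one-sided Mann-Whitney U test stands in for the paper's stochastic-ordering construction, and `proxy_samples` is a hypothetical placeholder for a real zero-cost proxy.

```python
import numpy as np
from scipy import stats

def proxy_samples(arch, n_samples=8, rng=None):
    """Hypothetical stand-in for a zero-shot ranking function evaluated
    repeatedly under fresh weight initializations / data batches; here we
    simply simulate noisy proxy scores."""
    rng = rng or np.random.default_rng(0)
    return arch["mean"] + arch["std"] * rng.standard_normal(n_samples)

def stochastically_better(samples_a, samples_b, alpha=0.05):
    """Prefer A over B when A's proxy scores tend to be larger: a one-sided
    Mann-Whitney U test, used here as a stand-in for the paper's ordering."""
    _, p = stats.mannwhitneyu(samples_a, samples_b, alternative="greater")
    return p < alpha

candidates = [{"mean": 0.9, "std": 0.30},
              {"mean": 1.0, "std": 0.10},
              {"mean": 0.7, "std": 0.05}]
rng = np.random.default_rng(42)
best = candidates[0]
best_samples = proxy_samples(best, n_samples=32, rng=rng)
for arch in candidates[1:]:
    samples = proxy_samples(arch, n_samples=32, rng=rng)
    if stochastically_better(samples, best_samples):
        best, best_samples = arch, samples
print("selected:", best)
```

Averaging the evaluations (the common approach the abstract mentions) would instead compare `samples.mean()` values directly, discarding the spread information.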
Related papers
- ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity.
This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics.
Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
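For reference, rank-correlation numbers like those quoted for FlexiBERT are computed along these lines (the scores and accuracies below are made up):

```python
import numpy as np
from scipy import stats

# Hypothetical proxy scores and ground-truth accuracies for five architectures
proxy = np.array([0.12, 0.55, 0.31, 0.78, 0.44])
accuracy = np.array([71.2, 74.9, 72.8, 76.1, 73.5])

rho, _ = stats.spearmanr(proxy, accuracy)
tau, _ = stats.kendalltau(proxy, accuracy)
print(f"Spearman's rho = {rho:.2f}, Kendall's tau = {tau:.2f}")
```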
arXiv Detail & Related papers (2025-03-24T13:11:22Z)
- Generalized Neural Sorting Networks with Error-Free Differentiable Swap Functions [44.71975181739874]
We consider sorting problems for more abstract yet expressive inputs, e.g., multi-digit images and image fragments, through a neural sorting network.
To learn a mapping from a high-dimensional input to an ordinal variable, the differentiability of sorting networks needs to be guaranteed.
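As a rough illustration of that requirement, here is a generic sigmoid-based soft swap inside an odd-even transposition network; note that this common relaxation introduces approximation error, which is exactly what the paper's error-free swap functions avoid.

```python
import numpy as np

def soft_swap(a, b, beta=10.0):
    """Differentiable compare-and-swap: approaches (min(a,b), max(a,b)) as
    beta -> infinity. A generic relaxation, not the paper's construction."""
    w = 1.0 / (1.0 + np.exp(-beta * (b - a)))  # ~1 when already ordered
    return w * a + (1 - w) * b, w * b + (1 - w) * a

def soft_sort(x, beta=10.0):
    """Odd-even transposition network built from soft swaps."""
    x, n = list(x), len(x)
    for stage in range(n):
        for i in range(stage % 2, n - 1, 2):
            x[i], x[i + 1] = soft_swap(x[i], x[i + 1], beta)
    return np.array(x)

print(soft_sort([3.0, 1.0, 2.0], beta=50.0))  # ~[1. 2. 3.]
```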
arXiv Detail & Related papers (2023-10-11T03:47:34Z)
- Learning Interpretable Heuristics for WalkSAT [0.34265828682659694]
We present an approach for learning effective variable scoring functions and noise parameters by using reinforcement learning.
Our experimental results show improvements with respect to both a WalkSAT baseline and another learned local search heuristic.
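A minimal WalkSAT skeleton with a pluggable scoring function shows where such a learned heuristic would slot in; the break-count score below is a hand-written stand-in, not the learned one.

```python
import random

def walksat(clauses, n_vars, score, p_noise=0.2, max_flips=10_000, seed=0):
    """Minimal WalkSAT: clauses are lists of signed ints ([1, -2] means
    x1 OR NOT x2); `score` ranks flip candidates in the greedy step."""
    rng = random.Random(seed)
    assign = [False] + [rng.random() < 0.5 for _ in range(n_vars)]
    sat = lambda c: any(assign[v] if v > 0 else not assign[-v] for v in c)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not sat(c)]
        if not unsat:
            return assign[1:]
        clause = rng.choice(unsat)
        if rng.random() < p_noise:                        # noise step
            var = abs(rng.choice(clause))
        else:                                             # greedy step
            var = max({abs(v) for v in clause},
                      key=lambda u: score(u, assign, clauses))
        assign[var] = not assign[var]
    return None

def break_score(var, assign, clauses):
    """Stand-in heuristic: negated break count (clauses that are currently
    satisfied only by `var` and would be broken by flipping it)."""
    def true_lits(c):
        return [v for v in c if (assign[v] if v > 0 else not assign[-v])]
    return -sum(len(t := true_lits(c)) == 1 and abs(t[0]) == var
                for c in clauses)

print(walksat([[1, 2], [-1, 2], [2, 3]], n_vars=3, score=break_score))
```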
arXiv Detail & Related papers (2023-07-10T14:52:14Z)
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined heuristics.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
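A toy version of the recipe for 2D point sets, assuming a hypothetical pose predictor (here a PCA angle plays the role of the learned canonicalization network): move the input to a canonical pose, apply any backbone, and move the output back.

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def pca_angle(points):
    """Toy 'canonicalization network': angle of the principal direction.
    (PCA has a sign ambiguity that we gloss over in this sketch.)"""
    u = np.linalg.svd(points - points.mean(0), full_matrices=False)[2][0]
    return np.arctan2(u[1], u[0])

def equivariant_apply(points, canon_net, backbone):
    """Canonicalize, run ANY backbone, undo the canonicalization:
    the composite is rotation-equivariant even if `backbone` is not."""
    theta = canon_net(points)
    out = backbone(points @ rot(-theta).T)   # process in canonical pose
    return out @ rot(theta).T                # restore the original pose

backbone = lambda p: p * np.array([2.0, 1.0])      # not equivariant itself
pts = np.random.default_rng(0).standard_normal((5, 2))
y1 = equivariant_apply(pts, pca_angle, backbone) @ rot(0.7).T
y2 = equivariant_apply(pts @ rot(0.7).T, pca_angle, backbone)
print(np.allclose(y1, y2))  # True: rotating the input rotates the output
```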
arXiv Detail & Related papers (2022-11-11T21:58:15Z)
- Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search [96.20505710087392]
We propose a Shapley value based method to evaluate operation contribution (Shapley-NAS) for neural architecture search.
We show that our method outperforms the state-of-the-art methods by a considerable margin with light search cost.
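The flavour of the approach can be conveyed with the standard Monte-Carlo (permutation-sampling) Shapley estimator; `utility` would be the supernet's validation performance with a subset of candidate operations enabled, a hypothetical black box here rather than the authors' exact procedure.

```python
import random

def shapley_mc(operations, utility, n_perm=200, seed=0):
    """Permutation-sampling estimate of each operation's Shapley value:
    average marginal contribution over random arrival orders."""
    rng = random.Random(seed)
    phi = {op: 0.0 for op in operations}
    for _ in range(n_perm):
        order = operations[:]
        rng.shuffle(order)
        subset, prev = set(), utility(set())
        for op in order:
            subset.add(op)
            cur = utility(subset)
            phi[op] += (cur - prev) / n_perm
            prev = cur
    return phi

# Toy additive utility: Shapley values recover the individual gains.
gains = {"conv3x3": 0.6, "skip": 0.2, "none": 0.0}
print(shapley_mc(list(gains), lambda s: sum(gains[o] for o in s)))
```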
arXiv Detail & Related papers (2022-06-20T14:41:49Z)
- Approximate Neural Architecture Search via Operation Distribution Learning [4.358626952482686]
We show that given an architectural cell, its performance largely depends on the ratio of used operations.
This intuition is agnostic to any specific search strategy and can be applied to a diverse set of NAS algorithms.
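A minimal sketch of acting on that intuition, with hypothetical operation names and ratios: sample cells from a categorical distribution over operations and optimize the distribution rather than exact edge assignments.

```python
import numpy as np

rng = np.random.default_rng(0)
OPS = ["conv3x3", "conv1x1", "skip", "pool"]   # hypothetical operation set

def sample_cell(op_probs, n_edges=8):
    """Draw each edge's operation from the current operation distribution."""
    return list(rng.choice(OPS, size=n_edges, p=op_probs))

probs = np.array([0.5, 0.2, 0.2, 0.1])         # made-up learned ratios
print(sample_cell(probs))
```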
arXiv Detail & Related papers (2021-11-08T17:38:29Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream approach to neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
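A toy bilevel problem makes the implicit-function-theorem hypergradient concrete (real DARTS-scale methods approximate the inverse-Hessian-vector product instead of forming the inverse):

```python
import numpy as np

# inner:  w*(a) = argmin_w 0.5 w^T A w - a^T w   =>  w* = A^{-1} a
# outer:  L(a)  = 0.5 || w*(a) - t ||^2
# IFT:    dw*/da = A^{-1}, hence dL/da = A^{-1} (w*(a) - t)
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)            # SPD inner Hessian
t, a = rng.standard_normal(3), rng.standard_normal(3)

w_star = np.linalg.solve(A, a)
hypergrad = np.linalg.solve(A, w_star - t)

# Finite-difference check of the implicit gradient
eps, fd = 1e-5, np.zeros(3)
L = lambda a_: 0.5 * np.sum((np.linalg.solve(A, a_) - t) ** 2)
for i in range(3):
    a_eps = a.copy(); a_eps[i] += eps
    fd[i] = (L(a_eps) - L(a)) / eps
print(np.allclose(hypergrad, fd, atol=1e-4))  # True
```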
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- PiRank: Learning To Rank via Differentiable Sorting [85.28916333414145]
We propose PiRank, a new class of differentiable surrogates for ranking.
We show that PiRank exactly recovers the desired metrics in the limit of zero temperature.
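The temperature limit can be seen directly in the NeuralSort relaxation that PiRank builds on: a row-stochastic matrix that hardens into the exact descending-sort permutation as tau -> 0.

```python
import numpy as np

def neuralsort(scores, tau):
    """NeuralSort relaxed permutation matrix (Grover et al.):
    row i ~ softmax(((n + 1 - 2i) * s - A @ 1) / tau)."""
    s = np.asarray(scores, dtype=float).reshape(-1, 1)
    n = s.shape[0]
    A = np.abs(s - s.T)                         # pairwise |s_i - s_j|
    row_sums = A.sum(axis=1)                    # (A @ 1)_j; A is symmetric
    i = np.arange(1, n + 1).reshape(-1, 1)
    logits = ((n + 1 - 2 * i) * s.T - row_sums) / tau
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

s = [0.1, 2.0, 1.3]
print(np.round(neuralsort(s, tau=0.01)))        # hard descending sort
print(np.round(neuralsort(s, tau=1.0), 2))      # soft, differentiable
```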
arXiv Detail & Related papers (2020-12-12T05:07:36Z)
- Analysis of Multivariate Scoring Functions for Automatic Unbiased Learning to Rank [14.827143632277274]
AutoULTR algorithms that jointly learn user bias models (i.e., propensity models) with unbiased rankers have received a lot of attention due to their superior performance and low deployment cost in practice.
Recent advances in context-aware learning-to-rank models have shown that multivariate scoring functions, which read multiple documents together and predict their ranking scores jointly, are more powerful than uni-variate ranking functions in ranking tasks with human-annotated relevance labels.
Our experiments with synthetic clicks on two large-scale benchmark datasets show that AutoULTR models with permutation-invariant multivariate scoring functions significantly outperform their counterparts with uni-variate scoring functions.
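One way to sketch a permutation-invariant multivariate scorer is DeepSets-style: each document's score combines its own features with an order-independent pooled summary of the list, so permuting the documents simply permutes the scores (the weights below are hypothetical, not a trained AutoULTR model).

```python
import numpy as np

def multivariate_scores(X, W_doc, W_ctx):
    """Score all documents jointly: per-document features plus a pooled,
    order-independent list context."""
    context = X.mean(axis=0)                    # permutation-invariant pooling
    h = np.tanh(X @ W_doc + context @ W_ctx)    # per-doc + list context
    return h.sum(axis=1)                        # one score per document

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5))                 # 4 documents, 5 features
W_doc, W_ctx = rng.standard_normal((5, 8)), rng.standard_normal((5, 8))
scores = multivariate_scores(X, W_doc, W_ctx)
perm = rng.permutation(4)
assert np.allclose(multivariate_scores(X[perm], W_doc, W_ctx), scores[perm])
```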
arXiv Detail & Related papers (2020-08-20T16:31:59Z)
- Fast Differentiable Sorting and Ranking [36.40586857569459]
We propose the first differentiable sorting and ranking operators with $O(n \log n)$ time and $O(n)$ space complexity.
We achieve this feat by constructing differentiable operators as projections onto the permutahedron, the convex hull of permutations, and using a reduction to isotonic optimization.
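For intuition only, here is the naive $O(n^2)$ pairwise relaxation of ranks that this line of work improves upon; the $O(n \log n)$ permutahedron-projection operator itself is not reproduced here.

```python
import numpy as np

def soft_rank_naive(x, tau=0.1):
    """O(n^2) sigmoid relaxation: as tau -> 0 this recovers the exact
    ascending ranks (cf. scipy.stats.rankdata)."""
    diff = (x[:, None] - x[None, :]) / tau
    sig = 1.0 / (1.0 + np.exp(-diff))
    return 1.0 + sig.sum(axis=1) - 0.5          # remove the self-comparison

x = np.array([0.3, 1.7, 0.9])
print(soft_rank_naive(x, tau=0.01))             # ~[1. 3. 2.]
```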
arXiv Detail & Related papers (2020-02-20T17:11:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.