GradSign: Model Performance Inference with Theoretical Insights
- URL: http://arxiv.org/abs/2110.08616v1
- Date: Sat, 16 Oct 2021 17:03:10 GMT
- Title: GradSign: Model Performance Inference with Theoretical Insights
- Authors: Zhihao Zhang, Zhihao Jia
- Abstract summary: We propose GradSign, an accurate, simple, and flexible metric for model performance inference with theoretical insights.
We show that GradSign generalizes well to real-world networks and consistently outperforms state-of-the-art gradient-based methods for MPI evaluated by Spearman's ρ and Kendall's τ.
- Score: 2.4112990554464235
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A key challenge in neural architecture search (NAS) is quickly inferring the
predictive performance of a broad spectrum of networks to discover
statistically accurate and computationally efficient ones. We refer to this
task as model performance inference (MPI). The current practice for efficient
MPI is gradient-based methods that leverage the gradients of a network at
initialization to infer its performance. However, existing gradient-based
methods rely only on heuristic metrics and lack the necessary theoretical
foundations to consolidate their designs. We propose GradSign, an accurate,
simple, and flexible metric for model performance inference with theoretical
insights. The key idea behind GradSign is a quantity Ψ to analyze the
optimization landscape of different networks at the granularity of individual
training samples. Theoretically, we show that both the network's training and
true population losses are proportionally upper-bounded by Ψ under
reasonable assumptions. In addition, we design GradSign, an accurate and simple
approximation of Ψ using the gradients of a network evaluated at a random
initialization state. Evaluation on seven NAS benchmarks across three training
datasets shows that GradSign generalizes well to real-world networks and
consistently outperforms state-of-the-art gradient-based methods for MPI
evaluated by Spearman's ρ and Kendall's τ. Additionally, we integrate
GradSign into four existing NAS algorithms and show that the GradSign-assisted
NAS algorithms outperform their vanilla counterparts by improving the
accuracies of best-discovered networks by up to 0.3%, 1.1%, and 1.0% on three
real-world tasks.
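As a concrete illustration of the metric, here is a minimal PyTorch sketch under the formulation described above: per-sample gradient signs at a random initialization are accumulated, and the score sums |Σ_i sign(g_i)| over all parameters. The model, data, and loss names are placeholders, not the authors' released code.

```python
import torch

def gradsign_score(model, xs, ys, loss_fn):
    """GradSign-style score at a random initialization: accumulate the sign
    of every per-sample gradient, then sum |sum of signs| over parameters.
    Larger values indicate that sample-wise gradients agree in sign more
    often, which the paper relates to the optimization landscape via Psi."""
    sign_sum = None
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        sign_sum = torch.sign(g) if sign_sum is None else sign_sum + torch.sign(g)
    return sign_sum.abs().sum().item()
```

In a NAS loop, one would rank candidate architectures by this score and compare the ranking against measured accuracies, e.g. with scipy.stats.spearmanr and scipy.stats.kendalltau.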
Related papers
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns the optimal source placement in large-scale networks online.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive performance guarantees that depend on network parameters, which in turn influence the learning curve of the sequential decision strategy (a generic bandit loop is sketched below).
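Grab-UCB's graph-kernel machinery does not fit a short snippet, but the upper-confidence-bound principle it builds on can be sketched with a textbook UCB1 loop; the pull function below is a stand-in reward oracle, not the paper's network model.

```python
import math

def ucb1(n_arms, pull, horizon):
    """Textbook UCB1: after one pull per arm, repeatedly play the arm that
    maximizes (empirical mean) + sqrt(2 ln t / n_pulls)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for arm in range(n_arms):                     # initialization round
        sums[arm] += pull(arm)
        counts[arm] = 1
    for t in range(n_arms + 1, horizon + 1):
        scores = [sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
                  for a in range(n_arms)]
        best = max(range(n_arms), key=scores.__getitem__)
        sums[best] += pull(best)
        counts[best] += 1
    return max(range(n_arms), key=lambda a: sums[a] / counts[a])
```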
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs run into training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process (a minimal sketch of an implicit step follows).
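For intuition, an implicit step solves θ_{k+1} = θ_k − η∇L(θ_{k+1}) instead of evaluating the gradient at the current iterate. A minimal fixed-point sketch for a generic differentiable loss, not the paper's PINN-specific solver:

```python
import torch

def implicit_sgd_step(w, loss_fn, lr=0.1, inner_iters=10):
    """One implicit (proximal-style) step: solve w_new = w - lr * grad(w_new)
    by fixed-point iteration, starting from the current iterate."""
    w = w.detach()
    w_new = w.clone()
    for _ in range(inner_iters):
        w_new = w_new.detach().requires_grad_(True)
        g, = torch.autograd.grad(loss_fn(w_new), w_new)
        w_new = w - lr * g                    # gradient taken at the *new* point
    return w_new.detach()

# Toy usage: minimize ||w - 3||^2 with repeated implicit steps.
w = torch.zeros(2)
for _ in range(20):
    w = implicit_sgd_step(w, lambda p: ((p - 3.0) ** 2).sum())
```

The implicit update damps large gradient steps, which is why it tends to be more stable than the explicit update at aggressive learning rates.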
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- AIO-P: Expanding Neural Performance Predictors Beyond Image Classification [22.743278613519152]
We propose a novel All-in-One Predictor (AIO-P) to pretrain neural predictors on architecture examples.
AIO-P can achieve Mean Absolute Error (MAE) and Spearman's Rank Correlation (SRCC) below 1% and above 0.5, respectively.
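For reference, both quoted metrics are straightforward to compute; a short sketch with placeholder predicted and measured accuracies:

```python
import numpy as np
from scipy.stats import spearmanr

pred = np.array([0.71, 0.68, 0.74, 0.66])   # placeholder predicted accuracies
true = np.array([0.72, 0.69, 0.73, 0.65])   # placeholder measured accuracies

mae = np.abs(pred - true).mean()            # Mean Absolute Error
srcc, _ = spearmanr(pred, true)             # Spearman's Rank Correlation
print(f"MAE = {mae:.4f}, SRCC = {srcc:.3f}")
```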
arXiv Detail & Related papers (2022-11-30T18:30:41Z)
- Network Gradient Descent Algorithm for Decentralized Federated Learning [0.2867517731896504]
We study a fully decentralized federated learning algorithm based on network gradient descent (NGD), a novel gradient descent algorithm executed over a communication-based network.
In NGD, only statistics (e.g., parameter estimates) need to be communicated, minimizing the risk of privacy leakage.
We find that both the learning rate and the network structure play significant roles in determining the NGD estimator's statistical efficiency.
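A minimal sketch of one such round, assuming the standard decentralized pattern: each node mixes its neighbors' parameter estimates (the only statistics communicated) and then takes a local gradient step. The mixing matrix W and local gradient callables are placeholders, not the paper's exact construction.

```python
import numpy as np

def ngd_round(thetas, W, local_grads, lr):
    """One round of network gradient descent.

    thetas:      (n_nodes, dim) local parameter estimates
    W:           (n_nodes, n_nodes) doubly stochastic mixing matrix,
                 nonzero only between nodes that communicate
    local_grads: one gradient callable per node (local loss gradient)
    """
    mixed = W @ thetas                          # exchange estimates with neighbors
    grads = np.stack([g(t) for g, t in zip(local_grads, mixed)])
    return mixed - lr * grads                   # local gradient step
```

The sparsity pattern of W encodes the network structure, which is one way the topology enters the estimator's statistical efficiency.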
arXiv Detail & Related papers (2022-05-06T02:53:31Z)
- Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which, together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
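Self-ensembling teachers are commonly maintained as an exponential moving average (EMA) of the student's weights; a sketch under that assumption (the paper's exact ensembling rule may differ):

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, momentum=0.999):
    """EMA update: teacher <- m * teacher + (1 - m) * student, parameter-wise."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)
```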
arXiv Detail & Related papers (2021-12-15T09:50:25Z)
- Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent [95.94432031144716]
We propose proxy convexity as a unified optimization framework for the analysis of neural networks trained by gradient descent.
We show that many existing guarantees for gradient-descent-trained networks can be unified through this framework.
arXiv Detail & Related papers (2021-06-25T17:45:00Z)
- Robust Learning via Persistency of Excitation [4.674053902991301]
We show that network training using gradient descent is equivalent to a dynamical system parameter estimation problem.
We provide an efficient technique for estimating the corresponding Lipschitz constant using extreme value theory.
Our approach also universally increases adversarial accuracy by 0.1 to 0.3 percentage points across various state-of-the-art adversarially trained models.
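A simplified sketch of the extreme-value idea: sample local slopes around an input, collect block maxima, and fit a generalized extreme value (GEV) distribution whose finite upper endpoint estimates the local Lipschitz constant. The scalar-valued classifier f and the sampling scheme are placeholders, not the paper's estimator.

```python
import numpy as np
from scipy.stats import genextreme

def lipschitz_evt(f, x, radius=0.1, n_blocks=50, block_size=100, seed=0):
    """Estimate a local Lipschitz constant of f around x: sample slopes
    |f(x+d)-f(x)| / ||d|| in a ball, take block maxima, and fit a GEV;
    for shape c > 0 the fitted distribution has the finite upper
    endpoint loc + scale / c."""
    rng = np.random.default_rng(seed)
    fx = f(x)
    maxima = []
    for _ in range(n_blocks):
        slopes = []
        for _ in range(block_size):
            d = rng.normal(size=x.shape)
            d *= radius * rng.uniform() / np.linalg.norm(d)  # random point in the ball
            slopes.append(abs(f(x + d) - fx) / np.linalg.norm(d))
        maxima.append(max(slopes))
    c, loc, scale = genextreme.fit(maxima)
    return loc + scale / c if c > 0 else max(maxima)
```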
arXiv Detail & Related papers (2021-06-03T18:49:05Z)
- Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization [16.85167651136133]
We take a broader view of training sparse networks and consider the role of regularization, optimization, and architecture choices in sparse models.
We show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime.
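To make "gradient flow" concrete, one can measure the gradient norm at initialization under a sparsity mask; a toy sketch, not the paper's measurement protocol:

```python
import torch

def masked_grad_norm(in_dim=256, out_dim=256, sparsity=0.9, seed=0):
    """Gradient norm at initialization for a randomly masked linear layer."""
    torch.manual_seed(seed)
    w = torch.randn(out_dim, in_dim, requires_grad=True)
    mask = (torch.rand_like(w) > sparsity).float()   # keep ~(1 - sparsity) of weights
    x, y = torch.randn(32, in_dim), torch.randn(32, out_dim)
    loss = ((x @ (w * mask).T - y) ** 2).mean()
    loss.backward()
    return w.grad.norm().item()   # gradient flows only through surviving weights
```

Sweeping sparsity (or changing the layer width) and plotting this norm is one simple way to see how architecture choices affect gradient flow in sparse models.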
arXiv Detail & Related papers (2021-02-02T18:40:26Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
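The core operation of such a predictor is a degree-normalized neighborhood transform; a minimal one-layer GCN sketch mapping an architecture graph (adjacency A, node features X) to a scalar performance estimate, with illustrative shapes:

```python
import numpy as np

def gcn_predict(A, X, W1, w2):
    """One GCN layer plus mean readout.

    A:  (n, n) adjacency of the architecture graph
    X:  (n, f) one-hot operation features per node
    W1: (f, h) layer weights; w2: (h,) readout weights
    """
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))      # D^{-1/2} A_hat D^{-1/2}
    H = np.maximum(A_norm @ X @ W1, 0.0)          # graph convolution + ReLU
    return float(H.mean(axis=0) @ w2)             # graph-level performance score
```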
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions [76.05981545084738]
We propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost.
We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts.
We show that the proposed ReActNet outperforms all state-of-the-art methods by a large margin.
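At the heart of such binary networks is a sign-based binarizer trained with a straight-through estimator (STE); ReActNet additionally learns activation shifts (RSign/RPReLU), which this generic sketch omits:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass; straight-through gradient in the backward
    pass, clipped to the region |x| <= 1 where sign() is approximated."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()

# Usage: binary_activations = BinarizeSTE.apply(pre_activations)
```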
arXiv Detail & Related papers (2020-03-07T02:12:02Z)
- The duality structure gradient descent algorithm: analysis and applications to neural networks [0.0]
We propose an algorithm named duality structure gradient descent (DSGD) that is amenable to non-asymptotic performance analysis.
We empirically demonstrate the behavior of DSGD in several neural network training scenarios.
arXiv Detail & Related papers (2017-08-01T21:24:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.