GradSign: Model Performance Inference with Theoretical Insights
- URL: http://arxiv.org/abs/2110.08616v1
- Date: Sat, 16 Oct 2021 17:03:10 GMT
- Title: GradSign: Model Performance Inference with Theoretical Insights
- Authors: Zhihao Zhang, Zhihao Jia
- Abstract summary: We propose GradSign, an accurate, simple, and flexible metric for model performance inference with theoretical insights.
We show that GradSign generalizes well to real-world networks and consistently outperforms state-of-the-art gradient-based methods for MPI evaluated by Spearman's ρ and Kendall's τ.
- Score: 2.4112990554464235
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A key challenge in neural architecture search (NAS) is quickly inferring the
predictive performance of a broad spectrum of networks to discover
statistically accurate and computationally efficient ones. We refer to this
task as model performance inference (MPI). The current practice for efficient
MPI is gradient-based methods that leverage the gradients of a network at
initialization to infer its performance. However, existing gradient-based
methods rely only on heuristic metrics and lack the necessary theoretical
foundations to consolidate their designs. We propose GradSign, an accurate,
simple, and flexible metric for model performance inference with theoretical
insights. The key idea behind GradSign is a quantity Ψ to analyze the
optimization landscape of different networks at the granularity of individual
training samples. Theoretically, we show that both the network's training and
true population losses are proportionally upper-bounded by Ψ under
reasonable assumptions. In addition, we design GradSign, an accurate and simple
approximation of Ψ using the gradients of a network evaluated at a random
initialization state. Evaluation on seven NAS benchmarks across three training
datasets shows that GradSign generalizes well to real-world networks and
consistently outperforms state-of-the-art gradient-based methods for MPI
evaluated by Spearman's ρ and Kendall's τ. Additionally, we integrate
GradSign into four existing NAS algorithms and show that the GradSign-assisted
NAS algorithms outperform their vanilla counterparts by improving the
accuracies of best-discovered networks by up to 0.3%, 1.1%, and 1.0% on three
real-world tasks.
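As a concrete illustration of the metric, here is a minimal PyTorch sketch under the formulation described above: per-sample gradient signs at a random initialization are accumulated, and the score sums |Σ_i sign(g_i)| over all parameters. The model, data, and loss names are placeholders, not the authors' released code.

```python
import torch

def gradsign_score(model, xs, ys, loss_fn):
    """GradSign-style score at a random initialization: accumulate the sign
    of every per-sample gradient, then sum |sum of signs| over parameters.
    Larger values indicate that sample-wise gradients agree in sign more
    often, which the paper relates to the optimization landscape via Psi."""
    sign_sum = None
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        sign_sum = torch.sign(g) if sign_sum is None else sign_sum + torch.sign(g)
    return sign_sum.abs().sum().item()
```

In a NAS loop, one would rank candidate architectures by this score and compare the ranking against measured accuracies, e.g. with scipy.stats.spearmanr and scipy.stats.kendalltau.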
Related papers
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns the optimal source placement in large-scale networks online.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive performance guarantees that depend on network parameters, which in turn influence the learning curve of the sequential decision strategy (a generic bandit loop is sketched below).
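Grab-UCB's graph-kernel machinery does not fit a short snippet, but the upper-confidence-bound principle it builds on can be sketched with a textbook UCB1 loop; the pull function below is a stand-in reward oracle, not the paper's network model.

```python
import math

def ucb1(n_arms, pull, horizon):
    """Textbook UCB1: after one pull per arm, repeatedly play the arm that
    maximizes (empirical mean) + sqrt(2 ln t / n_pulls)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for arm in range(n_arms):                     # initialization round
        sums[arm] += pull(arm)
        counts[arm] = 1
    for t in range(n_arms + 1, horizon + 1):
        scores = [sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
                  for a in range(n_arms)]
        best = max(range(n_arms), key=scores.__getitem__)
        sums[best] += pull(best)
        counts[best] += 1
    return max(range(n_arms), key=lambda a: sums[a] / counts[a])
```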
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs run into training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process (a minimal sketch of an implicit step follows).
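For intuition, an implicit step solves θ_{k+1} = θ_k − η∇L(θ_{k+1}) instead of evaluating the gradient at the current iterate. A minimal fixed-point sketch for a generic differentiable loss, not the paper's PINN-specific solver:

```python
import torch

def implicit_sgd_step(w, loss_fn, lr=0.1, inner_iters=10):
    """One implicit (proximal-style) step: solve w_new = w - lr * grad(w_new)
    by fixed-point iteration, starting from the current iterate."""
    w = w.detach()
    w_new = w.clone()
    for _ in range(inner_iters):
        w_new = w_new.detach().requires_grad_(True)
        g, = torch.autograd.grad(loss_fn(w_new), w_new)
        w_new = w - lr * g                    # gradient taken at the *new* point
    return w_new.detach()

# Toy usage: minimize ||w - 3||^2 with repeated implicit steps.
w = torch.zeros(2)
for _ in range(20):
    w = implicit_sgd_step(w, lambda p: ((p - 3.0) ** 2).sum())
```

The implicit update damps large gradient steps, which is why it tends to be more stable than the explicit update at aggressive learning rates.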
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- AIO-P: Expanding Neural Performance Predictors Beyond Image Classification [22.743278613519152]
We propose a novel All-in-One Predictor (AIO-P) to pretrain neural predictors on architecture examples.
AIO-P can achieve Mean Absolute Error (MAE) and Spearman's Rank Correlation (SRCC) below 1% and above 0.5, respectively.
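For reference, both quoted metrics are straightforward to compute; a short sketch with placeholder predicted and measured accuracies:

```python
import numpy as np
from scipy.stats import spearmanr

pred = np.array([0.71, 0.68, 0.74, 0.66])   # placeholder predicted accuracies
true = np.array([0.72, 0.69, 0.73, 0.65])   # placeholder measured accuracies

mae = np.abs(pred - true).mean()            # Mean Absolute Error
srcc, _ = spearmanr(pred, true)             # Spearman's Rank Correlation
print(f"MAE = {mae:.4f}, SRCC = {srcc:.3f}")
```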
arXiv Detail & Related papers (2022-11-30T18:30:41Z)
- Network Gradient Descent Algorithm for Decentralized Federated Learning [0.2867517731896504]
We study a fully decentralized federated learning algorithm based on network gradient descent (NGD), a novel gradient descent algorithm executed over a communication-based network.
In NGD, only statistics (e.g., parameter estimates) need to be communicated, minimizing the risk of privacy leakage.
We find that both the learning rate and the network structure play significant roles in determining the NGD estimator's statistical efficiency.
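A minimal sketch of one such round, assuming the standard decentralized pattern: each node mixes its neighbors' parameter estimates (the only statistics communicated) and then takes a local gradient step. The mixing matrix W and local gradient callables are placeholders, not the paper's exact construction.

```python
import numpy as np

def ngd_round(thetas, W, local_grads, lr):
    """One round of network gradient descent.

    thetas:      (n_nodes, dim) local parameter estimates
    W:           (n_nodes, n_nodes) doubly stochastic mixing matrix,
                 nonzero only between nodes that communicate
    local_grads: one gradient callable per node (local loss gradient)
    """
    mixed = W @ thetas                          # exchange estimates with neighbors
    grads = np.stack([g(t) for g, t in zip(local_grads, mixed)])
    return mixed - lr * grads                   # local gradient step
```

The sparsity pattern of W encodes the network structure, which is one way the topology enters the estimator's statistical efficiency.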
arXiv Detail & Related papers (2022-05-06T02:53:31Z)
- Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which, together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
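Self-ensembling teachers are commonly maintained as an exponential moving average (EMA) of the student's weights; a sketch under that assumption (the paper's exact ensembling rule may differ):

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, momentum=0.999):
    """EMA update: teacher <- m * teacher + (1 - m) * student, parameter-wise."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)
```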
arXiv Detail & Related papers (2021-12-15T09:50:25Z)
- Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent [95.94432031144716]
We propose proxy convexity as a unified optimization framework for the analysis of neural networks trained by gradient descent.
We show that many existing guarantees for gradient-descent-trained networks can be unified through this framework.
arXiv Detail & Related papers (2021-06-25T17:45:00Z)
- Robust Learning via Persistency of Excitation [4.674053902991301]
We show that network training using gradient descent is equivalent to a dynamical system parameter estimation problem.
We provide an efficient technique for estimating the corresponding Lipschitz constant using extreme value theory.
Our approach also universally increases adversarial accuracy by 0.1 to 0.3 percentage points across various state-of-the-art adversarially trained models.
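A simplified sketch of the extreme-value idea: sample local slopes around an input, collect block maxima, and fit a generalized extreme value (GEV) distribution whose finite upper endpoint estimates the local Lipschitz constant. The scalar-valued classifier f and the sampling scheme are placeholders, not the paper's estimator.

```python
import numpy as np
from scipy.stats import genextreme

def lipschitz_evt(f, x, radius=0.1, n_blocks=50, block_size=100, seed=0):
    """Estimate a local Lipschitz constant of f around x: sample slopes
    |f(x+d)-f(x)| / ||d|| in a ball, take block maxima, and fit a GEV;
    for shape c > 0 the fitted distribution has the finite upper
    endpoint loc + scale / c."""
    rng = np.random.default_rng(seed)
    fx = f(x)
    maxima = []
    for _ in range(n_blocks):
        slopes = []
        for _ in range(block_size):
            d = rng.normal(size=x.shape)
            d *= radius * rng.uniform() / np.linalg.norm(d)  # random point in the ball
            slopes.append(abs(f(x + d) - fx) / np.linalg.norm(d))
        maxima.append(max(slopes))
    c, loc, scale = genextreme.fit(maxima)
    return loc + scale / c if c > 0 else max(maxima)
```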
arXiv Detail & Related papers (2021-06-03T18:49:05Z)
- Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization [16.85167651136133]
We take a broader view of training sparse networks and consider the role of regularization, optimization, and architecture choices in sparse models.
We show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime.
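To make "gradient flow" concrete, one can measure the gradient norm at initialization under a sparsity mask; a toy sketch, not the paper's measurement protocol:

```python
import torch

def masked_grad_norm(in_dim=256, out_dim=256, sparsity=0.9, seed=0):
    """Gradient norm at initialization for a randomly masked linear layer."""
    torch.manual_seed(seed)
    w = torch.randn(out_dim, in_dim, requires_grad=True)
    mask = (torch.rand_like(w) > sparsity).float()   # keep ~(1 - sparsity) of weights
    x, y = torch.randn(32, in_dim), torch.randn(32, out_dim)
    loss = ((x @ (w * mask).T - y) ** 2).mean()
    loss.backward()
    return w.grad.norm().item()   # gradient flows only through surviving weights
```

Sweeping sparsity (or changing the layer width) and plotting this norm is one simple way to see how architecture choices affect gradient flow in sparse models.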
arXiv Detail & Related papers (2021-02-02T18:40:26Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
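The core operation of such a predictor is a degree-normalized neighborhood transform; a minimal one-layer GCN sketch mapping an architecture graph (adjacency A, node features X) to a scalar performance estimate, with illustrative shapes:

```python
import numpy as np

def gcn_predict(A, X, W1, w2):
    """One GCN layer plus mean readout.

    A:  (n, n) adjacency of the architecture graph
    X:  (n, f) one-hot operation features per node
    W1: (f, h) layer weights; w2: (h,) readout weights
    """
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))      # D^{-1/2} A_hat D^{-1/2}
    H = np.maximum(A_norm @ X @ W1, 0.0)          # graph convolution + ReLU
    return float(H.mean(axis=0) @ w2)             # graph-level performance score
```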
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions [76.05981545084738]
We propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost.
We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts.
We show that the proposed ReActNet outperforms all state-of-the-art methods by a large margin.
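At the heart of such binary networks is a sign-based binarizer trained with a straight-through estimator (STE); ReActNet additionally learns activation shifts (RSign/RPReLU), which this generic sketch omits:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass; straight-through gradient in the backward
    pass, clipped to the region |x| <= 1 where sign() is approximated."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()

# Usage: binary_activations = BinarizeSTE.apply(pre_activations)
```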
arXiv Detail & Related papers (2020-03-07T02:12:02Z)
- The duality structure gradient descent algorithm: analysis and applications to neural networks [0.0]
We propose an algorithm named duality structure gradient descent (DSGD) that is amenable to non-asymptotic performance analysis.
We empirically demonstrate the behavior of DSGD in several neural network training scenarios.
arXiv Detail & Related papers (2017-08-01T21:24:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.