NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance
- URL: http://arxiv.org/abs/2408.08776v1
- Date: Fri, 16 Aug 2024 14:38:14 GMT
- Title: NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance
- Authors: Raphael T. Husistein, Markus Reiher, Marco Eckhoff
- Abstract summary: We propose a zero-cost proxy Network Expressivity by Activation Rank (NEAR) to identify the optimal neural network without training.
We demonstrate the cutting-edge correlation between this network score and the model accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial neural networks have been shown to be state-of-the-art machine learning models in a wide variety of applications, including natural language processing and image recognition. However, building a performant neural network is a laborious task and requires substantial computing power. Neural Architecture Search (NAS) addresses this issue by an automatic selection of the optimal network from a set of potential candidates. While many NAS methods still require training of (some) neural networks, zero-cost proxies promise to identify the optimal network without training. In this work, we propose the zero-cost proxy Network Expressivity by Activation Rank (NEAR). It is based on the effective rank of the pre- and post-activation matrix, i.e., the values of a neural network layer before and after applying its activation function. We demonstrate the cutting-edge correlation between this network score and the model accuracy on NAS-Bench-101 and NATS-Bench-SSS/TSS. In addition, we present a simple approach to estimate the optimal layer sizes in multi-layer perceptrons. Furthermore, we show that this score can be utilized to select hyperparameters such as the activation function and the neural network weight initialization scheme.
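The abstract describes NEAR as based on the effective rank of the pre- and post-activation matrices. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the common definition of effective rank as the exponential of the Shannon entropy of the normalized singular values, and it assumes the per-layer values are simply summed. The names `effective_rank` and `near_like_score`, the tanh activation, and the random multi-layer perceptron are all hypothetical choices made for illustration.

```python
import numpy as np


def effective_rank(matrix):
    """Effective rank: exp of the Shannon entropy of the normalized singular values."""
    s = np.linalg.svd(matrix, compute_uv=False)
    total = s.sum()
    if total == 0.0:
        return 0.0  # all-zero matrix carries no rank information
    p = s / total
    p = p[p > 0]  # drop exact zeros so log(p) stays finite
    return float(np.exp(-np.sum(p * np.log(p))))


def near_like_score(x, weights, biases, activation=np.tanh):
    """Sum the effective ranks of pre- and post-activation matrices over layers.

    Assumption: summing over layers is an illustrative aggregation choice;
    consult the paper for the exact definition of the NEAR score.
    """
    score = 0.0
    h = x
    for w, b in zip(weights, biases):
        z = h @ w + b          # pre-activation matrix (samples x units)
        h = activation(z)      # post-activation matrix
        score += effective_rank(z) + effective_rank(h)
    return score


# Untrained toy MLP: 16 inputs, two hidden layers of 32 units (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))
sizes = [16, 32, 32]
weights = [
    rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
    for m, n in zip(sizes[:-1], sizes[1:])
]
biases = [np.zeros(n) for n in sizes[1:]]

score = near_like_score(x, weights, biases)
print(f"NEAR-like score of the untrained network: {score:.2f}")
```

Because the score needs only a forward pass on a batch of inputs, it can be computed for many candidate architectures, activation functions, or initialization schemes without any training, which is the use case the abstract describes.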
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference [2.0822643340897273]
We show that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model.
We achieve up to $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task.
arXiv Detail & Related papers (2023-11-13T08:18:44Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z)
- A Faster Approach to Spiking Deep Convolutional Neural Networks [0.0]
Spiking neural networks (SNNs) have closer dynamics to the brain than current deep neural networks.
We propose a network structure based on previous work to improve network runtime and accuracy.
arXiv Detail & Related papers (2022-10-31T16:13:15Z)
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs high computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
- Neural networks with linear threshold activations: structure and algorithms [1.795561427808824]
We show that 2 hidden layers are necessary and sufficient to represent any function representable in the class.
We also give precise bounds on the sizes of the neural networks required to represent any function in the class.
We propose a new class of neural networks that we call shortcut linear threshold networks.
arXiv Detail & Related papers (2021-11-15T22:33:52Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.