The Untapped Potential of Off-the-Shelf Convolutional Neural Networks
- URL: http://arxiv.org/abs/2103.09891v1
- Date: Wed, 17 Mar 2021 20:04:46 GMT
- Title: The Untapped Potential of Off-the-Shelf Convolutional Neural Networks
- Authors: Matthew Inkawhich, Nathan Inkawhich, Eric Davis, Hai Li and Yiran Chen
- Abstract summary: We show that existing off-the-shelf models like ResNet-50 are capable of over 95% accuracy on ImageNet.
This level of performance currently exceeds that of models with over 20x more parameters and significantly more complex training procedures.
- Score: 29.205446247063673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over recent years, a myriad of novel convolutional network architectures have
been developed to advance state-of-the-art performance on challenging
recognition tasks. As computational resources improve, a great deal of effort
has been placed in efficiently scaling up existing designs and generating new
architectures with Neural Architecture Search (NAS) algorithms. While network
topology has proven to be a critical factor for model performance, we show that
significant gains are being left on the table by keeping topology static at
inference-time. Due to challenges such as scale variation, we should not expect
static models configured to perform well across a training dataset to be
optimally configured to handle all test data. In this work, we seek to expose
the exciting potential of inference-time-dynamic models. By allowing just four
layers to dynamically change configuration at inference-time, we show that
existing off-the-shelf models like ResNet-50 are capable of over 95% accuracy
on ImageNet. This level of performance currently exceeds that of models with
over 20x more parameters and significantly more complex training procedures.
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Exploring the design space of deep-learning-based weather forecasting systems [56.129148006412855]
This paper systematically analyzes the impact of different design choices on deep-learning-based weather forecasting systems.
We study fixed-grid architectures such as UNet, fully convolutional architectures, and transformer-based models.
We propose a hybrid system that combines the strong performance of fixed-grid models with the flexibility of grid-invariant architectures.
arXiv Detail & Related papers (2024-10-09T22:25:50Z) - Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective: to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural Network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - Receptive Field Refinement for Convolutional Neural Networks Reliably
Improves Predictive Performance [1.52292571922932]
We present a new approach to receptive field analysis that can yield these types of theoretical and empirical performance gains.
Our approach is able to improve ImageNet1K performance across a wide range of well-known, state-of-the-art (SOTA) model classes.
arXiv Detail & Related papers (2022-11-26T05:27:44Z) - RLFlow: Optimising Neural Network Subgraph Transformation with World
Models [0.0]
We propose a model-based agent which learns to optimise the architecture of neural networks by performing a sequence of subgraph transformations to reduce model runtime.
We show our approach can match the performance of state of the art on common convolutional networks and outperform those by up to 5% on transformer-style architectures.
arXiv Detail & Related papers (2022-05-03T11:52:54Z) - Efficient Neural Architecture Search with Performance Prediction [0.0]
We use a neural architecture search to find the best network architecture for the task at hand.
Existing NAS algorithms generally evaluate the fitness of a new architecture by fully training from scratch.
An end-to-end offline performance predictor is proposed to accelerate the evaluation of sampled architectures.
arXiv Detail & Related papers (2021-08-04T05:44:16Z) - Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% less parameters compared to the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z) - Balancing Accuracy and Latency in Multipath Neural Networks [0.09668407688201358]
We use a one-shot neural architecture search model to implicitly evaluate the performance of an intractable number of neural networks.
We show that our method can accurately model the relative performance between models with different latencies and predict the performance of unseen models with good precision across different datasets.
arXiv Detail & Related papers (2021-04-25T00:05:48Z) - Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.