Can We Scale Transformers to Predict Parameters of Diverse ImageNet
Models?
- URL: http://arxiv.org/abs/2303.04143v2
- Date: Wed, 31 May 2023 15:08:46 GMT
- Title: Can We Scale Transformers to Predict Parameters of Diverse ImageNet
Models?
- Authors: Boris Knyazev, Doha Hwang, Simon Lacoste-Julien
- Abstract summary: We release a single neural network that can predict high quality parameters of other neural networks.
We are able to boost training of diverse ImageNet models available in PyTorch.
When transferred to other datasets, models with predicted parameters also converge faster and reach competitive final performance.
- Score: 23.668513148189344
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretraining a neural network on a large dataset is becoming a cornerstone in
machine learning that is within the reach of only a few communities with
large resources. We aim at an ambitious goal of democratizing pretraining.
Towards that goal, we train and release a single neural network that can
predict high quality ImageNet parameters of other neural networks. By using
predicted parameters for initialization we are able to boost training of
diverse ImageNet models available in PyTorch. When transferred to other
datasets, models initialized with predicted parameters also converge faster and
reach competitive final performance.
Related papers
- Efficient Training with Denoised Neural Weights [65.14892033932895]
This work takes a novel step towards building a weight generator to synthesize the neural weights for initialization.
We use the image-to-image translation task with generative adversarial networks (GANs) as an example due to the ease of collecting model weights.
By initializing the image translation model with the denoised weights predicted by our diffusion model, the training requires only 43.3 seconds.
arXiv Detail & Related papers (2024-07-16T17:59:42Z)
- Learning to Generate Parameters of ConvNets for Unseen Image Data [39.35619721100205]
ConvNets depend heavily on large amounts of image data and resort to an iterative optimization algorithm to learn network parameters.
We propose a new training paradigm and formulate parameter learning of ConvNets into a prediction task.
We show that our proposed method achieves good efficacy for unseen image datasets on two kinds of settings.
arXiv Detail & Related papers (2023-10-18T10:26:18Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting [94.11915008006483]
We propose a novel Point-to-Pixel prompting for point cloud analysis.
Our method attains 89.3% accuracy on the hardest setting of ScanObjectNN.
Our framework also exhibits very competitive performance on ModelNet classification and ShapeNet Part Segmentation.
arXiv Detail & Related papers (2022-08-04T17:59:03Z)
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs high computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
- Parameter Prediction for Unseen Deep Architectures [23.79630072083828]
We study if we can use deep learning to directly predict parameters by exploiting the past knowledge of training other networks.
We propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU.
The proposed model achieves surprisingly good performance on unseen and diverse networks.
arXiv Detail & Related papers (2021-10-25T16:52:33Z)
- Point-Cloud Deep Learning of Porous Media for Permeability Prediction [0.0]
We propose a novel deep learning framework for predicting permeability of porous media from their digital images.
We model the boundary between solid matrix and pore spaces as point clouds and feed them as inputs to a neural network based on the PointNet architecture.
arXiv Detail & Related papers (2021-07-18T22:59:21Z)
- Learning to Learn Parameterized Classification Networks for Scalable Input Images [76.44375136492827]
Convolutional Neural Networks (CNNs) do not have a predictable recognition behavior with respect to the input resolution change.
We employ meta learners to generate convolutional weights of main networks for various input scales.
We further utilize knowledge distillation on the fly over model predictions based on different input resolutions.
arXiv Detail & Related papers (2020-07-13T04:27:25Z)
- Multi-task pre-training of deep neural networks for digital pathology [8.74883469030132]
We first assemble and transform many digital pathology datasets into a pool of 22 classification tasks and almost 900k images.
We show that our models used as feature extractors either improve significantly over ImageNet pre-trained models or provide comparable performance.
arXiv Detail & Related papers (2020-05-05T08:50:17Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)