Related papers: Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

URL: http://arxiv.org/abs/2303.04143v2
Date: Wed, 31 May 2023 15:08:46 GMT
Title: Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
Authors: Boris Knyazev, Doha Hwang, Simon Lacoste-Julien
Abstract summary: We release a single neural network that can predict high-quality parameters of other neural networks. We are able to boost training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models with predicted parameters also converge faster and reach competitive final performance.
Score: 23.668513148189344
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large resources. We aim at an ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high-quality ImageNet parameters of other neural networks. By using predicted parameters for initialization, we are able to boost training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.

Related papers

Neural Metamorphosis [72.88137795439407]
This paper introduces a new learning paradigm termed Neural Metamorphosis (NeuMeta), which aims to build self-morphable neural networks. NeuMeta directly learns the continuous weight manifold of neural networks. It sustains full-size performance even at a 75% compression rate.
arXiv Detail & Related papers (2024-10-10T14:49:58Z)
SODAWideNet++: Combining Attention and Convolutions for Salient Object Detection [3.2586315449885106]
We propose a novel encoder-decoder-style neural network called SODAWideNet++ designed explicitly for Salient Object Detection. Inspired by the vision transformer's ability to attain a global receptive field from the initial stages, we introduce the Attention Guided Long Range Feature Extraction (AGLRFE) module. In contrast to the current paradigm of ImageNet pre-training, we modify 118K annotated images from the COCO semantic segmentation dataset by binarizing the annotations to pre-train the proposed model end-to-end.
arXiv Detail & Related papers (2024-08-29T15:51:06Z)
Efficient Training with Denoised Neural Weights [65.14892033932895]
This work takes a novel step towards building a weight generator to synthesize the neural weights for initialization. We use the image-to-image translation task with generative adversarial networks (GANs) as an example due to the ease of collecting model weights. By initializing the image translation model with the denoised weights predicted by our diffusion model, the training requires only 43.3 seconds.
arXiv Detail & Related papers (2024-07-16T17:59:42Z)
Learning to Generate Parameters of ConvNets for Unseen Image Data [36.68392191824203]
ConvNets depend heavily on large amounts of image data and resort to an iterative optimization algorithm to learn network parameters. We propose a new training paradigm and formulate the parameter learning of ConvNets as a prediction task. We show that our proposed method achieves good efficacy for unseen image datasets in two kinds of settings.
arXiv Detail & Related papers (2023-10-18T10:26:18Z)
Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters. We find that our approach successfully generates parameters for a wide range of loss prompts. We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting [94.11915008006483]
We propose a novel Point-to-Pixel prompting for point cloud analysis. Our method attains 89.3% accuracy on the hardest setting of ScanObjectNN. Our framework also exhibits very competitive performance on ModelNet classification and ShapeNet Part Segmentation.
arXiv Detail & Related papers (2022-08-04T17:59:03Z)
Parameter Prediction for Unseen Deep Architectures [23.79630072083828]
We study if we can use deep learning to directly predict parameters by exploiting the past knowledge of training other networks. We propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks.
arXiv Detail & Related papers (2021-10-25T16:52:33Z)
Point-Cloud Deep Learning of Porous Media for Permeability Prediction [0.0]
We propose a novel deep learning framework for predicting permeability of porous media from their digital images. We model the boundary between solid matrix and pore spaces as point clouds and feed them as inputs to a neural network based on the PointNet architecture.
arXiv Detail & Related papers (2021-07-18T22:59:21Z)
Learning to Learn Parameterized Classification Networks for Scalable Input Images [76.44375136492827]
Convolutional Neural Networks (CNNs) do not have a predictable recognition behavior with respect to the input resolution change. We employ meta learners to generate convolutional weights of main networks for various input scales. We further utilize knowledge distillation on the fly over model predictions based on different input resolutions.
arXiv Detail & Related papers (2020-07-13T04:27:25Z)
Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks. We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.