On Supernet Transfer Learning for Effective Task Adaptation
- URL: http://arxiv.org/abs/2407.20279v3
- Date: Sat, 12 Jul 2025 22:57:05 GMT
- Title: On Supernet Transfer Learning for Effective Task Adaptation
- Authors: Prabhant Singh, Joaquin Vanschoren,
- Abstract summary: Transfer learning is a practical alternative with the rise of ever-larger pretrained models.<n>We introduce supernet transfer learning, which parameterize both weight and architecture priors, and efficiently finetune both to new tasks.<n>We demonstrate that supernet transfer learning does not only drastically speed up the discovery of optimal models, but will also find better models than running NAS from scratch.
- Score: 4.19475889117731
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Architecture Search (NAS) methods have been shown to outperform hand-designed models and help to democratize AI. However, NAS methods often start from scratch with each new task, making them computationally expensive and limiting their applicability. Transfer learning is a practical alternative with the rise of ever-larger pretrained models. However, it is also bound to the architecture of the pretrained model, which inhibits proper adaptation of the architecture to different tasks, leading to suboptimal (and excessively large) models. We address both challenges at once by introducing a novel and practical method to \textit{transfer supernets}, which parameterize both weight and architecture priors, and efficiently finetune both to new tasks. This enables supernet transfer learning as a replacement for traditional transfer learning that also finetunes model architectures to new tasks. Through extensive experiments across multiple image classification tasks, we demonstrate that supernet transfer learning does not only drastically speed up the discovery of optimal models (3 to 5 times faster on average), but will also find better models than running NAS from scratch. The added model flexibility also increases the robustness of transfer learning, yielding positive transfer to even very different target datasets, especially with multi-dataset pretraining.
Related papers
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel feed-forward neural network constructing method based on pruning and transfer learning.
Our approach can compress the number of parameters by more than 70%.
We also evaluate the transfer learning level comparing the refined model and the original one training from scratch a neural network.
arXiv Detail & Related papers (2023-12-16T23:23:16Z) - SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss
Landscape [14.550053893504764]
We introduce a "sub-one-shot" paradigm that serves as a bridge between zero-shot and one-shot NAS.
In sub-one-shot NAS, the supernet is trained using only a small subset of the training data, a phase we refer to as "warm-up"
We present SiGeo, a proxy founded on a novel theoretical framework that connects the supernet warm-up with the efficacy of the proxy.
arXiv Detail & Related papers (2023-11-22T05:25:24Z) - Fantastic Gains and Where to Find Them: On the Existence and Prospect of
General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets [55.2118691522524]
Distillation-aware Neural Architecture Search (DaNAS) aims to search for an optimal student architecture.
We propose a distillation-aware meta accuracy prediction model, DaSS (Distillation-aware Student Search), which can predict a given architecture's final performances on a dataset.
arXiv Detail & Related papers (2023-05-26T14:00:35Z) - $\Delta$-Patching: A Framework for Rapid Adaptation of Pre-trained
Convolutional Networks without Base Performance Loss [71.46601663956521]
Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time.
We propose $Delta$-Patching for fine-tuning neural network models in an efficient manner, without the need to store model copies.
Our experiments show that $Delta$-Networks outperform earlier model patching work while only requiring a fraction of parameters to be trained.
arXiv Detail & Related papers (2023-03-26T16:39:44Z) - NASiam: Efficient Representation Learning using Neural Architecture
Search for Siamese Networks [76.8112416450677]
Siamese networks are one of the most trending methods to achieve self-supervised visual representation learning (SSL)
NASiam is a novel approach that uses for the first time differentiable NAS to improve the multilayer perceptron projector and predictor (encoder/predictor pair)
NASiam reaches competitive performance in both small-scale (i.e., CIFAR-10/CIFAR-100) and large-scale (i.e., ImageNet) image classification datasets while costing only a few GPU hours.
arXiv Detail & Related papers (2023-01-31T19:48:37Z) - Improving the Generalization of Supervised Models [30.264601433216246]
In this paper, we propose a supervised learning setup that leverages the best of both worlds.
We show that these three improvements lead to a more favorable trade-off between the IN1K training task and 13 transfer tasks.
arXiv Detail & Related papers (2022-06-30T15:43:51Z) - Warm-starting DARTS using meta-learning [4.035753155957698]
Neural architecture search (NAS) has shown great promise in the field of automated machine learning (AutoML)
We present a meta-learning framework to warm-start Differentiable architecture search (DARTS)
arXiv Detail & Related papers (2022-05-12T20:40:26Z) - How Well Do Sparse Imagenet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets.
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset.
We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z) - Conceptual Expansion Neural Architecture Search (CENAS) [1.3464152928754485]
We present an approach called Conceptual Expansion Neural Architecture Search (CENAS)
It combines a sample-efficient, computational creativity-inspired transfer learning approach with neural architecture search.
It finds models faster than naive architecture search via transferring existing weights to approximate the parameters of the new model.
arXiv Detail & Related papers (2021-10-07T02:29:26Z) - Pi-NAS: Improving Neural Architecture Search by Reducing Supernet
Training Consistency Shift [128.32670289503025]
Recently proposed neural architecture search (NAS) methods co-train billions of architectures in a supernet and estimate their potential accuracy.
The ranking correlation between the architectures' predicted accuracy and their actual capability is incorrect, which causes the existing NAS methods' dilemma.
We attribute this ranking correlation problem to the supernet training consistency shift, including feature shift and parameter shift.
We address these two shifts simultaneously using a nontrivial supernet-Pi model, called Pi-NAS.
arXiv Detail & Related papers (2021-08-22T09:08:48Z) - Neural Architecture Search on ImageNet in Four GPU Hours: A
Theoretically Inspired Perspective [88.39981851247727]
We propose a novel framework called training-free neural architecture search (TE-NAS)
TE-NAS ranks architectures by analyzing the spectrum of the neural tangent kernel (NTK) and the number of linear regions in the input space.
We show that: (1) these two measurements imply the trainability and expressivity of a neural network; (2) they strongly correlate with the network's test accuracy.
arXiv Detail & Related papers (2021-02-23T07:50:44Z) - Direct Federated Neural Architecture Search [0.0]
We present an effective approach for direct federated NAS which is hardware agnostic, computationally lightweight, and a one-stage method to search for ready-to-deploy neural network models.
Our results show an order of magnitude reduction in resource consumption while edging out prior art in accuracy.
arXiv Detail & Related papers (2020-10-13T08:11:35Z) - Revisiting Neural Architecture Search [0.0]
We propose a novel approach to search for the complete neural network without much human effort and is a step closer towards AutoML-nirvana.
Our method starts from a complete graph mapped to a neural network and searches for the connections and operations by balancing the exploration and exploitation of the search space.
arXiv Detail & Related papers (2020-10-12T13:57:30Z) - Binarized Neural Architecture Search for Efficient Object Recognition [120.23378346337311]
Binarized neural architecture search (BNAS) produces extremely compressed models to reduce huge computational cost on embedded devices for edge computing.
An accuracy of $96.53%$ vs. $97.22%$ is achieved on the CIFAR-10 dataset, but with a significantly compressed model, and a $40%$ faster search than the state-of-the-art PC-DARTS.
arXiv Detail & Related papers (2020-09-08T15:51:23Z) - Do Adversarially Robust ImageNet Models Transfer Better? [102.09335596483695]
adversarially robust models often perform better than their standard-trained counterparts when used for transfer learning.
Our results are consistent with (and in fact, add to) recent hypotheses stating that robustness leads to improved feature representations.
arXiv Detail & Related papers (2020-07-16T17:42:40Z) - Neural Architecture Transfer [20.86857986471351]
Existing approaches require one complete search for each deployment specification of hardware or objective.
We propose Neural Architecture Transfer (NAT) to overcome this limitation.
NAT is designed to efficiently generate task-specific custom models that are competitive under multiple conflicting objectives.
arXiv Detail & Related papers (2020-05-12T15:30:36Z) - GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet [63.96959854429752]
GreedyNAS is easy-to-follow, and experimental results on ImageNet dataset indicate that it can achieve better Top-1 accuracy under same search space and FLOPs or latency level.
By searching on a larger space, our GreedyNAS can also obtain new state-of-the-art architectures.
arXiv Detail & Related papers (2020-03-25T06:54:10Z) - Parameter-Efficient Transfer from Sequential Behaviors for User Modeling
and Recommendation [111.44445634272235]
In this paper, we develop a parameter efficient transfer learning architecture, termed as PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.