ScaleNet: Searching for the Model to Scale
- URL: http://arxiv.org/abs/2207.07267v1
- Date: Fri, 15 Jul 2022 03:16:43 GMT
- Title: ScaleNet: Searching for the Model to Scale
- Authors: Jiyang Xie and Xiu Su and Shan You and Zhanyu Ma and Fei Wang and Chen Qian
- Abstract summary: We propose ScaleNet to jointly search the base model and the scaling strategy.
We show that our scaled networks achieve significant performance superiority at various FLOPs budgets.
- Score: 44.05380012545087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, the community has paid increasing attention to model scaling and
contributed to developing model families with a wide spectrum of scales.
Current methods either simply resort to a one-shot NAS manner to construct a
non-structural and non-scalable model family, or rely on a manual yet fixed
scaling strategy to scale a base model that is not necessarily optimal. In this
paper, we bridge these two components and propose ScaleNet to jointly search the
base model and the scaling strategy, so that the scaled large model achieves more
promising performance. Concretely, we design a super-supernet to embody models
across a wide spectrum of sizes (e.g., FLOPs). The scaling strategy is then
learned interactively with the base model via a Markov chain-based evolution
algorithm and generalized to develop even larger models. To obtain a decent
super-supernet, we design a hierarchical sampling strategy to enhance its
training sufficiency and alleviate disturbance. Experimental results show that
our scaled networks enjoy significant performance superiority at various FLOPs
budgets while reducing search cost by at least 2.53x. Code is available at
https://github.com/luminolx/ScaleNet.
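To make the joint search more concrete, below is a minimal sketch of what a Markov chain-based evolution over scaling strategies could look like: candidate (depth, width, resolution) multipliers are sampled from per-dimension transition weights, filtered by a FLOPs budget, scored by a user-supplied evaluator (e.g., sub-network accuracy estimated with the super-supernet), and the transition weights are re-estimated from the elite candidates. The search space, the three-multiplier parameterization, and the update rule are illustrative assumptions, not the released implementation.

```python
import random
from typing import Callable, Dict, List, Tuple

# A scaling strategy is a (depth, width, resolution) multiplier triple (assumed parameterization).
Strategy = Tuple[float, float, float]

# Illustrative discrete search space for each multiplier (assumption, not the paper's space).
CHOICES: Dict[str, List[float]] = {
    "depth": [1.0, 1.2, 1.4, 1.6, 1.8],
    "width": [1.0, 1.1, 1.2, 1.3, 1.4],
    "resolution": [1.0, 1.15, 1.3, 1.45],
}

def markov_evolution_search(
    evaluate: Callable[[Strategy], float],   # e.g., sub-network accuracy from the super-supernet
    flops_of: Callable[[Strategy], float],   # FLOPs estimator for the scaled model
    flops_budget: float,
    generations: int = 20,
    population: int = 16,
) -> Strategy:
    """Evolve scaling strategies; per-dimension sampling weights act like Markov-chain
    transition weights that are re-estimated from the surviving elite each generation."""
    dims = list(CHOICES)
    weights: Dict[str, List[float]] = {d: [1.0] * len(CHOICES[d]) for d in dims}

    def sample() -> Strategy:
        return tuple(random.choices(CHOICES[d], weights=weights[d])[0] for d in dims)

    best, best_score = None, float("-inf")
    for _ in range(generations):
        # Sample candidates and keep only those that respect the FLOPs budget.
        cands = [s for s in (sample() for _ in range(4 * population))
                 if flops_of(s) <= flops_budget][:population]
        scored = sorted(((evaluate(s), s) for s in cands), reverse=True)
        if scored and scored[0][0] > best_score:
            best_score, best = scored[0]
        # Re-estimate sampling weights from the elite (top quarter): choices that appear
        # in good strategies become more likely in the next generation.
        elite = [s for _, s in scored[: max(1, len(scored) // 4)]]
        for i, d in enumerate(dims):
            counts = [1.0] * len(CHOICES[d])      # add-one smoothing
            for s in elite:
                counts[CHOICES[d].index(s[i])] += 1.0
            weights[d] = counts
    return best
```

In practice, `evaluate` would query sub-networks inherited from the trained super-supernet, and the best strategy found at one budget could then be extrapolated toward larger budgets, in the spirit of "generalized to develop even larger models" above.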
Related papers
- Exploring Model Kinship for Merging Large Language Models [52.01652098827454]
We introduce model kinship, the degree of similarity or relatedness between Large Language Models.
We find that there is a relationship between model kinship and the performance gains obtained after model merging.
We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can yield better performance on benchmark datasets.
arXiv Detail & Related papers (2024-10-16T14:29:29Z)
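As a rough illustration of the model-kinship entry above (not the paper's exact metric), kinship can be read as a similarity score between two fine-tuned models' parameter deltas relative to a shared base model; the sketch below uses cosine similarity and a simple greedy, kinship-guided merge. The metric choice, the averaging-based merge, and all names are assumptions for illustration.

```python
import numpy as np
from typing import Dict, List

Params = Dict[str, np.ndarray]  # parameter name -> weight tensor

def delta(model: Params, base: Params) -> np.ndarray:
    """Flattened parameter difference ("task vector") w.r.t. a shared base model."""
    return np.concatenate([(model[k] - base[k]).ravel() for k in sorted(base)])

def kinship(m1: Params, m2: Params, base: Params) -> float:
    """Cosine similarity between the two models' deltas (one possible kinship metric)."""
    d1, d2 = delta(m1, base), delta(m2, base)
    return float(d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2) + 1e-12))

def average_merge(models: List[Params]) -> Params:
    """Simple parameter averaging; stands in for any merging operator."""
    return {k: np.mean([m[k] for m in models], axis=0) for k in models[0]}

def greedy_merge_with_kinship(candidates: List[Params], base: Params, k: int = 3) -> Params:
    """Greedy merge: start from the first candidate, then repeatedly add the remaining
    candidate with the highest kinship to the current merged model, up to k members."""
    members = [candidates[0]]
    remaining = list(range(1, len(candidates)))
    current = candidates[0]
    while remaining and len(members) < k:
        i = max(remaining, key=lambda j: kinship(current, candidates[j], base))
        members.append(candidates[i])
        remaining.remove(i)
        current = average_merge(members)
    return current
```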
- What Matters for Model Merging at Scale? [94.26607564817786]
Model merging aims to combine multiple expert models into a more capable single model.
Previous studies have primarily focused on merging a few small models.
This study systematically evaluates the utility of model merging at scale.
arXiv Detail & Related papers (2024-10-04T17:17:19Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free: it requires no data or additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
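Reading only the name and summary of the EMR-Merging entry above, the elect-mask-rescale idea can be sketched as: elect a unified task vector per parameter, mask out each task's entries that disagree in sign with the elected direction, and rescale per task to preserve magnitude. The concrete choices below (sign election from summed task vectors, magnitude-ratio rescaling) are our assumptions, not the paper's exact procedure.

```python
import numpy as np
from typing import List, Tuple

def emr_style_merge(task_vectors: List[np.ndarray]) -> Tuple[np.ndarray, List[np.ndarray], List[float]]:
    """Rough elect/mask/rescale sketch over flattened task vectors (deltas from a base model).
    Returns the elected unified vector plus per-task masks and rescalers."""
    stacked = np.stack(task_vectors)                 # (num_tasks, num_params)
    # Elect: unified direction = sign of the summed task vectors; magnitude = max over agreeing tasks.
    elected_sign = np.sign(stacked.sum(axis=0))
    agree = (np.sign(stacked) == elected_sign)       # entries agreeing with the elected sign
    unified = elected_sign * np.where(agree, np.abs(stacked), 0.0).max(axis=0)
    masks, scales = [], []
    for tv, m in zip(task_vectors, agree):
        # Mask: keep only entries whose sign matches the elected direction.
        masks.append(m.astype(np.float32))
        # Rescale: roughly match the average magnitude of the original task vector.
        denom = float(np.abs(unified * m).mean()) + 1e-12
        scales.append(float(np.abs(tv * m).mean()) / denom)
    return unified, masks, scales

# At inference time for task t, one would apply: base + scales[t] * masks[t] * unified.
```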
- When Do We Not Need Larger Vision Models? [55.957626371697785]
Scaling up the size of vision models has been the de facto standard for obtaining more powerful visual representations.
We demonstrate the power of Scaling on Scales (S$^2$), whereby a pre-trained and frozen smaller vision model can outperform larger models.
We release a Python package that can apply S$^2$ on any vision model with one line of code.
arXiv Detail & Related papers (2024-03-19T17:58:39Z)
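To illustrate the underlying idea of the S$^2$ entry above (without reproducing the released package's API), the sketch below runs a frozen feature extractor on the original image and on a 2x-upsampled copy split into four crops, stitches the crop features back together, and concatenates the two scales channel-wise. The crop/merge scheme and function names are assumptions.

```python
import torch
import torch.nn.functional as F

def multiscale_features(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """x: (B, C, H, W) images; model maps images to (B, D, h, w) feature maps (assumed).
    Returns channel-concatenated features from scale 1x and 2x, in the S^2 spirit."""
    with torch.no_grad():
        f1 = model(x)                                            # base-scale features
        b, c, h, w = x.shape
        x2 = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        # Split the 2x image into four crops of the original size and encode each.
        crops = [x2[..., i * h:(i + 1) * h, j * w:(j + 1) * w]
                 for i in range(2) for j in range(2)]
        feats = [model(crop) for crop in crops]
        # Stitch the four crop features back into one large map, then pool to f1's size.
        top = torch.cat(feats[:2], dim=-1)
        bottom = torch.cat(feats[2:], dim=-1)
        f2 = F.interpolate(torch.cat([top, bottom], dim=-2), size=f1.shape[-2:], mode="area")
    return torch.cat([f1, f2], dim=1)                            # (B, 2D, h, w)
```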
- Model Compression and Efficient Inference for Large Language Models: A Survey [20.199282252344396]
Large language models have two prominent characteristics compared to smaller models.
The most notable aspect of large models is the very high cost associated with model finetuning or training.
Large models emphasize versatility and generalization rather than performance on a single task.
arXiv Detail & Related papers (2024-02-15T06:58:30Z)
- A Lightweight Feature Fusion Architecture For Resource-Constrained Crowd Counting [3.5066463427087777]
We introduce two lightweight models to enhance the versatility of crowd-counting models.
These models maintain the same downstream architecture while incorporating two distinct backbones: MobileNet and MobileViT.
We leverage Adjacent Feature Fusion to extract diverse scale features from a Pre-Trained Model (PTM) and subsequently combine these features seamlessly.
arXiv Detail & Related papers (2024-01-11T15:13:31Z)
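To show what the adjacent feature fusion from a frozen pre-trained backbone in the crowd-counting entry above might look like, the sketch below takes feature maps from two consecutive stages, upsamples the deeper map to the shallower one's resolution, and fuses them with a 1x1 convolution; the stage choice, channel sizes, and fusion operator are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentFeatureFusion(nn.Module):
    """Fuse feature maps from adjacent backbone stages (shallow, deep)."""
    def __init__(self, shallow_ch: int, deep_ch: int, out_ch: int):
        super().__init__()
        self.proj = nn.Conv2d(shallow_ch + deep_ch, out_ch, kernel_size=1)

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # Upsample the deeper (coarser) map to the shallow map's spatial size, then project.
        deep_up = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear",
                                align_corners=False)
        return self.proj(torch.cat([shallow, deep_up], dim=1))

# Example: fuse a 256-channel stage-3 map and a 512-channel stage-4 map from a frozen
# MobileNet-style backbone into a single 128-channel map for the counting head.
fuse = AdjacentFeatureFusion(256, 512, 128)
density_features = fuse(torch.randn(1, 256, 64, 64), torch.randn(1, 512, 32, 32))
```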
- Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and classify them.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z)
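A hedged sketch of the part-based architecture described above: a segmentation network predicts K part masks, and a tiny classifier maps pooled part evidence to class logits, with both trained end-to-end. The per-part global pooling and the two-layer classifier are our assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class PartBasedClassifier(nn.Module):
    """Part segmenter + tiny classifier, trained end-to-end (illustrative sketch)."""
    def __init__(self, segmenter: nn.Module, num_parts: int, num_classes: int):
        super().__init__()
        self.segmenter = segmenter          # images -> (B, num_parts, H, W) part logits
        self.classifier = nn.Sequential(    # "tiny" classifier over pooled part scores
            nn.Linear(num_parts, 64), nn.ReLU(), nn.Linear(64, num_classes))

    def forward(self, images: torch.Tensor):
        part_logits = self.segmenter(images)                    # (B, K, H, W)
        part_masks = part_logits.softmax(dim=1)
        pooled = part_masks.mean(dim=(2, 3))                    # per-part evidence, (B, K)
        return self.classifier(pooled), part_logits             # class logits + part masks

# Both outputs receive supervision: a classification loss on the logits and a segmentation
# loss on the part masks, so the two objectives are optimized jointly end-to-end.
```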
- Efficient Scale-Permuted Backbone with Learned Resource Distribution [41.45085444609275]
SpineNet has demonstrated promising results on object detection and image classification over the ResNet model.
We propose a technique to combine efficient operations and compound scaling with a previously learned scale-permuted architecture.
The resulting efficient scale-permuted models outperform state-of-the-art EfficientNet-based models on object detection and achieve competitive performance on image classification and semantic segmentation.
arXiv Detail & Related papers (2020-10-22T03:59:51Z)
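For reference, the compound scaling mentioned in the entry above (as popularized by EfficientNet, not the paper's learned resource distribution) grows depth, width, and input resolution together from a single coefficient; a minimal sketch with the commonly used coefficients as defaults:

```python
def compound_scaling(phi: float, alpha: float = 1.2, beta: float = 1.1, gamma: float = 1.15):
    """Return (depth, width, resolution) multipliers for compound coefficient phi.
    alpha * beta**2 * gamma**2 is roughly 2, so each unit of phi about doubles FLOPs."""
    return alpha ** phi, beta ** phi, gamma ** phi

# e.g. phi = 3 scales depth ~1.73x, width ~1.33x, and resolution ~1.52x
print(compound_scaling(3))
```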
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.