ScaleNet: Searching for the Model to Scale
- URL: http://arxiv.org/abs/2207.07267v1
- Date: Fri, 15 Jul 2022 03:16:43 GMT
- Title: ScaleNet: Searching for the Model to Scale
- Authors: Jiyang Xie and Xiu Su and Shan You and Zhanyu Ma and Fei Wang and Chen Qian
- Abstract summary: We propose ScaleNet to jointly search the base model and the scaling strategy.
We show that our scaled networks achieve significant performance gains across various FLOPs budgets.
- Score: 44.05380012545087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, the community has paid increasing attention to model scaling and
contributed to developing model families spanning a wide spectrum of scales.
Current methods either simply resort to one-shot NAS to construct a
non-structural and non-scalable model family, or rely on a manual yet fixed
scaling strategy to scale up a base model that is not necessarily the best
choice. In this paper, we bridge these two components and propose ScaleNet to
jointly search the base model and the scaling strategy so that the scaled large
model has more promising performance. Concretely, we design a super-supernet to
embody models across a wide spectrum of sizes (e.g., FLOPs). The scaling
strategy is then learned interactively with the base model via a Markov
chain-based evolutionary algorithm and generalized to develop even larger
models. To obtain a decent super-supernet, we design a hierarchical sampling
strategy that enhances its training sufficiency and alleviates disturbance.
Experimental results show that our scaled networks achieve significant
performance gains across various FLOPs budgets, with at least a 2.53x reduction
in search cost. Code is available at
https://github.com/luminolx/ScaleNet.
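The abstract describes searching scaling strategies with an evolutionary algorithm. The following is a minimal, hypothetical sketch of a generic evolutionary search over (depth, width, resolution) multipliers; it is NOT the paper's Markov chain-based algorithm, and the fitness function is a toy proxy invented here for illustration.

```python
import random

def toy_fitness(strategy):
    """Toy proxy: reward strategies whose rough FLOPs land near a 2x budget.

    FLOPs grow roughly linearly in depth and quadratically in width and
    input resolution, hence d * w^2 * r^2.
    """
    d, w, r = strategy
    flops = d * (w ** 2) * (r ** 2)
    return -abs(flops - 2.0)  # closer to the 2x-FLOPs target is better

def mutate(strategy, step=0.1):
    """Perturb one multiplier; never shrink below the base model (1.0)."""
    s = list(strategy)
    i = random.randrange(3)
    s[i] = max(1.0, s[i] + random.uniform(-step, step))
    return tuple(s)

def evolve(pop_size=16, generations=30, seed=0):
    """Plain (mu + lambda)-style loop: mutate, rank, keep the top half."""
    random.seed(seed)
    pop = [(1.0, 1.0, 1.0)] * pop_size
    for _ in range(generations):
        pop = [mutate(s) for s in pop]
        pop.sort(key=toy_fitness, reverse=True)
        pop = pop[: pop_size // 2] * 2
    return pop[0]

best = evolve()
print(best)  # a (depth, width, resolution) multiplier triple
```

In ScaleNet the candidates would be evaluated through the trained super-supernet rather than a closed-form proxy, and the search is coupled with the base-model search.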
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free: it requires no data availability or additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- When Do We Not Need Larger Vision Models? [55.957626371697785]
Scaling up the size of vision models has been the de facto standard to obtain more powerful visual representations.
We demonstrate the power of Scaling on Scales (S$^2$), whereby a pre-trained and frozen smaller vision model can outperform larger models.
We release a Python package that can apply S$^2$ to any vision model with one line of code.
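The core idea of Scaling on Scales is to run one frozen model on several rescaled copies of an image and concatenate the resulting features. Below is a self-contained, hypothetical sketch of that idea; `extract_features` is a stand-in for a frozen backbone and is not the released package's API.

```python
def resize(image, size):
    """Nearest-neighbor resize of a 2D list-of-lists image (illustrative)."""
    h, w = len(image), len(image[0])
    return [[image[int(y * h / size)][int(x * w / size)]
             for x in range(size)] for y in range(size)]

def extract_features(image):
    """Stand-in 'frozen model': global mean and max as a 2-dim feature."""
    flat = [v for row in image for v in row]
    return [sum(flat) / len(flat), max(flat)]

def s2_features(image, scales=(8, 16, 32)):
    """Run the same frozen extractor at each scale and concatenate."""
    feats = []
    for s in scales:
        feats.extend(extract_features(resize(image, s)))
    return feats

img = [[float(x + y) for x in range(16)] for y in range(16)]
vec = s2_features(img)
print(len(vec))  # 2 features per scale over 3 scales -> 6
```

A real application would replace `extract_features` with a frozen vision backbone and feed the concatenated multi-scale vector to a downstream head.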
arXiv Detail & Related papers (2024-03-19T17:58:39Z)
- Model Compression and Efficient Inference for Large Language Models: A Survey [20.199282252344396]
Large language models have two prominent characteristics compared to smaller models.
The most notable aspect of large models is the very high cost associated with model finetuning or training.
Large models emphasize versatility and generalization rather than performance on a single task.
arXiv Detail & Related papers (2024-02-15T06:58:30Z)
- A Lightweight Feature Fusion Architecture For Resource-Constrained Crowd Counting [3.5066463427087777]
We introduce two lightweight models to enhance the versatility of crowd-counting models.
These models maintain the same downstream architecture while incorporating two distinct backbones: MobileNet and MobileViT.
We leverage Adjacent Feature Fusion to extract diverse scale features from a Pre-Trained Model (PTM) and subsequently combine these features seamlessly.
arXiv Detail & Related papers (2024-01-11T15:13:31Z)
- STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training [43.04882328763337]
We design a series of scalable U-Net (STU-Net) models, with parameter sizes ranging from 14 million to 1.4 billion.
We train our scalable STU-Net models on a large-scale TotalSegmentator dataset and find that increasing model size brings a stronger performance gain.
We observe good performance of our pre-trained model in both direct inference and fine-tuning.
arXiv Detail & Related papers (2023-04-13T17:59:13Z)
- Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to segment objects into parts and classify them simultaneously.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z)
- Efficient Scale-Permuted Backbone with Learned Resource Distribution [41.45085444609275]
SpineNet has demonstrated promising results on object detection and image classification over the ResNet model.
We propose a technique to combine efficient operations and compound scaling with a previously learned scale-permuted architecture.
The resulting efficient scale-permuted models outperform state-of-the-art EfficientNet-based models on object detection and achieve competitive performance on image classification and semantic segmentation.
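Compound scaling, referenced above, grows depth, width, and resolution together rather than scaling any single axis. A short sketch of EfficientNet-style compound scaling follows, using the coefficients reported by Tan & Le (2019); the base dimensions are hypothetical placeholders.

```python
# EfficientNet-reported coefficients: alpha * beta^2 * gamma^2 ~= 2,
# so total FLOPs grow roughly by 2^phi at compound coefficient phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi, base_depth=18, base_width=64, base_res=224):
    """Scale depth, width, and resolution jointly by the compound rule."""
    depth = round(base_depth * ALPHA ** phi)
    width = round(base_width * BETA ** phi)
    res = round(base_res * GAMMA ** phi)
    return depth, width, res

for phi in range(3):
    print(phi, compound_scale(phi))  # phi=0 returns the base dimensions
```

The constraint alpha * beta^2 * gamma^2 ≈ 2 makes each increment of phi roughly double the FLOPs, which is what makes a model family with a predictable cost spectrum possible.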
arXiv Detail & Related papers (2020-10-22T03:59:51Z)
- BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models [59.95091850331499]
We propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies.
Our discovered model family, BigNASModels, achieves top-1 accuracies ranging from 76.5% to 80.9%.
arXiv Detail & Related papers (2020-03-24T23:00:49Z)
- Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multi-scale information.
Our approach can ignore various noises selectively and focus on appropriate crowd scales automatically.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it hosts and is not responsible for any consequences.