A Model Zoo of Vision Transformers
- URL: http://arxiv.org/abs/2504.10231v1
- Date: Mon, 14 Apr 2025 13:52:26 GMT
- Title: A Model Zoo of Vision Transformers
- Authors: Damian Falk, Léo Meynent, Florence Pfammatter, Konstantin Schürholt, Damian Borth
- Abstract summary: We introduce the first model zoo of vision transformers (ViT). To better represent recent training approaches, we develop a new blueprint for model zoo generation that encompasses both pre-training and fine-tuning steps. The models are carefully generated with a large span of generating factors, and their diversity is validated using a thorough choice of weight-space and behavioral metrics.
- Score: 6.926413609535758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The availability of large, structured populations of neural networks - called 'model zoos' - has led to the development of a multitude of downstream tasks ranging from model analysis, to representation learning on model weights or generative modeling of neural network parameters. However, existing model zoos are limited in size and architecture and neglect the transformer, which is among the currently most successful neural network architectures. We address this gap by introducing the first model zoo of vision transformers (ViT). To better represent recent training approaches, we develop a new blueprint for model zoo generation that encompasses both pre-training and fine-tuning steps, and publish 250 unique models. They are carefully generated with a large span of generating factors, and their diversity is validated using a thorough choice of weight-space and behavioral metrics. To further motivate the utility of our proposed dataset, we suggest multiple possible applications grounded in both extensive exploratory experiments and a number of examples from the existing literature. By extending previous lines of similar work, our model zoo allows researchers to push their model population-based methods from the small model regime to state-of-the-art architectures. We make our model zoo available at github.com/ModelZoos/ViTModelZoo.
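The abstract validates the zoo's diversity with weight-space metrics. As a purely illustrative sketch (the repository's actual file layout and API are not described here; the model names and flattened-vector representation below are hypothetical), simple per-model weight-space statistics over a population might look like this:

```python
import math

def weight_stats(weights):
    """Compute simple weight-space statistics (mean, std, L2 norm)
    for one model's flattened parameter vector."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / n
    l2 = math.sqrt(sum(w * w for w in weights))
    return {"mean": mean, "std": math.sqrt(var), "l2_norm": l2}

# Toy "zoo" of three models, each represented by a flattened weight vector.
zoo = {
    "vit_a": [0.1, -0.2, 0.05, 0.3],
    "vit_b": [0.0, 0.1, -0.1, 0.2],
    "vit_c": [0.4, -0.3, 0.2, -0.1],
}

# Per-model statistics; spread across models is one crude diversity signal.
stats = {name: weight_stats(w) for name, w in zoo.items()}
```

In practice such metrics would be computed over real checkpoints (e.g. flattened PyTorch state dicts) rather than toy lists, and combined with the behavioral metrics the paper also mentions.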
Related papers
- A Model Zoo on Phase Transitions in Neural Networks [12.510848963038992]
Weight Space Learning methods require populations of trained models as datasets for development and evaluation.
Existing collections of models - called 'model zoos' - are unstructured or follow a rudimentary definition of diversity.
We combine the idea of 'model zoos' with phase information to create a controlled notion of diversity in populations.
arXiv Detail & Related papers (2025-04-25T05:01:52Z) - The Impact of Model Zoo Size and Composition on Weight Space Learning [8.11780615053558]
Re-using trained neural network models is a common strategy to reduce training cost and transfer knowledge.
Weight space learning is a promising new field to re-use populations of pre-trained models for future tasks.
We propose a modification to a common weight space learning method to accommodate training on heterogeneous populations of models.
arXiv Detail & Related papers (2025-04-14T11:54:06Z) - Knowledge Fusion By Evolving Weights of Language Models [5.354527640064584]
This paper examines the approach of integrating multiple models into a unified model.
We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms.
arXiv Detail & Related papers (2024-06-18T02:12:34Z) - Learning the 3D Fauna of the Web [70.01196719128912]
We develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly.
One crucial bottleneck of modeling animals is the limited availability of training data.
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
arXiv Detail & Related papers (2024-01-04T18:32:48Z) - Model Zoos: A Dataset of Diverse Populations of Neural Network Models [2.7167743929103363]
We publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models.
The dataset can be found at www.modelzoos.cc.
arXiv Detail & Related papers (2022-09-29T13:20:42Z) - Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights [2.9678808525128813]
We extend hyper-representations for generative use to sample new model weights.
Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations.
arXiv Detail & Related papers (2022-09-29T12:53:58Z) - Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [97.9548609175831]
We resort to plain vision transformers with about 100 million parameters and make the first attempt to propose large vision models customized for remote sensing tasks.
Specifically, to handle the large image size and objects of various orientations in RS images, we propose a new rotated varied-size window attention.
Experiments on detection tasks demonstrate the superiority of our model over all state-of-the-art models, achieving 81.16% mAP on the DOTA-V1.0 dataset.
arXiv Detail & Related papers (2022-08-08T09:08:40Z) - Hyper-Representations for Pre-Training and Transfer Learning [2.9678808525128813]
We extend hyper-representations for generative use to sample new model weights as pre-training.
Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations.
arXiv Detail & Related papers (2022-07-22T09:01:21Z) - STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance while using far fewer trainable parameters, with high speed in both training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z) - Zoo-Tuning: Adaptive Transfer from a Zoo of Models [82.9120546160422]
Zoo-Tuning learns to adaptively transfer the parameters of pretrained models to the target task.
We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection.
arXiv Detail & Related papers (2021-06-29T14:09:45Z) - ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z) - A linearized framework and a new benchmark for model selection for fine-tuning [112.20527122513668]
Fine-tuning from a collection of models pre-trained on different domains is emerging as a technique to improve test accuracy in the low-data regime.
We introduce two new baselines for model selection -- Label-Gradient and Label-Feature Correlation.
Our benchmark highlights the accuracy gain from using a model zoo compared to fine-tuning ImageNet models.
arXiv Detail & Related papers (2021-01-29T21:57:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.