Hyper-Representations for Pre-Training and Transfer Learning
- URL: http://arxiv.org/abs/2207.10951v1
- Date: Fri, 22 Jul 2022 09:01:21 GMT
- Title: Hyper-Representations for Pre-Training and Transfer Learning
- Authors: Konstantin Schürholt, Boris Knyazev, Xavier Giró-i-Nieto, Damian Borth
- Abstract summary: We extend hyper-representations for generative use to sample new model weights as pre-training.
Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations.
- Score: 2.9678808525128813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning representations of neural network weights given a model zoo is an
emerging and challenging area with many potential applications, from model
inspection to neural architecture search or knowledge distillation. Recently,
an autoencoder trained on a model zoo was able to learn a hyper-representation,
which captures intrinsic and extrinsic properties of the models in the zoo. In
this work, we extend hyper-representations for generative use to sample new
model weights as pre-training. We propose layer-wise loss normalization which
we demonstrate is key to generate high-performing models and a sampling method
based on the empirical density of hyper-representations. The models generated
using our methods are diverse, performant, and capable of outperforming
conventional baselines for transfer learning. Our results indicate the
potential of knowledge aggregation from model zoos to new models via
hyper-representations, thereby paving the way for novel research directions.
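To make the two proposed ingredients concrete, here is a minimal, hypothetical PyTorch sketch. The names (`encoder`, `decoder`, `zoo_weights`, `layer_slices`) are assumptions, and the kernel-density-style jittering is one plausible way to sample from the empirical density of latent codes, not the authors' exact implementation.

```python
import torch

def layerwise_normalized_loss(pred, target, layer_slices):
    """Reconstruction loss normalized per layer, so layers with very
    different weight magnitudes contribute comparably."""
    total = 0.0
    for sl in layer_slices:                      # e.g. slice(0, 300), ...
        p, t = pred[:, sl], target[:, sl]
        total = total + ((p - t) ** 2).mean() / (t.std() ** 2 + 1e-8)
    return total / len(layer_slices)

@torch.no_grad()
def sample_weights(encoder, decoder, zoo_weights, n, bandwidth=0.1):
    """Jitter encoded zoo models with Gaussian noise and decode back."""
    z = encoder(zoo_weights)                     # (N, d) latent codes
    anchors = z[torch.randint(len(z), (n,))]     # pick random anchor codes
    z_new = anchors + bandwidth * torch.randn_like(anchors)
    return decoder(z_new)                        # (n, num_params) weights
```

Decoded samples would then serve as initializations (pre-training) for downstream fine-tuning.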
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
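To illustrate SANE's "sequential processing of subsets of weights", a minimal, hypothetical sketch follows; the token size and function name are assumptions, not the SANE implementation.

```python
# Hypothetical sketch: flatten a model's weights and split them into
# fixed-size "tokens" so a sequence model can handle networks of any size.
import torch
import torch.nn.functional as F

def weights_to_tokens(state_dict, token_size=128):
    flat = torch.cat([p.detach().flatten() for p in state_dict.values()])
    pad = (-flat.numel()) % token_size           # right-pad to a full token
    flat = F.pad(flat, (0, pad))
    return flat.view(-1, token_size)             # (num_tokens, token_size)
```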
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
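As a toy illustration of what sampling weight embeddings with latent diffusion could involve, here is a standard DDPM ancestral sampler; `denoiser` and the noise schedule are assumptions, not the D2NWG code.

```python
import torch

@torch.no_grad()
def ddpm_sample(denoiser, betas, shape):
    """Ancestral sampling; denoiser(z_t, t) predicts the added noise."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    z = torch.randn(shape)                       # start from pure noise
    for t in reversed(range(len(betas))):
        eps = denoiser(z, torch.tensor([t]))
        mean = (z - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        z = mean + betas[t].sqrt() * torch.randn_like(z) if t > 0 else mean
    return z                                     # decode to weights afterwards
```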
- Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling [69.60713300418467]
Learning to jump is a general recipe for generative modeling of various types of data.
We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better.
arXiv Detail & Related papers (2023-05-28T05:38:28Z)
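For intuition, thinning a latent count can be sketched as binomial subsampling, an illustrative analogue of the forward process that "learning to jump" inverts, not the paper's exact formulation.

```python
import torch

def thin_counts(x0, keep_prob):
    """Binomial thinning: each of the x0 counts survives independently
    with probability keep_prob, shrinking counts toward zero over time."""
    probs = torch.full_like(x0.float(), keep_prob)
    return torch.binomial(x0.float(), probs)
```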
- Learning to Grow Pretrained Models for Efficient Transformer Training [72.20676008625641]
We learn to grow pretrained transformers by learning a linear map from the parameters of the smaller model to an initialization of the larger model.
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% of the computational cost of training from scratch.
arXiv Detail & Related papers (2023-03-02T05:21:18Z)
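A minimal sketch of a learned linear growth operator follows; the factorized expansion and names are assumptions, not LiGO's exact parameterization.

```python
# Hypothetical factorized growth: expand a (d_small x d_small) weight
# matrix to (d_large x d_large) via two learned linear maps.
import torch
import torch.nn as nn

class LinearGrowth(nn.Module):
    def __init__(self, d_small, d_large):
        super().__init__()
        self.grow_width = nn.Linear(d_small, d_large, bias=False)
        self.grow_height = nn.Linear(d_small, d_large, bias=False)

    def forward(self, w_small):                  # (d_small, d_small)
        w = self.grow_width(w_small)             # (d_small, d_large)
        return self.grow_height(w.T).T           # (d_large, d_large)
```

The output would initialize the larger layer, which is then trained normally.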
- Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights [2.9678808525128813]
We extend hyper-representations for generative use to sample new model weights.
Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations.
arXiv Detail & Related papers (2022-09-29T12:53:58Z)
- Learning Sparse Latent Representations for Generator Model [7.467412443287767]
We present a new unsupervised learning method to enforce sparsity on the latent space for the generator model.
Our model consists of only one top-down generator network that maps the latent variable to the observed data.
arXiv Detail & Related papers (2022-09-20T18:58:24Z)
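One simple way to enforce latent sparsity for a top-down generator, sketched hypothetically with an L1 penalty on the latent variable; the paper's actual mechanism may differ.

```python
import torch

def sparse_latent_loss(generator, z, x, l1_weight=0.1):
    recon = ((generator(z) - x) ** 2).mean()     # reconstruction term
    return recon + l1_weight * z.abs().mean()    # sparsity-inducing term
```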
- SuperAnimal pretrained pose estimation models for behavioral analysis [42.206265576708255]
Quantification of behavior is critical in applications ranging from neuroscience to veterinary medicine and animal conservation.
We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models.
arXiv Detail & Related papers (2022-03-14T18:46:57Z)
- Entropy optimized semi-supervised decomposed vector-quantized variational autoencoder model based on transfer learning for multiclass text classification and generation [3.9318191265352196]
We propose a semi-supervised discrete latent variable model for multi-class text classification and text generation.
The proposed model employs the concept of transfer learning for training a quantized transformer model.
Experimental results indicate that the proposed model substantially outperforms state-of-the-art models.
arXiv Detail & Related papers (2021-11-10T07:07:54Z)
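The vector-quantization step at the core of such a model can be sketched as nearest-codebook lookup with a straight-through gradient; this is a generic VQ-VAE step for illustration, not this paper's exact model.

```python
import torch

def quantize(z, codebook):                       # z: (B, d), codebook: (K, d)
    dists = torch.cdist(z, codebook)             # (B, K) pairwise distances
    idx = dists.argmin(dim=1)                    # nearest code per example
    z_q = codebook[idx]
    # straight-through estimator: gradients bypass the argmin to the encoder
    return z + (z_q - z).detach(), idx
```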
- Neuro-Symbolic AI: An Emerging Class of AI Workloads and their Characterization [0.9949801888214526]
Neuro-symbolic artificial intelligence is a novel area of AI research.
We describe and analyze the performance characteristics of three recent neuro-symbolic models.
arXiv Detail & Related papers (2021-09-13T17:19:59Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance with far fewer trainable parameters and high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
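Sparse attention over skeleton joints can be sketched by masking attention scores outside a fixed connectivity pattern; this is an illustrative mechanism, and STAR's exact sparsity pattern may differ.

```python
import torch

def sparse_attention(q, k, v, adjacency):        # adjacency: (J, J) bool mask
    # scores between joints not connected in the skeleton are set to -inf,
    # so the softmax zeroes them out
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~adjacency, float("-inf"))
    return scores.softmax(dim=-1) @ v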
- Zoo-Tuning: Adaptive Transfer from a Zoo of Models [82.9120546160422]
Zoo-Tuning learns to adaptively transfer the parameters of pretrained models to the target task.
We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection.
arXiv Detail & Related papers (2021-06-29T14:09:45Z)
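Adaptive parameter transfer can be sketched as a learned mixture over the zoo's layer weights; this static-gate variant is purely illustrative, and Zoo-Tuning's actual gating is more elaborate.

```python
# Illustrative aggregation: the target layer's weights are a learned convex
# combination of the corresponding weights from M zoo models.
import torch
import torch.nn as nn

class AdaptiveAggregate(nn.Module):
    def __init__(self, zoo_weights):             # (M, *weight_shape) stacked
        super().__init__()
        self.register_buffer("zoo", zoo_weights)
        self.gate = nn.Parameter(torch.zeros(zoo_weights.shape[0]))

    def forward(self):
        alpha = self.gate.softmax(dim=0)         # (M,) mixing coefficients
        return torch.einsum("m,m...->...", alpha, self.zoo)
```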
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.