HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning
- URL: http://arxiv.org/abs/2201.04182v1
- Date: Tue, 11 Jan 2022 20:15:35 GMT
- Title: HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning
- Authors: Andrey Zhmoginov, Mark Sandler, Max Vladymyrov
- Abstract summary: We propose a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples.
Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal.
We extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.
- Score: 14.412066456583917
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we propose a HyperTransformer, a transformer-based model for
few-shot learning that generates weights of a convolutional neural network
(CNN) directly from support samples. Since the dependence of a small generated
CNN model on a specific task is encoded by a high-capacity transformer model,
we effectively decouple the complexity of the large task space from the
complexity of individual tasks. Our method is particularly effective for small
target CNN architectures where learning a fixed universal task-independent
embedding is not optimal and better performance is attained when the
information about the task can modulate all model parameters. For larger models
we discover that generating the last layer alone allows us to produce
competitive or better results than those obtained with state-of-the-art methods
while being end-to-end differentiable. Finally, we extend our approach to a
semi-supervised regime utilizing unlabeled samples in the support set and
further improving few-shot performance.
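
The idea of generating only the last layer from support samples can be illustrated with a toy sketch. This is not the authors' architecture: it assumes a single, untrained attention step in which one query per class attends over the support set, keys are the support labels, and values are precomputed support embeddings. The names `generate_last_layer`, `predict`, and the `temperature` parameter are hypothetical and introduced only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def generate_last_layer(support_embeddings, support_labels, num_classes,
                        temperature=0.1):
    """Generate one weight row per class from the support set.

    Assumption: the query-key score is 1 when a support label matches the
    class and 0 otherwise, so each generated row is an attention-weighted
    average of the support embeddings, dominated by that class's samples.
    """
    dim = len(support_embeddings[0])
    weights = []
    for c in range(num_classes):
        scores = [(1.0 if y == c else 0.0) / temperature
                  for y in support_labels]
        attn = softmax(scores)
        row = [sum(a * e[d] for a, e in zip(attn, support_embeddings))
               for d in range(dim)]
        weights.append(row)
    return weights


def predict(weights, query_embedding):
    """Apply the generated linear layer and return the arg-max class."""
    logits = [sum(w * x for w, x in zip(row, query_embedding))
              for row in weights]
    return max(range(len(logits)), key=logits.__getitem__)


# Toy 2-way, 2-shot task with hand-written 2-D embeddings.
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
W = generate_last_layer(support, labels, num_classes=2)
print(predict(W, [0.8, 0.2]), predict(W, [0.2, 0.8]))
```

Every step here is a smooth function of the support embeddings, which is what makes weight generation of this kind trainable end to end. A trained model would learn the query, key, and value projections instead of using fixed label matching.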
Related papers
- Towards Efficient Model-Heterogeneity Federated Learning for Large Models [18.008063521900702]
We introduce HeteroTune, an innovative fine-tuning framework tailored for model-heterogeneity federated learning (MHFL).
In particular, we propose a novel parameter-efficient fine-tuning structure, called FedAdapter, which employs a multi-branch cross-model aggregator.
Benefiting from the lightweight FedAdapter, our approach significantly reduces both the computational and communication overhead.
arXiv Detail & Related papers (2024-11-25T09:58:51Z)
- Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models [106.94827590977337]
We propose a novel world model for Multi-Agent RL (MARL) that learns decentralized local dynamics for scalability.
We also introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation.
Results on Starcraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
arXiv Detail & Related papers (2024-06-22T12:40:03Z)
- Diffscaler: Enhancing the Generative Prowess of Diffusion Transformers [34.611309081801345]
This paper focuses on enabling a single pre-trained diffusion transformer model to scale across multiple datasets swiftly.
We propose DiffScaler, an efficient scaling strategy for diffusion models where we train a minimal amount of parameters to adapt to different tasks.
We find that transformer-based diffusion models significantly outperform CNN-based diffusion models when fine-tuning on smaller datasets.
arXiv Detail & Related papers (2024-04-15T17:55:43Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Majority Kernels: An Approach to Leverage Big Model Dynamics for Efficient Small Model Training [32.154166415680066]
Methods like distillation, compression or quantization help leverage the highly performant large models to induce smaller performant ones.
This paper explores the hypothesis that a single training run can simultaneously train a larger model for performance and derive a smaller model for deployment.
arXiv Detail & Related papers (2024-02-07T17:07:41Z)
- Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units [4.807347156077897]
Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive when used at scale.
This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications.
arXiv Detail & Related papers (2023-11-13T17:55:07Z)
- Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning [18.651307543537655]
We propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model.
We present experimental results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges to demonstrate that our method achieves high sample efficiency and exhibits superior performance in defeating the enemy armies compared to other baselines.
arXiv Detail & Related papers (2023-09-08T22:12:43Z)
- Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction [126.34551436845133]
CNNs and Transformers have their own advantages and both have been widely used for dense prediction in multi-task learning (MTL).
We present a novel MTL model by combining both merits of deformable CNN and query-based Transformer with shared gating for multi-task learning of dense prediction.
arXiv Detail & Related papers (2023-08-10T17:37:49Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.