HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable
Hyper Projections
- URL: http://arxiv.org/abs/2007.05891v1
- Date: Sun, 12 Jul 2020 02:49:16 GMT
- Title: HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable
Hyper Projections
- Authors: Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
- Abstract summary: We propose HyperGrid, a new approach for highly effective multi-task learning.
Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
- Score: 96.64246471034195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving state-of-the-art performance on natural language understanding
tasks typically relies on fine-tuning a fresh model for every task.
Consequently, this approach leads to a higher overall parameter cost, along
with higher technical maintenance for serving multiple models. Learning a
single multi-task model that is able to do well for all the tasks has been a
challenging and yet attractive proposition. In this paper, we propose
HyperGrid, a new approach for highly effective multi-task learning.
The proposed approach is based on a decomposable hypernetwork that learns
grid-wise projections that help to specialize regions in weight matrices for
different tasks. In order to construct the proposed hypernetwork, our method
learns the interactions and composition between a global (task-agnostic) state
and a local task-specific state. We apply our proposed HyperGrid on
the current state-of-the-art T5 model, demonstrating strong performance across
the GLUE and SuperGLUE benchmarks when using only a single multi-task model.
Our method helps bridge the gap between fine-tuning and multi-task learning
approaches.
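The grid-wise projection described in the abstract can be illustrated with a minimal NumPy sketch: a small gating grid is computed from the composition of a global (task-agnostic) vector and a local (task-specific) vector, then tiled up to the full weight shape so that different regions of the weight matrix are specialized per task. The sigmoid-of-outer-product composition and all shapes below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def hypergrid_projection(W, global_state, task_state, grid_rows, grid_cols):
    """Sketch of grid-wise gating: a coarse (grid_rows x grid_cols) gate,
    computed by composing a global and a task-specific vector, is tiled up
    to the full weight shape and applied multiplicatively."""
    d_out, d_in = W.shape
    # Compose global (task-agnostic) and local (task-specific) states:
    # outer product -> one scalar per grid cell, squashed into (0, 1).
    gate = 1.0 / (1.0 + np.exp(-np.outer(global_state, task_state)))
    # Tile the coarse grid up to the full weight-matrix resolution.
    tiled = np.kron(gate, np.ones((d_out // grid_rows, d_in // grid_cols)))
    return W * tiled  # regions of W are scaled per task by the gate

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
g = rng.standard_normal(4)  # global state, one entry per grid row
t = rng.standard_normal(4)  # task state, one entry per grid column
W_task = hypergrid_projection(W, g, t, 4, 4)
```

Because the gate lies strictly in (0, 1), the projection can only attenuate weights; a different composition function would allow amplification as well.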
Related papers
- HyperLoader: Integrating Hypernetwork-Based LoRA and Adapter Layers into Multi-Task Transformers for Sequence Labelling [5.955463697605461]
We present HyperLoader, a simple approach that combines different parameter-efficient fine-tuning methods in a multi-task setting.
Our method retains the benefits of multi-task learning by capturing the shared structure across all tasks.
We provide empirical evidence that HyperLoader outperforms previous approaches in most datasets.
arXiv Detail & Related papers (2024-07-01T16:00:53Z)
- On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion [23.63688816017186]
Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge.
We propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task.
Our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios.
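The dynamic fusion idea above, replacing a static transfer ratio with per-input weights over several task-specific small models, can be sketched as follows. The inverse-entropy confidence heuristic used for the weights is an illustrative assumption, not the paper's exact rule.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dynamic_logit_fusion(strong_logits, weak_logits_list):
    """Sketch: weight each small model's logits by its confidence
    (1 minus normalized entropy) instead of a fixed transfer ratio."""
    fused = strong_logits.astype(float).copy()
    for weak in weak_logits_list:
        p = softmax(weak)
        ent = -(p * np.log(p + 1e-12)).sum(axis=-1, keepdims=True)
        conf = 1.0 - ent / np.log(p.shape[-1])  # in [0, 1]
        fused = fused + conf * weak
    return fused

strong = np.array([1.0, 2.0, 3.0])
# A uniform (uninformative) small model has zero confidence weight,
# so it leaves the strong model's logits unchanged.
fused_uniform = dynamic_logit_fusion(strong, [np.zeros(3)])
# A confident small model can override the strong model's prediction.
fused_confident = dynamic_logit_fusion(strong, [np.array([10.0, 0.0, 0.0])])
```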
arXiv Detail & Related papers (2024-06-17T03:07:41Z)
- EMA-Net: Efficient Multitask Affinity Learning for Dense Scene Predictions [7.01633634930651]
We introduce the Efficient Multitask Affinity Learning Network (EMA-Net)
EMA-Net adeptly captures local, global, and cross-task interactions using our novel Cross-Task Affinity Learning (CTAL) module.
Our results show that we achieve state-of-the-art MTL performance for CNN-based decoder-focused models.
arXiv Detail & Related papers (2024-01-20T05:31:47Z)
- Parameter Efficient Multi-task Model Fusion with Partial Linearization [97.23530944186078]
We propose a novel method to improve multi-task fusion for parameter-efficient fine-tuning techniques.
Our approach partially linearizes only the adapter modules and applies task arithmetic over the linearized adapters.
We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model.
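The task-arithmetic step that this approach applies over linearized adapters can be sketched in isolation: each task vector is the difference between a fine-tuned adapter and the shared initialization, and the summed task vectors are added back to the base. The flat-dict parameterization and scaling factor below are illustrative assumptions; the partial linearization itself (fine-tuning the adapters in their tangent space) is not shown.

```python
import numpy as np

def task_arithmetic_merge(base, task_params, scale=1.0):
    """Sketch of task arithmetic over adapter parameters: sum the
    per-task deltas from the shared base, then add them back."""
    merged = {k: v.astype(float).copy() for k, v in base.items()}
    for params in task_params:
        for k in merged:
            merged[k] += scale * (params[k] - base[k])  # task vector
    return merged

base = {"w": np.zeros(2)}
merged = task_arithmetic_merge(
    base,
    [{"w": np.array([1.0, 0.0])},   # adapter fine-tuned on task 1
     {"w": np.array([0.0, 2.0])}],  # adapter fine-tuned on task 2
)
```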
arXiv Detail & Related papers (2022-12-08T17:07:09Z)
- OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models [72.8156832931841]
Generalist models are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model.
We release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
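The adaptive sharing above can be sketched as a per-layer switch between shared and task-specific weights. The hard threshold stands in for the differentiable (straight-through style) gating a trainable version would need; all names and the threshold value are illustrative assumptions.

```python
import numpy as np

def taps_layer(W_shared, W_task_specific, score, tau=0.5):
    """Sketch of adaptive layer sharing: a learned per-layer score decides
    whether this layer uses the shared base weights or a task-specific
    copy. Layers below the threshold add no task-specific parameters."""
    use_task = float(score > tau)  # 1 -> task-specific, 0 -> shared
    return use_task * W_task_specific + (1 - use_task) * W_shared

W_shared = np.zeros((2, 2))
W_task_specific = np.ones((2, 2))
```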
arXiv Detail & Related papers (2022-03-28T17:56:40Z)
- Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints.
We propose a disentangled training of two hypernetworks, by exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights.
arXiv Detail & Related papers (2021-06-08T16:16:40Z)
- Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks [37.2958914602899]
We show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks.
Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task.
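Generating adapter parameters from a shared hypernetwork can be sketched as follows: a single shared projection maps a concatenated (task, layer) embedding to the flattened weights of that layer's adapter, so adapter parameters are generated on the fly rather than stored per task. The linear hypernetwork and the concatenation scheme are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def shared_hypernetwork(task_emb, layer_emb, H, d_model, d_adapter):
    """Sketch: one shared matrix H maps a (task, layer) conditioning
    vector to the flattened down- and up-projection weights of a
    bottleneck adapter for that layer."""
    z = np.concatenate([task_emb, layer_emb])  # conditioning input
    flat = z @ H                               # generate all weights at once
    n_down = d_model * d_adapter
    W_down = flat[:n_down].reshape(d_model, d_adapter)
    W_up = flat[n_down:n_down + d_adapter * d_model].reshape(d_adapter, d_model)
    return W_down, W_up

rng = np.random.default_rng(1)
d_model, d_adapter = 6, 2
H = rng.standard_normal((8, 2 * d_model * d_adapter))  # shared across tasks/layers
W_down, W_up = shared_hypernetwork(
    rng.standard_normal(4), rng.standard_normal(4), H, d_model, d_adapter
)
```

Only the task embeddings are new per task; H is shared, which is how such methods keep the per-task parameter overhead small.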
arXiv Detail & Related papers (2020-10-13T11:53:55Z)
- Controllable Pareto Multi-Task Learning [55.945680594691076]
A multi-task learning system aims at solving multiple related tasks at the same time.
With fixed model capacity, the tasks can conflict with one another, and the system usually has to make trade-offs when learning all of them together.
This work proposes a novel controllable multi-task learning framework, to enable the system to make real-time trade-off control among different tasks with a single model.
arXiv Detail & Related papers (2020-10-13T11:53:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.