HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable
Hyper Projections
- URL: http://arxiv.org/abs/2007.05891v1
- Date: Sun, 12 Jul 2020 02:49:16 GMT
- Title: HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable
Hyper Projections
- Authors: Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
- Abstract summary: We propose HyperGrid, a new approach for highly effective multi-task learning.
Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
- Score: 96.64246471034195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Achieving state-of-the-art performance on natural language understanding
tasks typically relies on fine-tuning a fresh model for every task.
Consequently, this approach leads to a higher overall parameter cost, along
with higher technical maintenance for serving multiple models. Learning a
single multi-task model that is able to do well for all the tasks has been a
challenging and yet attractive proposition. In this paper, we propose
\textsc{HyperGrid}, a new approach for highly effective multi-task learning.
The proposed approach is based on a decomposable hypernetwork that learns
grid-wise projections that help to specialize regions in weight matrices for
different tasks. In order to construct the proposed hypernetwork, our method
learns the interactions and composition between a global (task-agnostic) state
and a local task-specific state. We apply our proposed \textsc{HyperGrid} on
the current state-of-the-art T5 model, demonstrating strong performance across
the GLUE and SuperGLUE benchmarks when using only a single multi-task model.
Our method helps bridge the gap between fine-tuning and multi-task learning
approaches.
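Going only by the abstract, the core mechanism can be pictured as a linear layer whose weight matrix is gated by a coarse grid computed from a global (task-agnostic) vector and a local task-specific vector, with each grid cell expanded over a block of the matrix so that different regions specialize to different tasks. The sketch below is a minimal illustration under those assumptions; the tensor shapes, the sigmoid gate, and the block-wise expansion are guesses for exposition, not the paper's exact construction.

```python
import torch
import torch.nn as nn


class GridGatedLinear(nn.Module):
    """Sketch of a linear layer whose weight matrix is specialized per task by
    a grid of gates composed from a global (task-agnostic) vector and a local
    task-specific vector. Shapes, the sigmoid gate, and the block-wise repeat
    are illustrative assumptions, not the paper's exact formulation."""

    def __init__(self, d_in, d_out, num_tasks, grid_rows=8, grid_cols=8):
        super().__init__()
        assert d_out % grid_rows == 0 and d_in % grid_cols == 0
        self.linear = nn.Linear(d_in, d_out)
        self.block = (d_out // grid_rows, d_in // grid_cols)
        self.global_state = nn.Parameter(torch.randn(grid_rows))   # task-agnostic
        self.local_state = nn.Embedding(num_tasks, grid_cols)      # task-specific

    def forward(self, x, task_id):
        # Compose global and local states into a coarse grid of gates.
        local = self.local_state(torch.as_tensor(task_id))
        grid = torch.sigmoid(torch.outer(self.global_state, local))
        # Expand each grid cell over a contiguous block of the weight matrix,
        # so different regions of the matrix are switched on or off per task.
        gate = grid.repeat_interleave(self.block[0], dim=0)
        gate = gate.repeat_interleave(self.block[1], dim=1)
        return nn.functional.linear(x, self.linear.weight * gate, self.linear.bias)


# One shared layer serving two tasks, each activating different weight regions.
layer = GridGatedLinear(d_in=64, d_out=128, num_tasks=2)
x = torch.randn(4, 64)
y0, y1 = layer(x, task_id=0), layer(x, task_id=1)
```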
Related papers
- Deploying Multi-task Online Server with Large Language Model [9.118405878982383]
We present a three-stage multi-task learning framework for large language models.
It involves task filtering, followed by fine-tuning on high-resource tasks, and finally fine-tuning on all tasks.
On several benchmarks, our approach achieves performance comparable to the single-task method while reducing overhead by up to 90.9%.
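The three-stage recipe above can be written down as a short training-script skeleton. The helper names `filter_tasks` and `fine_tune`, the `num_examples` attribute, and the high-resource cutoff are hypothetical placeholders, not the paper's code.

```python
# Minimal sketch of the three-stage recipe summarized above.
# `filter_tasks`, `fine_tune`, and the high-resource cutoff are hypothetical.

def three_stage_multitask(model, tasks, filter_tasks, fine_tune,
                          high_resource_min=10_000):
    # Stage 1: task filtering -- keep only tasks worth training jointly.
    kept = filter_tasks(tasks)

    # Stage 2: fine-tune on the high-resource tasks first.
    high_resource = [t for t in kept if t.num_examples >= high_resource_min]
    model = fine_tune(model, high_resource)

    # Stage 3: a final round of fine-tuning on all kept tasks together.
    model = fine_tune(model, kept)
    return model
```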
arXiv Detail & Related papers (2024-11-06T03:48:41Z)
- HyperLoader: Integrating Hypernetwork-Based LoRA and Adapter Layers into Multi-Task Transformers for Sequence Labelling [5.955463697605461]
We present HyperLoader, a simple approach that combines different parameter-efficient fine-tuning methods in a multi-task setting.
It captures the structure of all tasks, combining the benefits of multi-task learning with those of parameter-efficient fine-tuning.
We provide empirical evidence that HyperLoader outperforms previous approaches in most datasets.
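One way to picture combining a hypernetwork with parameter-efficient modules, the combination this entry describes, is a shared hypernetwork that emits per-task LoRA factors for an otherwise frozen layer. The sketch below is a generic illustration with assumed dimensions and a single-layer hypernetwork; it is not HyperLoader's actual architecture.

```python
import torch
import torch.nn as nn


class HyperLoRALinear(nn.Module):
    """Sketch: a frozen linear layer plus low-rank (LoRA-style) factors
    generated per task by a shared hypernetwork. Dimensions and the
    single-layer hypernetwork are illustrative assumptions."""

    def __init__(self, d_in, d_out, num_tasks, rank=4, task_dim=32):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():       # frozen backbone layer
            p.requires_grad_(False)
        self.task_emb = nn.Embedding(num_tasks, task_dim)
        self.hyper = nn.Linear(task_dim, rank * (d_in + d_out))
        self.rank, self.d_in, self.d_out = rank, d_in, d_out

    def forward(self, x, task_id):
        z = self.task_emb(torch.as_tensor(task_id))
        flat = self.hyper(z)                   # all LoRA parameters for this task
        a = flat[: self.rank * self.d_in].view(self.rank, self.d_in)
        b = flat[self.rank * self.d_in:].view(self.d_out, self.rank)
        return self.base(x) + x @ a.t() @ b.t()   # low-rank task-specific update


layer = HyperLoRALinear(d_in=64, d_out=64, num_tasks=3)
y = layer(torch.randn(2, 64), task_id=1)
```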
arXiv Detail & Related papers (2024-07-01T16:00:53Z)
- Parameter Efficient Multi-task Model Fusion with Partial Linearization [97.23530944186078]
We propose a novel method to improve multi-task fusion for parameter-efficient fine-tuning techniques.
Our approach partially linearizes only the adapter modules and applies task arithmetic over the linearized adapters.
We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model.
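The fusion half of this description follows the usual task-arithmetic pattern: each task contributes a task vector (its fine-tuned adapter parameters minus the shared initialization), and the fused model adds a weighted sum of these vectors back onto the initialization. The function below sketches only that generic fusion step; the paper's partial-linearization step (fine-tuning the adapters through a first-order approximation) is omitted.

```python
import torch


def fuse_adapters(init_adapter, task_adapters, weights=None):
    """Generic task-arithmetic fusion over adapter parameters (a sketch).

    init_adapter / task_adapters: dicts mapping parameter names to tensors.
    Returns init + sum_t w_t * (task_t - init), applied per parameter."""
    if weights is None:
        weights = [1.0 / len(task_adapters)] * len(task_adapters)
    fused = {}
    for name, theta0 in init_adapter.items():
        task_vectors = [w * (ta[name] - theta0)
                        for w, ta in zip(weights, task_adapters)]
        fused[name] = theta0 + torch.stack(task_vectors).sum(dim=0)
    return fused


# Usage with toy adapter parameters for two tasks.
init = {"down.weight": torch.zeros(4, 8), "up.weight": torch.zeros(8, 4)}
task_a = {k: v + 0.1 for k, v in init.items()}
task_b = {k: v - 0.2 for k, v in init.items()}
merged = fuse_adapters(init, [task_a, task_b], weights=[0.5, 0.5])
```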
arXiv Detail & Related papers (2023-10-07T08:55:54Z)
- OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models [72.8156832931841]
Generalist models are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model.
We release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction.
arXiv Detail & Related papers (2022-12-08T17:07:09Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
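A rough way to picture this: every layer owns a task-specific delta plus a learned score that decides whether the delta is used, and a sparsity penalty on the scores keeps the number of modified layers small. The relaxed sigmoid gate and the penalty below are illustrative assumptions rather than TAPS's exact formulation.

```python
import torch
import torch.nn as nn


class TaskAdaptiveLinear(nn.Module):
    """Sketch: a shared linear layer plus a task-specific delta, enabled by a
    learned per-layer score; a sparsity penalty on the score keeps most layers
    unmodified. The relaxed gate is an illustrative assumption."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out)
        self.delta = nn.Parameter(torch.zeros(d_out, d_in))   # task-specific update
        self.score = nn.Parameter(torch.tensor(0.0))          # relaxed layer gate

    def forward(self, x):
        gate = torch.sigmoid(self.score)                       # near 0 drops the delta
        w = self.shared.weight + gate * self.delta
        return nn.functional.linear(x, w, self.shared.bias)

    def sparsity_penalty(self):
        return torch.sigmoid(self.score)   # pushes layers back to shared weights


layer = TaskAdaptiveLinear(32, 32)
out = layer(torch.randn(2, 32))
loss = out.pow(2).mean() + 0.01 * layer.sparsity_penalty()
```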
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints.
We propose disentangled training of two hypernetworks that, exploiting task affinity and a novel branching regularized loss, take input preferences and accordingly predict tree-structured models with adapted weights.
arXiv Detail & Related papers (2022-03-28T17:56:40Z)
- Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks [37.2958914602899]
We show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks.
Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task.
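The mechanism as summarized, one shared hypernetwork generating adapter weights for every layer and task, can be sketched directly: the hypernetwork maps a (task, layer) embedding to the down- and up-projection of a bottleneck adapter, so the only per-task parameters are the task embeddings. The dimensions below are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn


class SharedAdapterHypernet(nn.Module):
    """Sketch: one hypernetwork shared across tasks and layers that generates
    bottleneck-adapter weights from a (task, layer) embedding. Dimensions are
    illustrative assumptions."""

    def __init__(self, num_tasks, num_layers, d_model=64, bottleneck=16, emb=32):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, emb)
        self.layer_emb = nn.Embedding(num_layers, emb)
        n_params = 2 * d_model * bottleneck        # down- plus up-projection
        self.generator = nn.Linear(2 * emb, n_params)
        self.d_model, self.bottleneck = d_model, bottleneck

    def adapter_weights(self, task_id, layer_id):
        z = torch.cat([self.task_emb(torch.as_tensor(task_id)),
                       self.layer_emb(torch.as_tensor(layer_id))])
        flat = self.generator(z)
        k = self.d_model * self.bottleneck
        down = flat[:k].view(self.bottleneck, self.d_model)
        up = flat[k:].view(self.d_model, self.bottleneck)
        return down, up

    def apply_adapter(self, hidden, task_id, layer_id):
        down, up = self.adapter_weights(task_id, layer_id)
        return hidden + torch.relu(hidden @ down.t()) @ up.t()   # residual adapter


hyper = SharedAdapterHypernet(num_tasks=4, num_layers=12)
h = torch.randn(2, 10, 64)
h = hyper.apply_adapter(h, task_id=1, layer_id=3)
```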
arXiv Detail & Related papers (2021-06-08T16:16:40Z)
- A Meta-Learning Approach for Graph Representation Learning in Multi-Task Settings [7.025709586759655]
We propose a novel meta-learning strategy capable of producing multi-task node embeddings.
We show that the embeddings produced by our method can be used to perform multiple tasks with comparable or higher performance than classically trained models.
arXiv Detail & Related papers (2020-12-12T08:36:47Z)
- Controllable Pareto Multi-Task Learning [55.945680594691076]
A multi-task learning system aims at solving multiple related tasks at the same time.
With a fixed model capacity, the tasks can conflict with one another, and the system usually has to make a trade-off when learning all of them together.
This work proposes a novel controllable multi-task learning framework, to enable the system to make real-time trade-off control among different tasks with a single model.
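The simplest device for making such a task trade-off explicit is linear scalarization under a user-supplied preference vector, sketched below; the paper goes further and conditions a single model on the preference so the trade-off can be changed in real time at inference, which this sketch does not capture.

```python
import torch


def scalarized_loss(task_losses, preference):
    """Linear scalarization of per-task losses under a preference vector.

    Only the simplest illustration of trading tasks off against each other;
    not the preference-conditioned model proposed in the paper."""
    pref = torch.as_tensor(preference, dtype=torch.float32)
    pref = pref / pref.sum()                      # normalize to a simplex point
    return (pref * torch.stack(list(task_losses))).sum()


# Usage: shift the trade-off toward task 0 without changing the loss code.
losses = [torch.tensor(0.8), torch.tensor(0.3)]
balanced = scalarized_loss(losses, [0.5, 0.5])
task0_heavy = scalarized_loss(losses, [0.9, 0.1])
```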
arXiv Detail & Related papers (2020-10-13T11:53:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.