Related papers: HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections

HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections

URL: http://arxiv.org/abs/2007.05891v1
Date: Sun, 12 Jul 2020 02:49:16 GMT
Title: HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections
Authors: Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
Abstract summary: We propose textscHyperGrid, a new approach for highly effective multi-task learning. Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
Score: 96.64246471034195
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Achieving state-of-the-art performance on natural language understanding tasks typically relies on fine-tuning a fresh model for every task. Consequently, this approach leads to a higher overall parameter cost, along with higher technical maintenance for serving multiple models. Learning a single multi-task model that is able to do well for all the tasks has been a challenging and yet attractive proposition. In this paper, we propose \textsc{HyperGrid}, a new approach for highly effective multi-task learning. The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks. In order to construct the proposed hypernetwork, our method learns the interactions and composition between a global (task-agnostic) state and a local task-specific state. We apply our proposed \textsc{HyperGrid} on the current state-of-the-art T5 model, demonstrating strong performance across the GLUE and SuperGLUE benchmarks when using only a single multi-task model. Our method helps bridge the gap between fine-tuning and multi-task learning approaches.

Related papers

Deploying Multi-task Online Server with Large Language Model [9.118405878982383]
We present a three-stage multi-task learning framework for large language models. It involves task filtering, followed by fine-tuning on high-resource tasks, and finally fine-tuning on all tasks. Our approach, exemplified on different benchmarks, demonstrates that it is able to achieve performance comparable to the single-task method while reducing up to 90.9% of its overhead.
arXiv Detail & Related papers (2024-11-06T03:48:41Z)
HyperLoader: Integrating Hypernetwork-Based LoRA and Adapter Layers into Multi-Task Transformers for Sequence Labelling [5.955463697605461]
We present HyperLoader, a simple approach that combines different parameter-efficient fine-tuning methods in a multi-task setting. Our method combines the benefits of multi-task learning by capturing the structure of all tasks. We provide empirical evidence that HyperLoader outperforms previous approaches in most datasets.
arXiv Detail & Related papers (2024-07-01T16:00:53Z)
Parameter Efficient Multi-task Model Fusion with Partial Linearization [97.23530944186078]
We propose a novel method to improve multi-task fusion for parameter-efficient fine-tuning techniques. Our approach partially linearizes only the adapter modules and applies task arithmetic over the linearized adapters. We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model.
arXiv Detail & Related papers (2023-10-07T08:55:54Z)
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models [72.8156832931841]
Generalist models are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model. We release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction.
arXiv Detail & Related papers (2022-12-08T17:07:09Z)
Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks [36.34331439747556]
We propose Polyhistor and Polyhistor-Lite to share information across different tasks with a few trainable parameters. Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using 10% of their trainable parameters.
arXiv Detail & Related papers (2022-10-07T00:25:02Z)
Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Adaptive Task Adapting Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers. Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters. We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints. We propose a disentangled training of two hypernetworks, by exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights.
arXiv Detail & Related papers (2022-03-28T17:56:40Z)
Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks [37.2958914602899]
We show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks. Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task.
arXiv Detail & Related papers (2021-06-08T16:16:40Z)
A Meta-Learning Approach for Graph Representation Learning in Multi-Task Settings [7.025709586759655]
We propose a novel meta-learning strategy capable of producing multi-task node embeddings. We show that the embeddings produced by our method can be used to perform multiple tasks with comparable or higher performance than classically trained models.
arXiv Detail & Related papers (2020-12-12T08:36:47Z)
Controllable Pareto Multi-Task Learning [55.945680594691076]
A multi-task learning system aims at solving multiple related tasks at the same time. With a fixed model capacity, the tasks would be conflicted with each other, and the system usually has to make a trade-off among learning all of them together. This work proposes a novel controllable multi-task learning framework, to enable the system to make real-time trade-off control among different tasks with a single model.
arXiv Detail & Related papers (2020-10-13T11:53:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.