XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation
- URL: http://arxiv.org/abs/2106.04563v1
- Date: Tue, 8 Jun 2021 17:49:33 GMT
- Title: XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation
- Authors: Subhabrata Mukherjee, Ahmed Hassan Awadallah, Jianfeng Gao
- Abstract summary: We develop a new task-agnostic distillation framework XtremeDistilTransformers.
We study the transferability of several source tasks, augmentation resources, and model architectures for distillation.
- Score: 80.18830380517753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While deep and large pre-trained models are the state of the art for various
natural language processing tasks, their huge size poses significant challenges for
practical use in resource-constrained settings. Recent work in knowledge distillation
proposes task-agnostic as well as task-specific methods to compress these models, with
task-specific ones often yielding higher compression rates. In this work, we develop a
new task-agnostic distillation framework, XtremeDistilTransformers, that leverages the
advantages of task-specific methods to learn a small universal model that can be applied
to arbitrary tasks and languages. To this end, we study the transferability of several
source tasks, augmentation resources, and model architectures for distillation. We
evaluate model performance on multiple tasks, including the General Language
Understanding Evaluation (GLUE) benchmark, the SQuAD question answering dataset, and a
massive multilingual NER dataset covering 41 languages.
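To make the distillation setup concrete, below is a minimal sketch of the soft-label distillation objective such frameworks typically build on: a temperature-scaled KL term against the teacher's soft targets plus a hard-label cross-entropy term. The temperature, mixing weight, and training-loop snippet are illustrative assumptions, not the exact XtremeDistilTransformers recipe.

```python
# Minimal sketch of soft-label knowledge distillation (illustrative only;
# not the exact XtremeDistilTransformers objective). Assumes a frozen large
# teacher and a small trainable student that share a label/vocabulary space.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Temperature-scaled KL to teacher soft targets + hard-label cross-entropy."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale the KL term by T^2 so its gradient magnitude stays comparable.
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Typical usage inside a training loop, with the teacher kept frozen:
# with torch.no_grad():
#     teacher_logits = teacher(input_ids, attention_mask=mask).logits
# student_logits = student(input_ids, attention_mask=mask).logits
# loss = distillation_loss(student_logits, teacher_logits, labels)
```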
Related papers
- On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion (2024-06-17) [23.63688816017186]
Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge.
We propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task.
Our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios.
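The entry above describes blending logits from several task-specific small models into a stronger model with a dynamic, rather than fixed, transfer ratio. The sketch below illustrates one plausible reading, where per-example fusion weights come from each expert's prediction confidence; the confidence measure, the fixed 0.5 blend, and all names are assumptions for illustration, not the paper's algorithm.

```python
# Illustrative sketch of dynamic logit fusion (assumed confidence-based
# weighting; the referenced paper's exact weighting scheme may differ).
import torch
import torch.nn.functional as F

def fuse_logits(strong_logits, expert_logits_list, temperature=1.0):
    """Blend a strong model's logits with several task-specific experts,
    weighting each expert per example by its (negative-entropy) confidence."""
    experts = torch.stack(expert_logits_list, dim=0)                # (E, B, V)
    probs = F.softmax(experts / temperature, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)    # (E, B)
    # Lower entropy -> higher confidence -> larger fusion weight.
    weights = F.softmax(-entropy, dim=0).unsqueeze(-1)              # (E, B, 1)
    fused_experts = (weights * experts).sum(dim=0)                  # (B, V)
    # Simple fixed blend between the strong model and the fused experts;
    # a dynamic per-example ratio could replace the 0.5 constant.
    return 0.5 * strong_logits + 0.5 * fused_experts
```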
- SpeechVerse: A Large-scale Generalizable Audio Language Model (2024-05-14) [38.67969337605572]
SpeechVerse is a robust multi-task training and curriculum learning framework.
It combines pre-trained speech and text foundation models via a small set of learnable parameters.
Empirical experiments show that the multi-task SpeechVerse model even outperforms conventional task-specific baselines on 9 of the 11 tasks.
arXiv Detail & Related papers (2024-05-14T03:33:31Z) - UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions [64.50935101415776]
We build a single model that jointly performs various spoken language understanding (SLU) tasks.
We demonstrate the efficacy of our single multi-task learning model "UniverSLU" for 12 speech classification and sequence generation task types spanning 17 datasets and 9 languages.
arXiv Detail & Related papers (2023-10-04T17:10:23Z) - Task-Based MoE for Multitask Multilingual Machine Translation [58.20896429151824]
The Mixture-of-Experts (MoE) architecture has proven to be a powerful method for training deep models on diverse tasks across many applications.
In this work, we design a novel method that incorporates task information into MoE models at different granular levels with shared dynamic task-based adapters.
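As a rough illustration of feeding task information into an MoE layer, the sketch below conditions the expert router on a learned task embedding; the dense (all-experts) routing, the expert layout, and the granularity are simplifying assumptions and do not reproduce the paper's adapter design.

```python
# Minimal sketch of task-conditioned MoE routing (illustrative only).
import torch
import torch.nn as nn

class TaskMoELayer(nn.Module):
    def __init__(self, d_model, num_experts, num_tasks):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, d_model)
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, hidden, task_id):
        # Route on token state + task embedding so experts can specialize by task.
        task = self.task_emb(task_id).unsqueeze(1)             # (B, 1, D)
        gate = torch.softmax(self.router(hidden + task), -1)   # (B, T, E)
        expert_out = torch.stack([e(hidden) for e in self.experts], dim=-2)  # (B, T, E, D)
        return (gate.unsqueeze(-1) * expert_out).sum(dim=-2)   # (B, T, D)
```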
arXiv Detail & Related papers (2023-08-30T05:41:29Z) - FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue [70.65782786401257]
This work explores conversational task transfer by introducing FETA: a benchmark for few-sample task transfer in open-domain dialogue.
FETA contains two underlying sets of conversations annotated with 10 and 7 tasks, respectively, enabling the study of intra-dataset task transfer.
We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs.
arXiv Detail & Related papers (2022-05-12T17:59:00Z) - Attribution-based Task-specific Pruning for Multi-task Language Models [19.106042468549187]
Multi-task language models show outstanding performance for various natural language understanding tasks with only a single model.
We propose a novel training-free task-specific pruning method for multi-task language models.
arXiv Detail & Related papers (2022-05-09T10:12:08Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
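A minimal sketch of this adaptive layer-selection idea, assuming a per-layer gate over a task-specific weight delta with a sparsity penalty that keeps most layers frozen; the gating and penalty details are illustrative, not TAPS's exact formulation.

```python
# Illustrative sketch of adaptively choosing which layers become task-specific.
import torch
import torch.nn as nn

class GatedTaskLinear(nn.Module):
    """Frozen base linear layer plus a task-specific delta controlled by a
    learnable scalar gate; penalizing the gates keeps most layers unchanged,
    so only a small subset ends up task-specific."""
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                 # base weights stay frozen
        self.delta = nn.Parameter(torch.zeros_like(base.weight))
        self.gate_logit = nn.Parameter(torch.tensor(-2.0))  # starts mostly "off"

    def forward(self, x):
        gate = torch.sigmoid(self.gate_logit)
        weight = self.base.weight + gate * self.delta
        return nn.functional.linear(x, weight, self.base.bias)

    def sparsity_penalty(self):
        # Add to the task loss, scaled by a small lambda, to discourage gates opening.
        return torch.sigmoid(self.gate_logit)
```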
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - Multitask Prompted Training Enables Zero-Shot Task Generalization [70.12770442071657]
We develop a system for mapping general natural language tasks into a human-readable prompted form.
We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks.
The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size.
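As a small illustration of mapping a task into a human-readable prompted form, the sketch below casts an NLI example into a text-to-text pair for an encoder-decoder model; the template wording is an assumption, not one of the paper's actual prompts.

```python
# Illustrative prompt template for NLI (wording is hypothetical).
def nli_to_prompt(premise: str, hypothesis: str, label: int):
    """Turn an NLI example into a (source, target) text-to-text pair."""
    source = (f'Suppose "{premise}" Can we infer that "{hypothesis}"? '
              "Yes, no, or maybe?")
    # Assumed label order: 0 = entailment, 1 = neutral, 2 = contradiction.
    target = ["yes", "maybe", "no"][label]
    return source, target

# Example:
# nli_to_prompt("A dog runs in the park.", "An animal is outside.", 0)
# -> ('Suppose "A dog runs in the park." Can we infer that "An animal is outside."? Yes, no, or maybe?', 'yes')
```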
- Modelling Latent Skills for Multitask Language Generation (2020-02-21) [15.126163032403811]
We present a generative model for multitask conditional language generation.
Our guiding hypothesis is that a shared set of latent skills underlies many disparate language generation tasks.
We instantiate this task embedding space as the latent variable in a latent-variable sequence-to-sequence model.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.