Weighted Training for Cross-Task Learning
- URL: http://arxiv.org/abs/2105.14095v1
- Date: Fri, 28 May 2021 20:27:02 GMT
- Title: Weighted Training for Cross-Task Learning
- Authors: Shuxiao Chen, Koby Crammer, Hangfeng He, Dan Roth, Weijie J. Su
- Abstract summary: We introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning.
We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees.
As a byproduct, the proposed representation-based task distance allows one to reason in a theoretically principled way about several critical aspects of cross-task learning.
- Score: 71.94908559469475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce Target-Aware Weighted Training (TAWT), a weighted
training algorithm for cross-task learning based on minimizing a
representation-based task distance between the source and target tasks. We show
that TAWT is easy to implement, is computationally efficient, requires little
hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees.
The effectiveness of TAWT is corroborated through extensive experiments with
BERT on four sequence tagging tasks in natural language processing (NLP),
including part-of-speech (PoS) tagging, chunking, predicate detection, and
named entity recognition (NER). As a byproduct, the proposed
representation-based task distance allows one to reason in a theoretically
principled way about several critical aspects of cross-task learning, such as
the choice of the source data and the impact of fine-tuning.
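Since the abstract describes TAWT only at a high level, the sketch below illustrates the general idea of target-aware weighted training: score each source task by a representation-based distance to the target, turn those distances into source weights, and train on the reweighted source data. Everything concrete here is an assumption for illustration — the toy linear tasks, the cosine-style distance proxy, the softmax weight rule, and the one-shot reweighting — not the exact TAWT algorithm or its guarantees.

```python
# Minimal sketch of target-aware weighted training. The toy linear tasks, the
# cosine-style proxy for the representation-based task distance, the softmax
# weight rule, and the one-shot reweighting are illustrative assumptions, not
# the exact TAWT algorithm.
import numpy as np

rng = np.random.default_rng(0)

def make_task(n=200, d=16):
    """Toy regression task: inputs X, labels y, and the task's own parameter w."""
    X = rng.normal(size=(n, d))
    w = rng.normal(size=d)
    y = X @ w + 0.1 * rng.normal(size=n)
    return X, y, w

target = make_task()
sources = [make_task() for _ in range(3)]

def task_distance(w_src, w_tgt):
    """Proxy for a representation-based task distance: one minus the absolute
    cosine similarity between the task-optimal parameters (an assumption)."""
    cos = w_src @ w_tgt / (np.linalg.norm(w_src) * np.linalg.norm(w_tgt))
    return 1.0 - abs(cos)

# Target-aware weights: sources closer to the target get larger weight.
dists = np.array([task_distance(w_s, target[2]) for _, _, w_s in sources])
weights = np.exp(-dists / 0.1)
weights /= weights.sum()

# Weighted training: fit one shared model on the reweighted source data
# (weighted least squares stands in for weighted fine-tuning of a shared
# representation such as BERT).
X_all = np.vstack([X for X, _, _ in sources])
y_all = np.concatenate([y for _, y, _ in sources])
sample_w = np.concatenate(
    [np.full(len(y), wt) for (_, y, _), wt in zip(sources, weights)]
)
Xw = X_all * sample_w[:, None]
beta = np.linalg.solve(Xw.T @ X_all, Xw.T @ y_all)

X_t, y_t, _ = target
print("source weights:", np.round(weights, 3))
print("target MSE after weighted training:", np.mean((X_t @ beta - y_t) ** 2))
```

In the paper, the task distance and source weights interact with the representation learned during training, rather than being fixed once up front as in this sketch.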
Related papers
- PEMT: Multi-Task Correlation Guided Mixture-of-Experts Enables Parameter-Efficient Transfer Learning [28.353530290015794]
We propose PEMT, a novel parameter-efficient fine-tuning framework based on multi-task transfer learning.
We conduct experiments on a broad range of tasks over 17 datasets.
arXiv Detail & Related papers (2024-02-23T03:59:18Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task and then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
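As a rough illustration of the two-level curriculum described in the Data-CUBE entry above, the following sketch orders toy tasks by average difficulty and splits each task's instances into easy-to-difficult mini-batches. The loss-style difficulty scores, the easiest-first task ordering, and all names here are assumptions standing in for the paper's interference-risk and difficulty measures, not its actual procedure.

```python
# Minimal sketch of a Data-CUBE-style curriculum. The loss-based difficulty
# scores and the easiest-first task ordering are illustrative stand-ins for
# the paper's interference-risk and difficulty measures.
import numpy as np

rng = np.random.default_rng(1)

# Toy multi-task data: per-task instances with a precomputed "difficulty"
# score (e.g. a reference model's loss on each instance -- an assumption).
tasks = {name: rng.random(size=50) for name in ["pos", "chunk", "ner"]}

def order_tasks(task_difficulty):
    """Task level: order tasks from easiest to hardest on average, as a crude
    proxy for minimizing cross-task interference."""
    return sorted(task_difficulty, key=lambda t: task_difficulty[t].mean())

def curriculum_batches(difficulty, batch_size=10):
    """Instance level: sort instances by difficulty and chunk them into
    easy-to-difficult mini-batches."""
    order = np.argsort(difficulty)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

for task in order_tasks(tasks):
    for batch in curriculum_batches(tasks[task]):
        pass  # train the model on tasks[task][batch] here
    print(task, "mean difficulty:", round(tasks[task].mean(), 3))
```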
- Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning [20.177260510548535]
We propose Parameter Allocation & Regularization (PAR), which adaptively selects an appropriate strategy for each task, choosing between parameter allocation and regularization based on its learning difficulty.
Our method is scalable and significantly reduces the model's redundancy while improving the model's performance.
arXiv Detail & Related papers (2023-04-11T15:38:21Z)
- Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models [20.173322408302134]
Compositional fine-tuning (CFT) is an approach based on explicitly decomposing a target task into component tasks.
We show that CFT outperforms end-to-end learning even with equal amounts of data.
arXiv Detail & Related papers (2022-10-23T03:22:34Z)
- Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
arXiv Detail & Related papers (2022-02-02T08:23:24Z)
- Variational Multi-Task Learning with Gumbel-Softmax Priors [105.22406384964144]
Multi-task learning aims to explore task relatedness to improve individual tasks.
We propose variational multi-task learning (VMTL), a general probabilistic inference framework for learning multiple related tasks.
arXiv Detail & Related papers (2021-11-09T18:49:45Z)
- Zero-Shot Information Extraction as a Unified Text-to-Triple Translation [56.01830747416606]
We cast a suite of information extraction tasks into a text-to-triple translation framework.
We formalize the task as a translation between task-specific input text and output triples.
We study the zero-shot performance of this framework on open information extraction.
arXiv Detail & Related papers (2021-09-23T06:54:19Z)
- Pretext Tasks selection for multitask self-supervised speech representation learning [23.39079406674442]
This paper introduces a method to select a group of pretext tasks among a set of candidates.
Experiments conducted on speaker recognition and automatic speech recognition validate our approach.
arXiv Detail & Related papers (2021-07-01T16:36:29Z)
- MC-BERT: Efficient Language Pre-Training via a Meta Controller [96.68140474547602]
Large-scale pre-training is computationally expensive.
ELECTRA, an early attempt to accelerate pre-training, trains a discriminative model that predicts whether each input token was replaced by a generator.
We propose a novel meta-learning framework, MC-BERT, to achieve better efficiency and effectiveness.
arXiv Detail & Related papers (2020-06-10T09:22:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.