Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning
- URL: http://arxiv.org/abs/2401.03563v1
- Date: Sun, 7 Jan 2024 18:12:20 GMT
- Title: Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning
- Authors: Yingqian Min, Kun Zhou, Dawei Gao, Wayne Xin Zhao, He Hu, and Yaliang Li
- Abstract summary: We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training.
At the task level, we aim to find the task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
- Score: 85.66907881270785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, multi-task instruction tuning has been applied to sentence
representation learning, endowing models with the capability of generating
specific representations under the guidance of task instructions and exhibiting
strong generalization to new tasks. However, these methods mostly neglect the
potential interference across different tasks and instances, which may affect
the training and convergence of the model. To address this, we propose a data
curriculum method, namely Data-CUBE, that arranges the order of all the
multi-task training data so as to minimize the interference risks from two
views. At the task level, we aim to find the task order that minimizes the
total cross-task interference risk; this is exactly the traveling salesman
problem, so we use a simulated annealing algorithm to find its solution. At the
instance level, we measure the difficulty of all instances per task, then
divide them into easy-to-difficult mini-batches for training. Experiments on
MTEB sentence representation evaluation tasks show that our approach can boost
the performance of state-of-the-art methods. Our code and data are publicly
available at the link: \url{https://github.com/RUCAIBox/Data-CUBE}.
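The task-level step admits a compact illustration. Below is a minimal, hypothetical sketch (not the authors' released code) of ordering tasks with simulated annealing, assuming a precomputed pairwise matrix `interference[i][j]` of cross-task interference risks; `order_tasks` and all parameter names are illustrative.

```python
import math
import random

def order_tasks(interference, n_steps=10000, t0=1.0, cooling=0.999, seed=0):
    """Approximate the minimum-interference task order (a traveling
    salesman problem over tasks) with simulated annealing.
    `interference[i][j]` is an assumed, precomputed cross-task
    interference risk between tasks i and j."""
    rng = random.Random(seed)
    n = len(interference)
    order = list(range(n))

    def cost(path):
        # Total interference accumulated over adjacent tasks in the order.
        return sum(interference[a][b] for a, b in zip(path, path[1:]))

    cur = cost(order)
    best, best_cost, t = list(order), cur, t0
    for _ in range(n_steps):
        i, j = sorted(rng.sample(range(n), 2))
        cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]  # 2-opt move
        c = cost(cand)
        # Always accept improvements; accept worse orders with prob e^(-delta/T).
        if c < cur or rng.random() < math.exp((cur - c) / max(t, 1e-9)):
            order, cur = cand, c
            if c < best_cost:
                best, best_cost = list(order), c
        t *= cooling  # geometric cooling schedule
    return best, best_cost

# Example with a hypothetical symmetric interference matrix for four tasks.
M = [[0, 3, 9, 5],
     [3, 0, 4, 8],
     [9, 4, 0, 2],
     [5, 8, 2, 0]]
print(order_tasks(M))  # expected: an order with total risk 9, e.g. [0, 1, 2, 3]
```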
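The instance-level step can be sketched similarly. The abstract does not specify the difficulty measure, so the hypothetical `difficulty_fn` below stands in for whatever per-instance score is used; the sketch shows only the sort-then-chunk arrangement into easy-to-difficult mini-batches, plus one plausible way of combining it with the task order above.

```python
def curriculum_batches(instances, difficulty_fn, batch_size):
    """Arrange one task's instances into easy-to-difficult mini-batches.
    `difficulty_fn` is an assumed stand-in for the paper's per-instance
    difficulty measure, which is not reproduced here."""
    ranked = sorted(instances, key=difficulty_fn)  # easiest first
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]

def build_schedule(task_data, task_order, difficulty_fn, batch_size=32):
    """Visit tasks in the annealed order; within each task, consume
    mini-batches from easy to difficult (one plausible combination
    of the two curriculum levels)."""
    schedule = []
    for task in task_order:
        schedule.extend(curriculum_batches(task_data[task], difficulty_fn, batch_size))
    return schedule
```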
Related papers
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can succeed on classification tasks with little or even non-overlapping annotation.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a large number of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- TaskMix: Data Augmentation for Meta-Learning of Spoken Intent Understanding [0.0]
We show that a state-of-the-art data augmentation method worsens overfitting when task diversity is low.
We propose a simple method, TaskMix, which synthesizes new tasks by linearly interpolating existing tasks.
We show that TaskMix outperforms baselines, alleviates overfitting when task diversity is low, and does not degrade performance even when task diversity is high.
arXiv Detail & Related papers (2022-09-26T00:37:40Z)
- Enhancing Continual Learning with Global Prototypes: Counteracting Negative Representation Drift [16.177180198865848]
Continual learning aims to learn a sequence of tasks over time, with data distributions shifting from one task to another.
Negative representation drift can result in catastrophic forgetting by causing the locally learned class prototypes and data representations to correlate poorly across tasks.
We propose a method that finds global prototypes to guide the learning and learns data representations regularized by self-supervised information.
arXiv Detail & Related papers (2022-05-24T16:41:30Z)
- Learning Multiple Dense Prediction Tasks from Partially Annotated Data [41.821234589075445]
We study joint learning of multiple dense prediction tasks from partially annotated data, which we call multi-task partially-supervised learning.
We propose a multi-task training procedure that successfully leverages task relations to supervise its multi-task learning when data is partially annotated.
We rigorously demonstrate that our proposed method effectively exploits the images with unlabelled tasks and outperforms existing semi-supervised learning approaches and related methods on three standard benchmarks.
arXiv Detail & Related papers (2021-11-29T19:03:12Z)
- Variational Multi-Task Learning with Gumbel-Softmax Priors [105.22406384964144]
Multi-task learning aims to explore task relatedness to improve individual tasks.
We propose variational multi-task learning (VMTL), a general probabilistic inference framework for learning multiple related tasks.
arXiv Detail & Related papers (2021-11-09T18:49:45Z)
- Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation to be shared across multiple tasks.
In this work we overcome the limitations of a single shared representation by inferring a conditioning function, mapping the tasks' side information into a representation tailored to the task at hand.
We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z)