Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task
Generalization
- URL: http://arxiv.org/abs/2210.09175v1
- Date: Mon, 17 Oct 2022 15:25:24 GMT
- Title: Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task
Generalization
- Authors: Yuxian Gu, Pei Ke, Xiaoyan Zhu, Minlie Huang
- Abstract summary: We propose Unlabeled Data Augmented Instruction Tuning (UDIT) to take better advantage of the instructions during instruction learning.
We conduct extensive experiments to show UDIT's effectiveness across a wide range of tasks and datasets.
- Score: 68.91386402390403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training language models to learn from human instructions for zero-shot
cross-task generalization has attracted much attention in NLP communities.
Recently, instruction tuning (IT), which fine-tunes a pre-trained language
model on a massive collection of tasks described via human-crafted
instructions, has been shown to be effective for instruction learning on unseen
tasks. However, IT relies on a large number of human-annotated samples, which
restricts its generalization. Unlike labeled data, unlabeled data are often
massive and cheap
to obtain. In this work, we study how IT can be improved with unlabeled data.
We first empirically explore how IT performance scales with the amount of
labeled data and the number of instructions and training tasks. We find it
critical to enlarge the number of training instructions, and that the
instructions can be underutilized due to the scarcity of labeled data. Then, we
propose Unlabeled Data Augmented
Instruction Tuning (UDIT) to take better advantage of the instructions during
IT by constructing pseudo-labeled data from unlabeled plain texts. We conduct
extensive experiments to show UDIT's effectiveness across a wide range of
tasks and datasets. We also comprehensively analyze the key factors of UDIT to
investigate how to better improve IT with unlabeled data. The code is publicly
available at https://github.com/thu-coai/UDIT.
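The key step in UDIT is turning unlabeled plain text into pseudo-labeled instruction data that can be mixed into instruction tuning. The snippet below is only a minimal sketch of that idea, assuming a simple span-blanking rule and a hypothetical instruction template; it illustrates the shape of the constructed data, not the paper's actual construction procedure (see the linked repository for that).

```python
import random

# Hypothetical instruction template; UDIT's actual task templates differ.
TEMPLATE = (
    "Fill in the blank in the following passage.\n"
    "Passage: {masked_passage}\n"
    "Answer:"
)

def make_pseudo_example(plain_text: str, span_len: int = 5, seed: int = 0) -> dict:
    """Turn one unlabeled passage into an (instruction input, pseudo-label) pair
    by blanking out a contiguous span and using the removed span as the target."""
    rng = random.Random(seed)
    tokens = plain_text.split()
    if len(tokens) <= span_len + 2:
        return {}  # passage too short to mask meaningfully
    start = rng.randrange(1, len(tokens) - span_len)
    target = " ".join(tokens[start:start + span_len])
    masked = " ".join(tokens[:start] + ["____"] + tokens[start + span_len:])
    return {"input": TEMPLATE.format(masked_passage=masked), "target": target}

# Pseudo-labeled examples like this one would simply be mixed with the labeled
# instruction-tuning data before fine-tuning the language model.
example = make_pseudo_example(
    "Instruction tuning fine-tunes a pre-trained language model on a large "
    "collection of tasks described via human-crafted instructions."
)
print(example["input"])
print(example["target"])
```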
Related papers
- Less is More: High-value Data Selection for Visual Instruction Tuning [127.38740043393527]
We propose a high-value data selection approach, TIVE, to eliminate redundancy within the visual instruction data and reduce the training cost.
Using only about 15% of the data, our approach achieves average performance comparable to that of the full-data fine-tuned model across eight benchmarks.
arXiv Detail & Related papers (2024-03-14T16:47:25Z)
- A Regularization-based Transfer Learning Method for Information Extraction via Instructed Graph Decoder [29.242560023747252]
We propose TIE, a regularization-based transfer learning method for information extraction (IE) via an instructed graph decoder.
Specifically, we first construct an instruction pool for datasets from all well-known IE tasks, and then present an instructed graph decoder.
In this way, the common knowledge shared across existing datasets can be learned and transferred to a new dataset with new labels.
arXiv Detail & Related papers (2024-03-01T13:04:12Z)
- Language Semantic Graph Guided Data-Efficient Learning [10.039953846594805]
We introduce a Language Semantic Graph (LSG), which is constructed from labels expressed as natural language descriptions.
An auxiliary graph neural network is trained to extract high-level semantic relations and then used to guide the training of the primary model.
Our in-depth analysis shows that the LSG method also expedites the training process.
arXiv Detail & Related papers (2023-11-15T08:54:57Z)
- Using Self-Supervised Pretext Tasks for Active Learning [7.214674613451605]
We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative.
The pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and grouped into batches by their pretext task losses.
In each iteration, the main task model is used to sample the most uncertain data in a batch to be annotated (a minimal sketch of this loop appears after this list).
arXiv Detail & Related papers (2022-01-19T07:58:06Z)
- Training Dynamic based data filtering may not work for NLP datasets [0.0]
We study the applicability of the Area Under the Margin (AUM) metric to identify mislabelled examples in NLP datasets.
We find that the AUM metric can filter mislabelled samples in NLP datasets, but doing so also removes a significant number of correctly labeled points.
arXiv Detail & Related papers (2021-09-19T18:50:45Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- Adaptive Self-training for Few-shot Neural Sequence Labeling [55.43109437200101]
We develop techniques to address the label scarcity challenge for neural sequence labeling models.
Self-training serves as an effective mechanism to learn from large amounts of unlabeled data, while meta-learning helps with adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
arXiv Detail & Related papers (2020-10-07T22:29:05Z)
- Self-training Improves Pre-training for Natural Language Understanding [63.78927366363178]
We study self-training as another way to leverage unlabeled data through semi-supervised learning.
We introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data.
Our approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks.
arXiv Detail & Related papers (2020-10-05T17:52:25Z)
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations [4.36561468436181]
We present DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations.
Our approach closes the performance gap between unsupervised and supervised pretraining for universal sentence encoders.
Our code and pretrained models are publicly available and can be easily adapted to new domains or used to embed unseen text.
arXiv Detail & Related papers (2020-06-05T20:00:28Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To make training on the enlarged dataset practical, we further apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
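To make the selection procedure in "Using Self-Supervised Pretext Tasks for Active Learning" (above) concrete, here is a minimal sketch of the loop it describes: unlabeled examples are ranked by their pretext-task loss, grouped into batches, and within each batch the main task model's least confident examples are chosen for annotation. The function names, scoring callables, and batch sizes below are placeholders, not the paper's exact configuration.

```python
from typing import Callable, List, Sequence

def select_for_annotation(
    unlabeled: Sequence[str],
    pretext_loss: Callable[[str], float],     # loss from the self-supervised pretext learner
    main_confidence: Callable[[str], float],  # e.g. max softmax probability of the main model
    batch_size: int = 32,
    per_batch: int = 4,
) -> List[str]:
    """Rank unlabeled data by pretext-task loss, split into batches, and from
    each batch pick the examples the main task model is least confident about."""
    ranked = sorted(unlabeled, key=pretext_loss, reverse=True)  # hardest examples first
    batches = [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]
    to_annotate: List[str] = []
    for batch in batches:
        # lowest main-task confidence = most uncertain = sent to human annotators
        to_annotate.extend(sorted(batch, key=main_confidence)[:per_batch])
    return to_annotate
```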