Multi-Stage Pre-training for Low-Resource Domain Adaptation
- URL: http://arxiv.org/abs/2010.05904v1
- Date: Mon, 12 Oct 2020 17:57:00 GMT
- Title: Multi-Stage Pre-training for Low-Resource Domain Adaptation
- Authors: Rong Zhang, Revanth Gangi Reddy, Md Arafat Sultan, Vittorio Castelli,
Anthony Ferritto, Radu Florian, Efsun Sarioglu Kayi, Salim Roukos, Avirup
Sil, Todd Ward
- Abstract summary: Current approaches directly adapt a pre-trained language model (LM) on in-domain text before fine-tuning to downstream tasks.
We show that extending the vocabulary of the LM with domain-specific terms leads to further gains.
We apply these approaches incrementally on a pre-trained RoBERTa-large LM and show considerable performance gain on three tasks in the IT domain.
- Score: 24.689862495171408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning techniques are particularly useful in NLP tasks where a
sizable amount of high-quality annotated data is difficult to obtain. Current
approaches directly adapt a pre-trained language model (LM) on in-domain text
before fine-tuning to downstream tasks. We show that extending the vocabulary
of the LM with domain-specific terms leads to further gains. To a bigger
effect, we utilize structure in the unlabeled data to create auxiliary
synthetic tasks, which helps the LM transfer to downstream tasks. We apply
these approaches incrementally on a pre-trained RoBERTa-large LM and show
considerable performance gain on three tasks in the IT domain: Extractive
Reading Comprehension, Document Ranking and Duplicate Question Detection.
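The vocabulary-extension idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `select_domain_terms`, the frequency threshold, and the toy corpus are all assumptions made for illustration. It only picks candidate in-domain terms that are frequent in unlabeled text and missing from a base vocabulary.

```python
import re
from collections import Counter


def select_domain_terms(corpus, base_vocab, min_count=2, max_terms=10):
    """Return frequent in-domain words that are absent from base_vocab.

    Hypothetical helper: the selection rule (raw frequency with a cutoff)
    is an assumption, not the method described in the paper.
    """
    counts = Counter()
    for doc in corpus:
        # Lowercase and split into simple alphanumeric word tokens.
        counts.update(re.findall(r"[a-z0-9]+", doc.lower()))

    base = {word.lower() for word in base_vocab}
    candidates = [
        (term, count)
        for term, count in counts.items()
        if count >= min_count and term not in base
    ]
    # Most frequent first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in candidates[:max_terms]]


# Toy IT-domain corpus and base vocabulary (both invented for this example).
corpus = [
    "Restart the hypervisor before updating firmware.",
    "The hypervisor logs show a firmware mismatch.",
    "Check the logs.",
]
base_vocab = {"the", "restart", "before", "updating", "logs",
              "show", "a", "check", "mismatch"}

new_terms = select_domain_terms(corpus, base_vocab)
print(new_terms)
```

In a real setup, the selected terms would still need embedding rows in the model, for example via `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))` in Hugging Face Transformers, and those new embeddings would then be trained during in-domain pre-training.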
Related papers
- Layer by Layer: Uncovering Where Multi-Task Learning Happens in Instruction-Tuned Large Language Models [22.676688441884465]
Fine-tuning pre-trained large language models (LLMs) on a diverse array of tasks has become a common approach for building models.
This study investigates the task-specific information encoded in pre-trained LLMs and the effects of instruction tuning on their representations.
arXiv Detail & Related papers (2024-10-25T23:38:28Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Scalable and Domain-General Abstractive Proposition Segmentation [20.532804009152255]
We focus on the task of abstractive proposition segmentation (APS): transforming text into simple, self-contained, well-formed sentences.
We first introduce evaluation metrics for the task to measure several dimensions of quality.
We then propose a scalable, yet accurate, proposition segmentation model.
arXiv Detail & Related papers (2024-06-28T10:24:31Z)
- Fine-tuning Large Language Models for Domain-specific Machine Translation [8.439661191792897]
Large language models (LLMs) have made significant progress in machine translation (MT).
However, their potential in domain-specific MT remains under-explored.
This paper proposes a prompt-oriented fine-tuning method, denoted as LlamaIT, to effectively and efficiently fine-tune a general-purpose LLM for domain-specific MT tasks.
arXiv Detail & Related papers (2024-02-23T02:24:15Z)
- Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance [60.40541387785977]
Small foundational models can display remarkable proficiency in tackling diverse tasks when fine-tuned using instruction-driven data.
In this work, we investigate a practical problem setting where the primary focus is on one or a few particular tasks rather than general-purpose instruction following.
Experimental results show that fine-tuning LLaMA on writing instruction data significantly improves its ability on writing tasks.
arXiv Detail & Related papers (2023-05-22T16:56:44Z)
- Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM [31.25193238045053]
We introduce a novel method, namely GenCo, which leverages the strong generative power of large language models to assist in training a smaller language model.
In our method, an LLM plays an important role in the self-training loop of a smaller model in two important ways.
It helps craft additional high-quality training pairs by rewriting input texts conditioned on predicted labels.
arXiv Detail & Related papers (2023-04-24T07:35:38Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
arXiv Detail & Related papers (2022-12-15T13:57:07Z)
- Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions [58.69255121795761]
We propose a novel Active Transfer Few-shot Instructions (ATF) approach which requires no fine-tuning.
ATF leverages the internal linguistic knowledge of pre-trained language models (PLMs) to facilitate the transfer of information.
We show that annotation of just a few target-domain samples via active learning can be beneficial for transfer, but the impact diminishes with more annotation effort.
arXiv Detail & Related papers (2022-11-21T19:03:31Z)
- Hierarchical Multitask Learning Approach for BERT [0.36525095710982913]
BERT learns embeddings by solving two tasks: masked language modeling (masked LM) and next sentence prediction (NSP).
We adopt hierarchical multitask learning approaches for BERT pre-training.
Our results show that imposing a task hierarchy in pre-training improves the performance of embeddings.
arXiv Detail & Related papers (2020-10-17T09:23:04Z)
- Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
We study the transferability between 33 NLP tasks across three broad classes of problems.
Our results show that transfer learning is more beneficial than previously thought.
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task.
arXiv Detail & Related papers (2020-05-02T09:39:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.