Curricular Transfer Learning for Sentence Encoded Tasks
- URL: http://arxiv.org/abs/2308.01849v1
- Date: Thu, 3 Aug 2023 16:18:19 GMT
- Title: Curricular Transfer Learning for Sentence Encoded Tasks
- Authors: Jader Martins Camboim de Sá, Matheus Ferraroni Sanches, Rafael Roque de Souza, Júlio Cesar dos Reis, Leandro Aparecido Villas
- Abstract summary: This article proposes a sequence of pre-training steps guided by "data hacking" and grammar analysis.
In our experiments, our method achieves a considerable improvement over other known pre-training approaches on the MultiWoZ task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning language models on a downstream task is the standard approach for
many state-of-the-art methodologies in the field of NLP. However, when the
distribution drifts between the source task and the target task, e.g., in
conversational environments, these gains tend to diminish. This article
proposes a sequence of pre-training steps (a curriculum) guided by "data
hacking" and grammar analysis that allows further gradual adaptation between
pre-training distributions. In our experiments, our method achieves a considerable
improvement over other known pre-training approaches on the MultiWoZ task.
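The curriculum described above (gradually shifting the pre-training distribution toward the conversational target before fine-tuning) can be illustrated with a short sketch. The stage ordering, toy corpora, model choice (bert-base-uncased), and all hyperparameters below are illustrative assumptions, not the authors' exact recipe.

```python
# Minimal sketch of a pre-training curriculum: continued masked-language-model (MLM)
# training on progressively more conversational corpora before downstream fine-tuning.
# Corpora, model, and hyperparameters are illustrative placeholders.
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoModelForSequenceClassification,
                          AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def mlm_stage(model, texts, output_dir):
    """One curriculum stage: continued MLM training on one corpus."""
    ds = Dataset.from_dict({"text": texts}).map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True, remove_columns=["text"])
    Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                               per_device_train_batch_size=8, report_to=[]),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
    ).train()
    return model

# Toy stand-ins for the curriculum corpora.
general_texts = ["The museum opens at nine in the morning.",
                 "The committee approved the new budget yesterday."]
dialogue_texts = ["user: i need a cheap hotel in the centre. system: any preference on stars?",
                  "user: book a table for two at 19:00. system: done, here is your reference."]

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model = mlm_stage(model, general_texts, "stage1_general")    # stage 1: general written text
model = mlm_stage(model, dialogue_texts, "stage2_dialogue")  # stage 2: dialogue-style text
model.save_pretrained("stage2_dialogue/final")

# Stage 3: fine-tune the curriculum-adapted encoder on the downstream task
# (e.g., a MultiWoZ-derived classification task); num_labels=3 is a placeholder.
clf = AutoModelForSequenceClassification.from_pretrained("stage2_dialogue/final", num_labels=3)
```

Each stage starts from the weights produced by the previous one, so the encoder drifts toward the conversational distribution in small steps rather than in a single jump from general pre-training to task fine-tuning.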
Related papers
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach [0.0]
Data augmentation (DA) techniques may be used for generating additional training samples when the available parallel data are scarce.
We present a multi-task DA approach in which we generate new sentence pairs with transformations.
We show consistent improvements over the baseline and over DA methods aiming at extending the support of the empirical data distribution.
arXiv Detail & Related papers (2021-09-08T13:39:30Z)
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of work: the zero-shot approach and the translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of introducing transfer learning techniques for NLP by a unified framework that converts all text-based language problems into a text-to-text format.
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
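For context on the last entry above, the unified text-to-text framing can be shown in a few lines: every task is cast as feeding the model input text with a task prefix and training it to emit the answer as text. The translation and summarization prefixes follow the T5 paper; the sentiment prefix and the example records are illustrative placeholders.

```python
# Tiny illustration of the text-to-text framing: every task becomes a plain
# "input text -> output text" pair, distinguished only by a task prefix.
# The translation/summarization prefixes follow the T5 paper; the sentiment
# prefix and the example records below are illustrative placeholders.
def to_text_to_text(task, example):
    """Cast one labeled example into a (source_text, target_text) pair."""
    if task == "translate_en_de":
        return "translate English to German: " + example["en"], example["de"]
    if task == "summarize":
        return "summarize: " + example["document"], example["summary"]
    if task == "sentiment":  # classification: the label itself is emitted as text
        return "sentiment: " + example["sentence"], example["label"]
    raise ValueError(f"unknown task: {task}")

print(to_text_to_text("translate_en_de",
                      {"en": "The house is wonderful.", "de": "Das Haus ist wunderbar."}))
print(to_text_to_text("sentiment",
                      {"sentence": "A charming, witty film.", "label": "positive"}))
```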