Does QA-based intermediate training help fine-tuning language models for
text classification?
- URL: http://arxiv.org/abs/2112.15051v1
- Date: Thu, 30 Dec 2021 13:30:25 GMT
- Title: Does QA-based intermediate training help fine-tuning language models for
text classification?
- Authors: Shiwei Zhang and Xiuzhen Zhang
- Abstract summary: It is found that intermediate training based on high-level inference tasks such as Question Answering (QA) can improve the performance of some language models for target tasks.
In this paper, we experimented on eight tasks for single-sequence classification and eight tasks for sequence-pair classification using two base and two compact language models.
Our experiments show that QA-based intermediate training generates varying transfer performance across different language models, except for similar QA tasks.
- Score: 12.023861154677203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning pre-trained language models for downstream tasks has become a
norm for NLP. Recently it has been found that intermediate training based on
high-level inference tasks such as Question Answering (QA) can improve the
performance of some language models for target tasks. However, it is not clear
whether intermediate training generally benefits various language models. In this
paper, using the SQuAD-2.0 QA task for intermediate training for target text
classification tasks, we experimented on eight tasks for single-sequence
classification and eight tasks for sequence-pair classification using two base
and two compact language models. Our experiments show that QA-based
intermediate training generates varying transfer performance across different
language models, except for similar QA tasks.
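
As a concrete illustration of the two-stage recipe the abstract describes, the hedged sketch below fine-tunes a backbone with an extractive QA head on SQuAD-2.0 and then reuses the resulting encoder for a target classification task. It assumes the Hugging Face Transformers library; the model name, checkpoint paths, and elided training loops are placeholders, not the paper's exact setup.

```python
# Minimal sketch of QA-based intermediate training, assuming Hugging Face
# Transformers; model name, paths, and hyperparameters are illustrative only.
from transformers import (AutoModelForQuestionAnswering,
                          AutoModelForSequenceClassification, AutoTokenizer)

backbone = "bert-base-uncased"  # any of the base/compact models could be swapped in
tokenizer = AutoTokenizer.from_pretrained(backbone)

# Stage 1: intermediate training -- fine-tune an extractive QA head on SQuAD-2.0.
qa_model = AutoModelForQuestionAnswering.from_pretrained(backbone)
# ... run a standard SQuAD-2.0 fine-tuning loop on qa_model here ...
qa_model.save_pretrained("checkpoints/qa-intermediate")
tokenizer.save_pretrained("checkpoints/qa-intermediate")

# Stage 2: target-task fine-tuning -- drop the QA head, keep the encoder, and
# train a fresh classification head on the single-sequence or sequence-pair task.
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "checkpoints/qa-intermediate", num_labels=2)
# ... run the usual classification fine-tuning loop on clf_model here ...
```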
Related papers
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text
Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
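
For reference, the classic translate-and-test pipeline that T3L revisits can be sketched as below: translate the input into English, then classify with an English-only model. The checkpoints are arbitrary public models chosen for illustration, not the ones used in the paper.

```python
# Hedged sketch of a plain "translate-and-test" pipeline; checkpoints are
# illustrative placeholders, not those used in T3L.
from transformers import pipeline

translate = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")
classify = pipeline("text-classification",
                    model="distilbert-base-uncased-finetuned-sst-2-english")

def translate_and_test(text):
    english = translate(text)[0]["translation_text"]
    return classify(english)[0]

print(translate_and_test("Der Film war überraschend gut."))
```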
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
- Pre-Training to Learn in Context [138.0745138788142]
The in-context learning ability of language models is not fully exploited because they are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x the parameters.
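
A rough, hedged sketch of the kind of pretraining instance this framework is described as building: paragraphs that share an intrinsic task are concatenated so the leading ones act as in-context demonstrations. Retrieval of same-task paragraphs and the exact training objective are assumptions left out here.

```python
# Hedged sketch of constructing a PICL-style pretraining instance: concatenate
# paragraphs sharing an intrinsic task so earlier ones serve as demonstrations.
# Retrieval of same-task paragraphs is assumed to happen elsewhere.
def build_instance(demo_paragraphs, target_paragraph, sep="\n\n"):
    """A causal LM objective would then be applied over the concatenation."""
    return sep.join(list(demo_paragraphs) + [target_paragraph])

demos = [
    "Review: great acting and a tight script. Sentiment: positive.",
    "Review: dull pacing and flat characters. Sentiment: negative.",
]
print(build_instance(demos, "Review: stunning visuals. Sentiment: positive."))
```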
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- Generative Language Models for Paragraph-Level Question Generation [79.31199020420827]
Powerful generative models have led to recent progress in question generation (QG).
It is difficult to measure advances in QG research since there are no standardized resources that allow a uniform comparison among approaches.
We introduce QG-Bench, a benchmark for QG that unifies existing question answering datasets by converting them to a standard QG setting.
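
To make the QA-to-QG conversion concrete, the sketch below flips a SQuAD-style record into a question-generation pair, with the answer span highlighted in the context. The <hl> markers and prompt prefix are assumptions for illustration and may not match QG-Bench's exact serialization.

```python
# Illustrative conversion of a SQuAD-style QA record into a QG training pair:
# input is the context with the answer span highlighted, output is the question.
# The <hl> markers and prefix are assumptions, not necessarily QG-Bench's format.
def qa_to_qg(record):
    context, question = record["context"], record["question"]
    answer = record["answers"]["text"][0]
    start = record["answers"]["answer_start"][0]
    end = start + len(answer)
    marked = context[:start] + "<hl> " + answer + " <hl>" + context[end:]
    return {"input": "generate question: " + marked, "output": question}

example = {
    "context": "SQuAD-2.0 extends SQuAD with unanswerable questions.",
    "question": "What does SQuAD-2.0 add to SQuAD?",
    "answers": {"text": ["unanswerable questions"], "answer_start": [29]},
}
print(qa_to_qg(example))
```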
arXiv Detail & Related papers (2022-10-08T10:24:39Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
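
The gradient-as-task-representation idea can be illustrated with a small hedged sketch: take the gradient of the loss on a task's support examples and use it as a task embedding. The tiny checkpoint, the restriction to classifier-head gradients, and anything downstream are assumptions, not the paper's actual conditioning mechanism.

```python
# Hedged sketch of representing a few-shot task by gradients from a base model,
# as the Grad2Task summary describes. Checkpoint choice and the use of only the
# classifier-head gradients are assumptions made for brevity.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "prajjwal1/bert-tiny"  # tiny public checkpoint so the sketch runs fast
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

def task_representation(texts, labels):
    """Gradient of the support-set loss w.r.t. the classifier head, flattened."""
    model.zero_grad()
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    loss = model(**batch, labels=torch.tensor(labels)).loss
    loss.backward()
    return torch.cat([p.grad.flatten() for p in model.classifier.parameters()])

rep = task_representation(["great movie", "terrible plot"], [1, 0])
print(rep.shape)  # one vector summarizing the task's support set
```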
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- A Hierarchical Model for Spoken Language Recognition [29.948719321162883]
Spoken language recognition (SLR) refers to the automatic process used to determine the language present in a speech sample.
We propose a novel hierarchical approach where two PLDA models are trained, one to generate scores for clusters of highly related languages and a second one to generate scores conditional to each cluster.
We show that this hierarchical approach consistently outperforms the non-hierarchical one for detection of highly related languages.
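
A toy, hedged sketch of the two-level scoring this summary describes: one score for a cluster of related languages and one for the language within its cluster. How the two PLDA scores are actually combined is not stated in the summary; simple addition of log-domain scores is assumed below.

```python
# Toy sketch of hierarchical language scoring: combine a cluster-level score
# with a within-cluster score per language. Additive combination is an
# assumption; the paper's PLDA back-ends are not reproduced here.
def hierarchical_scores(cluster_of, cluster_score, within_score):
    return {lang: cluster_score[c] + within_score[lang]
            for lang, c in cluster_of.items()}

cluster_of = {"es": "iberian", "pt": "iberian", "ru": "slavic", "uk": "slavic"}
scores = hierarchical_scores(
    cluster_of,
    cluster_score={"iberian": 1.2, "slavic": -0.4},
    within_score={"es": 0.7, "pt": -0.2, "ru": 0.5, "uk": 0.1},
)
print(max(scores, key=scores.get))  # most likely language for the sample
```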
arXiv Detail & Related papers (2022-01-04T22:10:36Z)
- CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain [22.846469609263416]
We introduce the pre-trained CLIN-X (Clinical XLM-R) language models and show how CLIN-X outperforms other pre-trained transformer models.
Our studies reveal stable model performance despite a lack of annotated data, with improvements of up to 47 F1 points when only 250 labeled sentences are available.
Our results highlight the importance of specialized language models such as CLIN-X for concept extraction in non-standard domains.
arXiv Detail & Related papers (2021-12-16T10:07:39Z)
- Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching [44.034300203700234]
Code-switching is a ubiquitous phenomenon due to the ease of communication it offers in multilingual communities.
We propose a self-training method to repurpose the existing pretrained models using a switch-point bias.
Our approach performs well on both tasks by reducing the performance gap at switch points.
arXiv Detail & Related papers (2021-11-01T19:42:08Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
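
As a hedged illustration of this pre-training signal, the sketch below shuffles the sentences of a passage and records the permutation needed to restore the original order; SLM's actual prediction mechanism and architecture are not reproduced here.

```python
# Hedged sketch of a sentence-unshuffling training instance: shuffle a passage's
# sentences and keep, as the target, the positions that restore the original order.
import random

def make_unshuffling_example(sentences, seed=0):
    order = list(range(len(sentences)))
    random.Random(seed).shuffle(order)
    shuffled = [sentences[i] for i in order]
    target = [order.index(i) for i in range(len(sentences))]  # restores original order
    return shuffled, target

passage = ["We collect passages.", "We shuffle their sentences.",
           "The model learns to put them back."]
print(make_unshuffling_example(passage))
```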
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
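
The core intuition behind the Dynamic Blocking decoding step can be sketched roughly as below: once the decoder has emitted a token that appears in the source, the token that immediately follows it in the source is blocked, pushing the output away from verbatim copying. The probabilistic blocking, subword handling, and beam-search integration of the actual algorithm are omitted.

```python
# Rough, hedged sketch of the idea behind Dynamic Blocking: after generating a
# token that occurs in the source, block the source token that immediately
# follows it, nudging the decoder toward a paraphrase rather than a copy.
def blocked_next_tokens(source_tokens, last_generated):
    return {source_tokens[i + 1]
            for i, tok in enumerate(source_tokens[:-1])
            if tok == last_generated}

source = "the quick brown fox jumps over the lazy dog".split()
print(blocked_next_tokens(source, "the"))  # {'quick', 'lazy'}
```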
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too [42.95481834479966]
Intermediate-task training improves model performance substantially on language understanding tasks in monolingual English settings.
We evaluate intermediate-task transfer in a zero-shot cross-lingual setting on the XTREME benchmark.
Using our best intermediate-task models for each target task, we obtain a 5.4 point improvement over XLM-R Large.
arXiv Detail & Related papers (2020-05-26T20:17:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.