ConvFiT: Conversational Fine-Tuning of Pretrained Language Models
- URL: http://arxiv.org/abs/2109.10126v1
- Date: Tue, 21 Sep 2021 12:16:56 GMT
- Title: ConvFiT: Conversational Fine-Tuning of Pretrained Language Models
- Authors: Ivan Vulić, Pei-Hao Su, Sam Coope, Daniela Gerz, Paweł
Budzianowski, Iñigo Casanueva, Nikola Mrkšić, Tsung-Hsien Wen
- Abstract summary: Transformer-based language models (LMs) pretrained on large text collections have been shown to store a wealth of semantic knowledge.
We propose ConvFiT, a simple and efficient two-stage procedure which turns any pretrained LM into a universal conversational encoder.
- Score: 42.7160113690317
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Transformer-based language models (LMs) pretrained on large text collections
have been shown to store a wealth of semantic knowledge. However, 1) they are not
effective as sentence encoders when used off-the-shelf, and 2) thus typically
lag behind conversationally pretrained (e.g., via response selection) encoders
on conversational tasks such as intent detection (ID). In this work, we propose
ConvFiT, a simple and efficient two-stage procedure which turns any pretrained
LM into a universal conversational encoder (after Stage 1 ConvFiT-ing) and
task-specialised sentence encoder (after Stage 2). We demonstrate that 1)
full-blown conversational pretraining is not required, and that LMs can be
quickly transformed into effective conversational encoders with much smaller
amounts of unannotated data; 2) pretrained LMs can be fine-tuned into
task-specialised sentence encoders, optimised for the fine-grained semantics of
a particular task. Consequently, such specialised sentence encoders allow for
treating ID as a simple semantic similarity task based on interpretable nearest
neighbours retrieval. We validate the robustness and versatility of the ConvFiT
framework with such similarity-based inference on the standard ID evaluation
sets: ConvFiT-ed LMs achieve state-of-the-art ID performance across the board,
with particular gains in the most challenging, few-shot setups.
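To make the similarity-based inference concrete, here is a minimal sketch of intent detection as nearest-neighbour retrieval over sentence embeddings, as described in the abstract. The encoder checkpoint and the toy labelled utterances are illustrative placeholders, not the paper's ConvFiT-ed model or evaluation data.

```python
# Minimal sketch: intent detection as nearest-neighbour retrieval over sentence
# embeddings. The checkpoint and examples below are placeholders; ConvFiT would
# supply a conversationally fine-tuned, task-specialised encoder instead.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder sentence encoder

# A handful of labelled utterances, e.g. a few-shot ID setup.
examples = [
    ("I want to book a table for two", "book_restaurant"),
    ("Cancel my reservation please", "cancel_booking"),
    ("What time do you close today?", "opening_hours"),
]
texts, labels = zip(*examples)
example_vecs = encoder.encode(list(texts), normalize_embeddings=True)

def predict_intent(utterance: str) -> str:
    """Return the intent label of the nearest labelled example (cosine similarity)."""
    query_vec = encoder.encode([utterance], normalize_embeddings=True)[0]
    sims = example_vecs @ query_vec  # cosine similarity: vectors are L2-normalised
    return labels[int(np.argmax(sims))]

print(predict_intent("Could you reserve a table for tonight?"))  # expected: book_restaurant
```

Because each prediction is simply the label of the nearest labelled utterance, every decision can be inspected by looking at which example was retrieved, which is what makes the inference interpretable in the abstract's sense.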
Related papers
- Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster
Fine-tuning with Less Labels in Speech Processing [66.92823764664206]
We take a sober look into pre-trained speech encoders and rewire their representation space without requiring task-specific labels.
Our experiments on 6 speech processing tasks exhibit a significant convergence speedup during task fine-tuning as well as consistent task improvement.
arXiv Detail & Related papers (2022-10-24T08:27:09Z)
- Efficient Long-Text Understanding with Short-Text Models [38.8375175429553]
SLED is a simple approach for processing long sequences that re-uses and leverages battle-tested short-text pretrained LMs.
We partition the input into overlapping chunks, encode each with a short-text LM encoder, and use the pretrained decoder to fuse information across chunks (a minimal chunking sketch follows this list).
We find that SLED is competitive with specialized models that are up to 50x larger and require a dedicated and expensive pretraining step.
arXiv Detail & Related papers (2022-08-01T11:14:39Z)
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo
Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
arXiv Detail & Related papers (2022-05-02T17:59:02Z)
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and
mutual-distillations [22.40667024030858]
Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient.
Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance.
Trans-Encoder combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders.
arXiv Detail & Related papers (2021-09-27T14:06:47Z)
- Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained
Models into Speech Translation Encoders [30.160261563657947]
Speech-to-translation data is scarce, so pre-training is a promising direction for end-to-end speech translation.
We propose a Stacked Acoustic-and-Textual Encoding (SATE) method for speech translation.
Our encoder begins with processing the acoustic sequence as usual, but later behaves more like an MT encoder for a global representation of the input sequence.
arXiv Detail & Related papers (2021-05-12T16:09:53Z)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model
Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
- Scheduled Sampling in Vision-Language Pretraining with Decoupled
Encoder-Decoder Network [99.03895740754402]
We propose a two-stream decoupled encoder-decoder design, in which the cross-modal encoder and decoder are decoupled.
We further propose a scheduled sampling strategy that mitigates the discrepancy between pretraining and inference by pretraining the encoder-decoder in a two-pass manner.
arXiv Detail & Related papers (2021-01-27T17:36:57Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
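As a small illustration of the overlapping-chunk idea in the SLED entry above, the sketch below partitions a long token sequence into overlapping windows. This is only a sketch under assumptions: the chunk length and stride values are arbitrary examples, and the actual SLED steps of encoding each chunk with a short-text LM encoder and fusing them with a pretrained decoder are only indicated in comments.

```python
# Sketch of the overlapping-chunk partitioning described in the SLED entry
# above. Chunk length and stride are arbitrary example values; encoding each
# chunk with a short-text LM encoder and fusing chunk encodings with a
# pretrained decoder (the actual SLED steps) are only hinted at in comments.
from typing import List, Sequence

def overlapping_chunks(tokens: Sequence[int], chunk_len: int = 256, stride: int = 128) -> List[Sequence[int]]:
    """Split a long token sequence into overlapping windows of length chunk_len."""
    if chunk_len <= 0 or stride <= 0:
        raise ValueError("chunk_len and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + chunk_len])
        if start + chunk_len >= len(tokens):
            break  # last window reaches the end of the sequence
        start += stride
    return chunks

long_input = list(range(1000))           # stand-in for a long tokenised document
chunks = overlapping_chunks(long_input)  # each chunk would be encoded independently
# by a short-text LM encoder; a pretrained decoder would then attend over all
# chunk encodings to fuse information across chunks.
print(len(chunks), len(chunks[0]))       # e.g. 7 chunks, the first of length 256
```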
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.