Pretraining with Contrastive Sentence Objectives Improves Discourse
Performance of Language Models
- URL: http://arxiv.org/abs/2005.10389v1
- Date: Wed, 20 May 2020 23:21:43 GMT
- Authors: Dan Iter, Kelvin Guu, Larry Lansing, Dan Jurafsky
- Abstract summary: We propose CONPONO, an inter-sentence objective for pretraining language models that models discourse coherence and the distance between sentences.
On the discourse representation benchmark DiscoEval, our model improves over the previous state-of-the-art by up to 13%.
We also show that CONPONO yields gains of 2%-6% absolute even for tasks that do not explicitly evaluate discourse.
- Score: 29.40992909208733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent models for unsupervised representation learning of text have employed
a number of techniques to improve contextual word representations but have put
little focus on discourse-level representations. We propose CONPONO, an
inter-sentence objective for pretraining language models that models discourse
coherence and the distance between sentences. Given an anchor sentence, our
model is trained to predict the text k sentences away using a sampled-softmax
objective where the candidates consist of neighboring sentences and sentences
randomly sampled from the corpus. On the discourse representation benchmark
DiscoEval, our model improves over the previous state-of-the-art by up to 13%
and on average 4% absolute across 7 tasks. Our model is the same size as
BERT-Base, but outperforms the much larger BERT-Large model and other more
recent approaches that incorporate discourse. We also show that CONPONO yields
gains of 2%-6% absolute even for tasks that do not explicitly evaluate
discourse: textual entailment (RTE), common sense reasoning (COPA) and reading
comprehension (ReCoRD).
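The sampled-softmax objective described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the toy embeddings, dimensions, and the `conpono_style_loss` helper are assumptions for demonstration, whereas the real model encodes sentences with a BERT-Base-sized transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

def conpono_style_loss(anchor, candidates, target_idx):
    """Sampled-softmax contrastive loss (illustrative sketch).

    anchor:     (d,) embedding of the anchor sentence
    candidates: (n, d) embeddings of candidate sentences -- the true
                sentence k away plus sentences sampled randomly from
                the corpus as negatives
    target_idx: index of the true k-distance sentence in `candidates`
    """
    logits = candidates @ anchor                  # dot-product scores
    logits = logits - logits.max()                # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum() # softmax over candidates
    return -np.log(probs[target_idx])             # cross-entropy on the target

# Toy example: one true target among 7 random negatives, 16-dim embeddings.
anchor = rng.standard_normal(16)
negatives = rng.standard_normal((7, 16))
true_target = anchor + 0.1 * rng.standard_normal(16)  # close to the anchor
candidates = np.vstack([negatives[:3], true_target[None, :], negatives[3:]])
loss = conpono_style_loss(anchor, candidates, target_idx=3)
```

Because the true target's embedding correlates with the anchor, its logit dominates the random negatives and the loss is small; training pushes the encoder toward exactly this separation for each distance k.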
Related papers
- Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model [57.78191634042409]
We propose Pseudo-Word HuBERT (PW-HuBERT), a framework that integrates pseudo word-level targets into the training process.
Our experimental results on four spoken language understanding (SLU) benchmarks suggest the superiority of our model in capturing semantic information.
arXiv Detail & Related papers (2024-02-08T16:55:21Z)
- Few-Shot Spoken Language Understanding via Joint Speech-Text Models [18.193191170754744]
Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations.
We leverage such shared representations to address the persistent challenge of limited data availability in spoken language understanding tasks.
By employing a pre-trained speech-text model, we find that models fine-tuned on text can be effectively transferred to speech testing data.
arXiv Detail & Related papers (2023-10-09T17:59:21Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- Sentence Representation Learning with Generative Objective rather than Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative learning achieves powerful enough performance improvement and outperforms the current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
- SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training [33.02912456062474]
We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech.
We demonstrate that incorporating both speech and text data during pre-training can significantly improve downstream quality on CoVoST2 speech translation.
arXiv Detail & Related papers (2021-10-20T00:59:36Z)
- LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z)
- Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations [20.855686009404703]
We propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn discourse-level representations.
Our proposed approach is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network.
arXiv Detail & Related papers (2021-09-10T00:45:28Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Labeling Explicit Discourse Relations using Pre-trained Language Models [0.0]
State-of-the-art models achieve an F-score slightly above 45% by using hand-crafted features.
We find that the pre-trained language models, when finetuned, are powerful enough to replace the linguistic features.
This is the first time a model has outperformed the knowledge-intensive models without employing any linguistic features.
arXiv Detail & Related papers (2020-06-21T17:18:01Z)
- Toward Better Storylines with Sentence-Level Language Models [54.91921545103256]
We propose a sentence-level language model which selects the next sentence in a story from a finite set of fluent alternatives.
We demonstrate the effectiveness of our approach with state-of-the-art accuracy on the unsupervised Story Cloze task.
arXiv Detail & Related papers (2020-05-11T16:54:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.