Language modeling via stochastic processes
- URL: http://arxiv.org/abs/2203.11370v2
- Date: Thu, 11 May 2023 15:08:37 GMT
- Title: Language modeling via stochastic processes
- Authors: Rose E Wang, Esin Durmus, Noah Goodman, Tatsunori Hashimoto
- Abstract summary: Modern language models can generate high-quality short texts, but often meander or are incoherent when generating longer texts.
Recent work in self-supervised learning suggests that models can learn good latent representations via contrastive learning.
We propose one approach for leveraging contrastive representations, which we call Time Control.
- Score: 30.796382023812022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern language models can generate high-quality short texts. However, they
often meander or are incoherent when generating longer texts. These issues
arise from the next-token-only language modeling objective. Recent work in
self-supervised learning suggests that models can learn good latent
representations via contrastive learning, which can be effective for
discriminative tasks. Our work analyzes the application of contrastive
representations for generative tasks, like long text generation. We propose one
approach for leveraging contrastive representations, which we call Time
Control (TC). TC first learns a contrastive representation of the target text
domain, then generates text by decoding from these representations. Compared to
domain-specific methods and fine-tuning GPT2 across a variety of text domains,
TC performs competitively with methods designed specifically for learning sentence
representations on discourse coherence. In long text generation settings, TC
preserves the text structure both in terms of ordering (up to $+15\%$ better)
and text length consistency (up to $+90\%$ better).
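The abstract does not spell out the latent dynamics, but the title's stochastic-process framing corresponds to sentence latents that follow a Brownian bridge between a document's first and last sentences. The sketch below is a minimal illustration under that assumption: a toy linear encoder and random features stand in for real sentence embeddings, and all names (`encode`, `bridge_mean`, `contrastive_bridge_loss`) are illustrative, not the authors' implementation.

```python
# Minimal sketch (toy encoder, random features): a Brownian-bridge-style
# contrastive objective over sentence latents, in the spirit of Time Control.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

def encode(sentence_features, W):
    """Toy stand-in for the learned sentence encoder."""
    return sentence_features @ W

def bridge_mean(z0, zT, t, T):
    """Expected latent at step t of a Brownian bridge from z0 (t=0) to zT (t=T)."""
    return z0 + (t / T) * (zT - z0)

def contrastive_bridge_loss(z0, zT, z_pos, z_negs, t, T, sigma2=1.0):
    """InfoNCE-style loss: the true intermediate latent should be likelier under
    the bridge marginal than latents drawn from other documents."""
    var = sigma2 * t * (T - t) / T          # Brownian bridge variance at step t
    mu = bridge_mean(z0, zT, t, T)
    logits = np.array([-np.sum((z - mu) ** 2) / (2 * var)
                       for z in [z_pos, *z_negs]])
    log_softmax = logits - logits.max() - np.log(np.exp(logits - logits.max()).sum())
    return -log_softmax[0]

# A fake "document" of T+1 sentences, each represented by a raw feature vector.
T = 10
doc = rng.normal(size=(T + 1, DIM))
W = 0.1 * rng.normal(size=(DIM, DIM))
z = encode(doc, W)

t = 4
negatives = [rng.normal(size=DIM) for _ in range(8)]   # latents from other documents
print(f"bridge contrastive loss at t={t}:",
      round(float(contrastive_bridge_loss(z[0], z[T], z[t], negatives, t, T)), 3))
```

At generation time the bridge is used the other way around: pin the start and end latents, sample intermediate points from the bridge, and condition a decoder on the sampled trajectory.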
Related papers
- Few-Shot Spoken Language Understanding via Joint Speech-Text Models [18.193191170754744]
Recent work on speech representation models jointly pre-trained with text has demonstrated the potential to improve speech representations.
We leverage such shared representations to address the persistent challenge of limited data availability in spoken language understanding tasks.
By employing a pre-trained speech-text model, we find that models fine-tuned on text can be effectively transferred to speech testing data.
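A rough intuition for why such transfer can work, as a toy numpy sketch under strong assumptions: the joint pre-training is assumed to have already placed text and speech embeddings of the same utterance close together, and every name below is illustrative rather than the paper's model.

```python
# Minimal sketch (assumptions, not the paper's setup): if a jointly pre-trained
# speech-text model maps both modalities into one embedding space, a classifier
# fit on text embeddings can be applied directly to speech embeddings.
import numpy as np

rng = np.random.default_rng(1)
DIM, N_CLASSES, N = 16, 3, 300

# Pretend these come from the shared encoder: text and speech embeddings of the
# same utterance land near each other (here: same vector plus modality noise).
class_means = rng.normal(size=(N_CLASSES, DIM))
labels = rng.integers(0, N_CLASSES, size=N)
shared = class_means[labels] + 0.3 * rng.normal(size=(N, DIM))
text_emb = shared + 0.05 * rng.normal(size=(N, DIM))
speech_emb = shared + 0.05 * rng.normal(size=(N, DIM))

# "Fine-tune on text": a nearest-centroid classifier fit on text embeddings only.
centroids = np.stack([text_emb[labels == c].mean(axis=0) for c in range(N_CLASSES)])

def predict(x):
    return np.argmin(((x[:, None, :] - centroids) ** 2).sum(-1), axis=1)

# "Transfer to speech testing data": evaluate the text-trained classifier on speech.
acc = (predict(speech_emb) == labels).mean()
print(f"speech accuracy of text-trained classifier: {acc:.2f}")
```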
arXiv Detail & Related papers (2023-10-09T17:59:21Z)
- Augmenting text for spoken language understanding with Large Language Models [13.240782495441275]
We show how to use transcript-semantic parse data (unpaired text) without corresponding speech.
We propose to prompt Large Language Models (LLMs) to generate unpaired text for existing and new domains.
Experiments show that unpaired text from existing and new domains improves performance by 2% and 30% in absolute Exact Match (EM), respectively.
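A minimal sketch of the prompting step: the helper names, the placeholder LLM call, and the TOP-style parse format are all assumptions for illustration, not the paper's code or prompts.

```python
# Minimal sketch (hypothetical helpers): prompt an LLM to produce extra
# transcript -> semantic-parse pairs for a domain, then mix them into the
# text-only training pool used to augment the SLU model.
from dataclasses import dataclass

@dataclass
class Example:
    transcript: str
    parse: str

def build_prompt(domain: str, seed_examples: list[Example], n_new: int) -> str:
    shots = "\n".join(f"transcript: {e.transcript}\nparse: {e.parse}" for e in seed_examples)
    return (
        f"Generate {n_new} new examples for the '{domain}' domain in the same format.\n"
        f"{shots}\ntranscript:"
    )

def parse_llm_output(raw: str) -> list[Example]:
    # Assumes the LLM echoes the 'transcript:/parse:' format back.
    examples, lines = [], [l for l in raw.splitlines() if l.strip()]
    for t_line, p_line in zip(lines[::2], lines[1::2]):
        examples.append(Example(t_line.removeprefix("transcript:").strip(),
                                p_line.removeprefix("parse:").strip()))
    return examples

seeds = [Example("set an alarm for 7 am", "[IN:CREATE_ALARM [SL:TIME 7 am]]")]
prompt = build_prompt("alarm", seeds, n_new=2)
print(prompt)
# An actual LLM call would go here; a fixed string stands in for its output.
augmented_pool = seeds + parse_llm_output(
    "transcript: wake me at 6\nparse: [IN:CREATE_ALARM [SL:TIME 6]]")
print(len(augmented_pool), "text-only training examples")
```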
arXiv Detail & Related papers (2023-09-17T22:25:34Z)
- Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
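As a toy illustration of that formulation, the sketch below replaces the paper's learned phrase retriever with simple word-overlap scoring; the collection, scoring rule, and function names are assumptions made for the example.

```python
# Minimal sketch (toy scoring, not the paper's retrieval model): generate text by
# repeatedly copying the highest-scoring segment from a fixed text collection,
# instead of emitting one token at a time.
collection = [
    "the model copies", "text segments", "from a large collection",
    "and stitches them", "into fluent output", "scaling the collection helps",
]

def score(context: str, segment: str) -> float:
    # Stand-in for a dot product between context and segment embeddings.
    overlap = len(set(context.split()) & set(segment.split()))
    return overlap - 100.0 * (segment in context)   # discourage verbatim repeats

def generate(prompt: str, steps: int = 4) -> str:
    out = prompt
    for _ in range(steps):
        best = max(collection, key=lambda seg: score(out, seg))
        out = f"{out} {best}"
    return out

print(generate("the model"))
```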
arXiv Detail & Related papers (2023-07-13T05:03:26Z)
- Vision-Language Pre-Training for Boosting Scene Text Detectors [57.08046351495244]
We specifically adapt vision-language joint learning for scene text detection.
We propose to learn contextualized, joint representations through vision-language pre-training.
The pre-trained model is able to produce more informative representations with richer semantics.
arXiv Detail & Related papers (2022-04-29T03:53:54Z)
- How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
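The simplest analysis of this kind is an n-gram overlap check; a minimal sketch along those lines follows (not RAVEN's actual code or metric definitions, and the example texts are made up).

```python
# Minimal sketch: fraction of generated n-grams that never occur in the training
# data, for a range of n. Higher values mean more novel text.
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty_report(generated: str, training: str, max_n: int = 4) -> dict[int, float]:
    gen_tok, train_tok = generated.split(), training.split()
    report = {}
    for n in range(1, max_n + 1):
        gen_n = ngrams(gen_tok, n)
        if not gen_n:
            break
        seen = ngrams(train_tok, n)
        report[n] = 1.0 - len(gen_n & seen) / len(gen_n)   # fraction of novel n-grams
    return report

training_corpus = "the cat sat on the mat and the dog slept"
generated_text = "the cat slept on the mat near the dog"
for n, novel in novelty_report(generated_text, training_corpus).items():
    print(f"{n}-grams novel: {novel:.0%}")
```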
arXiv Detail & Related papers (2021-11-18T04:07:09Z)
- Continuous Entailment Patterns for Lexical Inference in Context [4.581468205348204]
A pretrained language model (PLM) with textual patterns has been shown to help in both zero- and few-shot settings.
For zero-shot performance, it makes sense to design patterns that closely resemble the text seen during self-supervised pretraining because the model has never seen anything else.
Supervised training allows for more flexibility: if tokens outside the PLM's vocabulary are permitted, patterns can be adapted to a PLM's idiosyncrasies.
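A minimal sketch of what "tokens outside the PLM's vocabulary" means in practice, using a toy embedding table rather than a real PLM: the pattern slots become free vectors in embedding space that supervised training can update directly. Everything below is illustrative, not the paper's models or pattern wording.

```python
# Minimal sketch (toy embeddings): a "continuous pattern" replaces some pattern
# words with trainable vectors that need not correspond to any vocabulary entry.
import numpy as np

rng = np.random.default_rng(2)
DIM, VOCAB = 8, 100
embedding_table = rng.normal(size=(VOCAB, DIM))      # stands in for the PLM's embeddings

def lookup(token_ids):
    return embedding_table[np.array(token_ids)]

# Discrete pattern: "<premise> ? <hypothesis>" built only from vocabulary tokens.
premise_ids, hypothesis_ids, question_mark_id = [5, 7, 9], [11, 3], 42
discrete_pattern = np.concatenate([lookup(premise_ids),
                                   lookup([question_mark_id]),
                                   lookup(hypothesis_ids)])

# Continuous pattern: the connector slots are free vectors, updated by gradient
# descent during supervised training instead of being tied to vocabulary entries.
trainable_slots = 0.02 * rng.normal(size=(2, DIM))
continuous_pattern = np.concatenate([trainable_slots[:1],
                                     lookup(premise_ids),
                                     trainable_slots[1:],
                                     lookup(hypothesis_ids)])

print("discrete pattern shape:", discrete_pattern.shape)      # (6, DIM)
print("continuous pattern shape:", continuous_pattern.shape)  # (7, DIM)
```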
arXiv Detail & Related papers (2021-09-08T14:57:00Z)
- Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence [59.51720326054546]
We propose a long text generation model that represents the prefix sentences at both the sentence level and the discourse level during decoding.
Our model can generate more coherent texts than state-of-the-art baselines.
arXiv Detail & Related papers (2021-05-19T07:29:08Z)
- Visual Grounding Strategies for Text-Only Natural Language Processing [1.2183405753834562]
Multimodal extensions of BERT allow joint modeling of texts and images, leading to state-of-the-art results on multimodal tasks such as Visual Question Answering.
Here, we leverage multimodal modeling for purely textual tasks with the expectation that the multimodal pretraining provides a grounding that can improve text processing accuracy.
The first type of strategy, referred to as transferred grounding, consists in applying multimodal models to text-only tasks using a placeholder to replace image input.
The second, which we call associative grounding, harnesses image retrieval to match texts with related images during both...
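A toy sketch of the two strategies, with random features in place of real text and image encoders; the fusion function, the placeholder choice, and the retrieval bank are all assumptions made for illustration.

```python
# Minimal sketch (toy features, not the paper's models): transferred grounding
# feeds a placeholder where image features would go; associative grounding
# retrieves features of a related image for the input text.
import numpy as np

rng = np.random.default_rng(3)
DIM = 8

def multimodal_forward(text_feat, image_feat, W):
    """Stand-in for a multimodal encoder: fuse both modalities and score."""
    fused = np.concatenate([text_feat, image_feat])
    return fused @ W

W = rng.normal(size=(2 * DIM,))
text_feat = rng.normal(size=DIM)

# Transferred grounding: a fixed placeholder replaces the image input.
placeholder = np.zeros(DIM)
score_transferred = multimodal_forward(text_feat, placeholder, W)

# Associative grounding: retrieve the image whose (pre-computed) caption embedding
# is closest to the text, and use that image's features instead.
caption_emb = rng.normal(size=(5, DIM))       # embeddings of captions in an image bank
image_feat_bank = rng.normal(size=(5, DIM))   # features of the corresponding images
nearest = np.argmax(caption_emb @ text_feat)
score_associative = multimodal_forward(text_feat, image_feat_bank[nearest], W)

print(f"transferred grounding score: {score_transferred:+.3f}")
print(f"associative grounding score: {score_associative:+.3f}")
```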
arXiv Detail & Related papers (2021-03-25T16:03:00Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
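A minimal sketch of the core ingredient, using a generic entropy-regularized (Sinkhorn) OT distance between two embedding sequences; the paper's exact formulation and its structural/contextual extension are not reproduced, and the embeddings here are random stand-ins.

```python
# Minimal sketch (generic Sinkhorn): an OT distance between the embeddings of the
# reference sequence (teacher-forced) and the model's own generated sequence
# (student-forced), usable as an extra training loss.
import numpy as np

def sinkhorn_distance(X, Y, epsilon=1.0, n_iters=100):
    """Entropy-regularized OT between two point clouds with uniform weights."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    K = np.exp(-C / epsilon)
    a, b = np.full(len(X), 1.0 / len(X)), np.full(len(Y), 1.0 / len(Y))
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]                      # transport plan
    return float((P * C).sum())

rng = np.random.default_rng(4)
ref_emb = rng.normal(size=(12, 16))                         # reference token embeddings
gen_emb = ref_emb[::-1] + 0.1 * rng.normal(size=(12, 16))   # generated: same content, other order
print(f"OT sequence-matching loss: {sinkhorn_distance(ref_emb, gen_emb):.3f}")
```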
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- Progressive Generation of Long Text with Pretrained Language Models [83.62523163717448]
Large-scale language models (LMs) pretrained on massive corpora of text, such as GPT-2, are powerful open-domain text generators.
It is still challenging for such models to generate coherent long passages of text, especially when the models are fine-tuned to the target domain on a small corpus.
We propose a simple but effective method of generating text in a progressive manner, inspired by generating images from low to high resolution.
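A toy sketch of the coarse-to-fine idea: build progressively denser intermediate targets by word importance, so that each stage can be trained to refine the previous stage's sketch. The importance score, corpus, and keep fractions below are stand-ins, not the paper's pipeline.

```python
# Minimal sketch (toy importance scores): coarse-to-fine training targets for
# progressive generation, keeping only the most "important" words at early stages
# and progressively more at later stages.
from collections import Counter

corpus = [
    "the spacecraft entered orbit around the red planet after a seven month cruise",
    "the lander separated from the spacecraft and began its descent to the surface",
]
doc_freq = Counter(w for doc in corpus for w in set(doc.split()))

def importance(word, n_docs=len(corpus)):
    # Stand-in for a TF-IDF-style score: rarer words rank higher.
    return 1.0 / (1 + doc_freq.get(word, 0) / n_docs)

def stage_target(text, keep_fraction):
    words = text.split()
    ranked = sorted(words, key=lambda w: -importance(w))
    keep = set(ranked[: max(1, int(keep_fraction * len(words)))])
    return " ".join(w for w in words if w in keep)   # keep original word order

full = corpus[0]
for stage, frac in enumerate([0.25, 0.5, 1.0], start=1):
    print(f"stage {stage} target: {stage_target(full, frac)}")
# Stage k's generator is trained to map the stage k-1 sketch to the stage k text;
# at inference the stages are chained from keywords up to the full passage.
```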
arXiv Detail & Related papers (2020-06-28T21:23:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.