FiLM: Fill-in Language Models for Any-Order Generation
- URL: http://arxiv.org/abs/2310.09930v1
- Date: Sun, 15 Oct 2023 19:37:39 GMT
- Title: FiLM: Fill-in Language Models for Any-Order Generation
- Authors: Tianxiao Shen, Hao Peng, Ruoqi Shen, Yao Fu, Zaid Harchaoui, Yejin
Choi
- Abstract summary: Fill-in Language Model (FiLM) is a new language modeling approach that allows for flexible generation at any position without adhering to a specific generation order.
During inference, FiLM can seamlessly insert missing phrases, sentences, or paragraphs.
FiLM outperforms existing infilling methods that rely on left-to-right language models trained on rearranged text segments.
- Score: 71.42044325886194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models have become the backbone of today's AI systems. However,
their predominant left-to-right generation limits the use of bidirectional
context, which is essential for tasks that involve filling text in the middle.
We propose the Fill-in Language Model (FiLM), a new language modeling approach
that allows for flexible generation at any position without adhering to a
specific generation order. Its training extends the masked language modeling
objective by adopting varying mask probabilities sampled from the Beta
distribution to enhance the generative capabilities of FiLM. During inference,
FiLM can seamlessly insert missing phrases, sentences, or paragraphs, ensuring
that the outputs are fluent and are coherent with the surrounding context. In
both automatic and human evaluations, FiLM outperforms existing infilling
methods that rely on left-to-right language models trained on rearranged text
segments. FiLM is easy to implement and can be either trained from scratch or
fine-tuned from a left-to-right language model. Notably, as the model size
grows, FiLM's perplexity approaches that of strong left-to-right language
models of similar sizes, indicating FiLM's scalability and potential as a large
language model.
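The abstract specifies only that FiLM's training extends the masked language modeling objective with per-sequence mask probabilities drawn from a Beta distribution. The snippet below is a minimal sketch of that corruption step in PyTorch; the names (`beta_masking`, `MASK_ID`) and the Beta parameters are illustrative assumptions, since the paper's actual settings are not given here.

```python
import torch

MASK_ID = 103  # hypothetical [MASK] token id; the abstract does not specify the tokenizer

def beta_masking(token_ids: torch.Tensor, alpha: float = 2.0, beta: float = 2.0):
    """Sketch of a FiLM-style corruption step.

    Draws one mask probability p per sequence from Beta(alpha, beta)
    (the abstract names the Beta distribution but not its parameters),
    masks each token independently with probability p, and returns the
    corrupted inputs plus targets for the masked positions only.
    """
    batch, seq_len = token_ids.shape
    p = torch.distributions.Beta(alpha, beta).sample((batch, 1))  # per-sequence mask rate
    mask = torch.rand(batch, seq_len) < p                         # which tokens to hide
    corrupted = token_ids.masked_fill(mask, MASK_ID)
    targets = token_ids.masked_fill(~mask, -100)                  # ignore unmasked tokens in the loss
    return corrupted, targets, mask

# Usage sketch: a bidirectional Transformer would be trained with cross-entropy
# on `targets`, conditioning on the uncorrupted context left in `corrupted`.
# corrupted, targets, mask = beta_masking(batch_of_token_ids)
```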
Related papers
- BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation [20.445336386799482]
Large language models (LLMs) have catalyzed a paradigm shift in natural language processing, yet their limited controllability poses a significant challenge for downstream applications.
We aim to address this by drawing inspiration from the neural mechanisms of the human brain, specifically Broca's and Wernicke's areas.
arXiv Detail & Related papers (2024-05-27T10:45:49Z)
- Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning [56.03057119008865]
We show that scaling diffusion language models can effectively make them strong language learners.
We build competent diffusion language models at scale by first acquiring knowledge from massive data.
Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks.
arXiv Detail & Related papers (2023-08-23T16:01:12Z)
- Extrapolating Multilingual Understanding Models as Multilingual Generators [82.1355802012414]
This paper explores how to endow multilingual understanding models with generation abilities in order to obtain a unified model.
We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach that adapts an encoder into a multilingual generator with a small number of new parameters.
arXiv Detail & Related papers (2023-05-22T15:33:21Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Bidirectional Language Models Are Also Few-shot Learners [54.37445173284831]
We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models.
We show SAP is effective on question answering and summarization.
For the first time, our results demonstrate prompt-based learning is an emergent property of a broader class of language models.
arXiv Detail & Related papers (2022-09-29T01:35:57Z)
- Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models [20.24851041248274]
We present an LSTM-based autoregressive language model which uses prefix embeddings (from a pretrained masked language model) via fusion.
We find that fusion reliably lowers perplexity (16.74 → 15.80), an improvement that is preserved even after transfer to a dataset from a different domain.
We also evaluate the best-performing fusion model by correlating its next word surprisal estimates with human reading times.
arXiv Detail & Related papers (2022-08-04T02:13:03Z)
- TEASEL: A Transformer-Based Speech-Prefixed Language Model [4.014524824655106]
Multimodal language analysis aims to simultaneously model a speaker's words, acoustical annotations, and facial expressions.
Lexicon features usually outperform other modalities because they are pre-trained on large corpora via Transformer-based models.
Despite their strong performance, training a new self-supervised learning (SSL) Transformer on any modality is not usually attainable due to insufficient data.
arXiv Detail & Related papers (2021-09-12T14:08:57Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order [32.71489048856101]
Masked language models and autoregressive language models are two types of language models.
We propose a probabilistic masking scheme for the masked language model, which we call the probabilistically masked language model (PMLM).
We prove that u-PMLM is equivalent to an autoregressive permutated language model.
arXiv Detail & Related papers (2020-04-24T07:38:19Z)
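The PMLM entry above and FiLM share the same decoding idea: once a model can predict tokens at masked positions from bidirectional context, generation can proceed over those positions in any order. The loop below is a minimal, hypothetical sketch of that idea (greedy decoding, an assumed `model` that returns per-position logits); it is not the implementation from either paper.

```python
import torch

def fill_in_any_order(model, token_ids: torch.Tensor, masked_positions, order=None):
    """Illustrative any-order decoding loop for a fill-in / masked LM.

    `model` is assumed to map a (1, seq_len) batch of token ids to
    (1, seq_len, vocab) logits; `masked_positions` are the indices to fill.
    """
    ids = token_ids.clone()
    positions = list(masked_positions) if order is None else list(order)
    for pos in positions:                         # any permutation of the masked slots
        logits = model(ids.unsqueeze(0))[0, pos]  # (vocab,) logits for this position
        ids[pos] = int(torch.argmax(logits))      # greedy pick; sampling is also possible
    return ids
```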
This list is automatically generated from the titles and abstracts of the papers on this site.