FiLM: Fill-in Language Models for Any-Order Generation
- URL: http://arxiv.org/abs/2310.09930v1
- Date: Sun, 15 Oct 2023 19:37:39 GMT
- Title: FiLM: Fill-in Language Models for Any-Order Generation
- Authors: Tianxiao Shen, Hao Peng, Ruoqi Shen, Yao Fu, Zaid Harchaoui, Yejin
Choi
- Abstract summary: Fill-in Language Model (FiLM) is a new language modeling approach that allows for flexible generation at any position without adhering to a specific generation order.
During inference, FiLM can seamlessly insert missing phrases, sentences, or paragraphs.
FiLM outperforms existing infilling methods that rely on left-to-right language models trained on rearranged text segments.
- Score: 71.42044325886194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models have become the backbone of today's AI systems. However,
their predominant left-to-right generation limits the use of bidirectional
context, which is essential for tasks that involve filling text in the middle.
We propose the Fill-in Language Model (FiLM), a new language modeling approach
that allows for flexible generation at any position without adhering to a
specific generation order. Its training extends the masked language modeling
objective by adopting varying mask probabilities sampled from the Beta
distribution to enhance the generative capabilities of FiLM. During inference,
FiLM can seamlessly insert missing phrases, sentences, or paragraphs, ensuring
that the outputs are fluent and are coherent with the surrounding context. In
both automatic and human evaluations, FiLM outperforms existing infilling
methods that rely on left-to-right language models trained on rearranged text
segments. FiLM is easy to implement and can be either trained from scratch or
fine-tuned from a left-to-right language model. Notably, as the model size
grows, FiLM's perplexity approaches that of strong left-to-right language
models of similar sizes, indicating FiLM's scalability and potential as a large
language model.
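The abstract specifies only that FiLM's training extends the masked language modeling objective with per-sequence mask probabilities drawn from a Beta distribution. The snippet below is a minimal sketch of that corruption step in PyTorch; the names (`beta_masking`, `MASK_ID`) and the Beta parameters are illustrative assumptions, since the paper's actual settings are not given here.

```python
import torch

MASK_ID = 103  # hypothetical [MASK] token id; the abstract does not specify the tokenizer

def beta_masking(token_ids: torch.Tensor, alpha: float = 2.0, beta: float = 2.0):
    """Sketch of a FiLM-style corruption step.

    Draws one mask probability p per sequence from Beta(alpha, beta)
    (the abstract names the Beta distribution but not its parameters),
    masks each token independently with probability p, and returns the
    corrupted inputs plus targets for the masked positions only.
    """
    batch, seq_len = token_ids.shape
    p = torch.distributions.Beta(alpha, beta).sample((batch, 1))  # per-sequence mask rate
    mask = torch.rand(batch, seq_len) < p                         # which tokens to hide
    corrupted = token_ids.masked_fill(mask, MASK_ID)
    targets = token_ids.masked_fill(~mask, -100)                  # ignore unmasked tokens in the loss
    return corrupted, targets, mask

# Usage sketch: a bidirectional Transformer would be trained with cross-entropy
# on `targets`, conditioning on the uncorrupted context left in `corrupted`.
# corrupted, targets, mask = beta_masking(batch_of_token_ids)
```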
Related papers
- BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation [20.445336386799482]
Large language models (LLMs) have catalyzed a paradigm shift in natural language processing, yet their limited controllability poses a significant challenge for downstream applications.
We aim to address this by drawing inspiration from the neural mechanisms of the human brain, specifically Broca's and Wernicke's areas.
arXiv Detail & Related papers (2024-05-27T10:45:49Z)
- Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning [56.03057119008865]
We show that scaling diffusion language models can effectively make them strong language learners.
We build competent diffusion language models at scale by first acquiring knowledge from massive data.
Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks.
arXiv Detail & Related papers (2023-08-23T16:01:12Z)
- Extrapolating Multilingual Understanding Models as Multilingual Generators [82.1355802012414]
This paper explores how to endow multilingual understanding models with generation abilities in order to obtain a unified model.
We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach that adapts an encoder into a multilingual generator with a small number of new parameters.
arXiv Detail & Related papers (2023-05-22T15:33:21Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Bidirectional Language Models Are Also Few-shot Learners [54.37445173284831]
We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models.
We show SAP is effective on question answering and summarization.
For the first time, our results demonstrate prompt-based learning is an emergent property of a broader class of language models.
arXiv Detail & Related papers (2022-09-29T01:35:57Z)
- Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models [20.24851041248274]
We present an LSTM-based autoregressive language model which uses prefix embeddings (from a pretrained masked language model) via fusion.
We find that fusion reliably lowers perplexity (16.74 → 15.80), an improvement that is preserved even after transfer to a dataset from a different domain.
We also evaluate the best-performing fusion model by correlating its next word surprisal estimates with human reading times.
arXiv Detail & Related papers (2022-08-04T02:13:03Z)
- TEASEL: A Transformer-Based Speech-Prefixed Language Model [4.014524824655106]
Multimodal language analysis aims to simultaneously model a speaker's words, acoustical annotations, and facial expressions.
Lexicon features usually outperform other modalities because they are pre-trained on large corpora via Transformer-based models.
Despite their strong performance, training a new self-supervised learning (SSL) Transformer on any modality is not usually attainable due to insufficient data.
arXiv Detail & Related papers (2021-09-12T14:08:57Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order [32.71489048856101]
Masked language models and autoregressive language models are two types of language models.
We propose a probabilistic masking scheme for the masked language model, which we call the probabilistically masked language model (PMLM).
We prove that u-PMLM is equivalent to an autoregressive permutated language model.
arXiv Detail & Related papers (2020-04-24T07:38:19Z)
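The PMLM entry above and FiLM share the same decoding idea: once a model can predict tokens at masked positions from bidirectional context, generation can proceed over those positions in any order. The loop below is a minimal, hypothetical sketch of that idea (greedy decoding, an assumed `model` that returns per-position logits); it is not the implementation from either paper.

```python
import torch

def fill_in_any_order(model, token_ids: torch.Tensor, masked_positions, order=None):
    """Illustrative any-order decoding loop for a fill-in / masked LM.

    `model` is assumed to map a (1, seq_len) batch of token ids to
    (1, seq_len, vocab) logits; `masked_positions` are the indices to fill.
    """
    ids = token_ids.clone()
    positions = list(masked_positions) if order is None else list(order)
    for pos in positions:                         # any permutation of the masked slots
        logits = model(ids.unsqueeze(0))[0, pos]  # (vocab,) logits for this position
        ids[pos] = int(torch.argmax(logits))      # greedy pick; sampling is also possible
    return ids
```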
This list is automatically generated from the titles and abstracts of the papers on this site.