Style Locality for Controllable Generation with kNN Language Models
- URL: http://arxiv.org/abs/2311.00475v1
- Date: Wed, 1 Nov 2023 12:21:53 GMT
- Title: Style Locality for Controllable Generation with kNN Language Models
- Authors: Gilles Nawezi, Lucie Flek, Charles Welch
- Abstract summary: Nearest neighbor language models retrieve similar contexts to assist in word prediction.
The addition of locality levels allows a model to learn how to weight neighbors based on their relative location to the current text in source documents.
We show that our model is successfully able to control style and provides a better fluency-style trade-off than previous work.
- Score: 11.4179290793997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent language models have been improved by the addition of external memory.
Nearest neighbor language models retrieve similar contexts to assist in word
prediction. The addition of locality levels allows a model to learn how to
weight neighbors based on their relative location to the current text in source
documents, and this has been shown to further improve model performance. Nearest
neighbor models have been explored for controllable generation, but prior work
has not examined the use of locality levels. We present a novel approach for this
purpose and evaluate it using automatic and human evaluation on politeness,
formality, supportiveness, and toxicity textual data. We find that our model is
successfully able to control style and provides a better fluency-style
trade-off than previous work.
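To make the mechanism concrete, below is a minimal sketch (not the authors' code) of kNN-LM interpolation with locality-level weighting: each retrieved neighbor contributes to the next-token distribution in proportion to a learned weight for its locality level. The three-level split, the distance kernel, and all shapes are illustrative assumptions.
```python
import numpy as np

# Hypothetical locality levels: 0 = same paragraph, 1 = same document, 2 = other documents.
# Softmax-normalised per-level weights (placeholders; in the paper these are learned).
level_logits = np.array([1.5, 0.5, -0.5])
level_weights = np.exp(level_logits) / np.exp(level_logits).sum()

def knn_distribution(query, keys, next_tokens, levels, vocab_size, temperature=1.0):
    """Turn retrieved (key, next-token, locality-level) triples into a distribution
    over the vocabulary, scaling each neighbor by its locality-level weight."""
    dists = np.linalg.norm(keys - query, axis=1)          # L2 distance to each neighbor
    scores = np.exp(-dists / temperature) * level_weights[levels]
    probs = np.zeros(vocab_size)
    np.add.at(probs, next_tokens, scores)                 # aggregate scores per token
    return probs / probs.sum()

def interpolate(p_lm, p_knn, lam=0.25):
    """Standard kNN-LM interpolation: p = (1 - lam) * p_LM + lam * p_kNN."""
    return (1 - lam) * p_lm + lam * p_knn

# Toy example with a 10-token vocabulary and 4 retrieved neighbors.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 8
query = rng.normal(size=dim)
keys = rng.normal(size=(4, dim))
next_tokens = np.array([3, 3, 7, 1])    # token that followed each retrieved context
levels = np.array([0, 1, 2, 0])         # locality level of each neighbor
p_lm = np.full(vocab_size, 1.0 / vocab_size)
p_knn = knn_distribution(query, keys, next_tokens, levels, vocab_size)
print(interpolate(p_lm, p_knn))
```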
Related papers
- A Probability--Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
We show that when sampling corpora from an aligned language model, there exists a trade-off between the strings' average reward and average log-likelihood.
We provide a formal treatment of this phenomenon and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.
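As a toy numerical illustration of this trade-off (not the paper's formal treatment), the sketch below applies a temperature sampling adaptor to a small hand-built distribution whose rewards are, by construction, anti-correlated with probability; flatter adaptors buy average reward at the cost of average log-likelihood.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "aligned model": a fixed distribution over 5 strings, plus a hypothetical reward each.
p = np.array([0.40, 0.30, 0.15, 0.10, 0.05])    # model probabilities
reward = np.array([0.2, 0.5, 0.9, 1.0, 1.5])     # reward-model scores (assumed)

def temperature_adaptor(p, tau):
    """A simple sampling adaptor: rescale probabilities by a temperature and renormalise."""
    q = p ** (1.0 / tau)
    return q / q.sum()

for tau in (0.5, 1.0, 2.0):
    q = temperature_adaptor(p, tau)
    samples = rng.choice(len(p), size=10_000, p=q)
    avg_reward = reward[samples].mean()
    avg_loglik = np.log(p[samples]).mean()       # log-likelihood under the original model
    print(f"tau={tau}: avg reward={avg_reward:.3f}, avg log-likelihood={avg_loglik:.3f}")
```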
arXiv Detail & Related papers (2024-06-14T17:38:21Z) - More Room for Language: Investigating the Effect of Retrieval on Language Models [3.8574940917179164]
We introduce an 'ideal retrieval' methodology to study these models in a fully controllable setting.
We conduct an evaluation to examine how retrieval augmentation affects the behavior of the underlying language model.
arXiv Detail & Related papers (2024-04-16T22:43:48Z) - Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
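A minimal sketch of the feedback-to-sequence idea; the exact templates, pairings, and loss masking below are assumptions, not the paper's released format.
```python
# Turn ranked feedback into hindsight-conditioned training text.

def to_hindsight_sequences(prompt: str, preferred: str, rejected: str) -> list[str]:
    """Convert a preference pair into natural-language sequences that pair each answer
    with a verbal description of its quality."""
    return [
        f"{prompt} A good answer is: {preferred}",
        f"{prompt} A bad answer is: {rejected}",
        f"{prompt} A bad answer is: {rejected} A good answer is: {preferred}",
    ]

examples = to_hindsight_sequences(
    prompt="How do I keep my code readable?",
    preferred="Use clear names, small functions, and consistent formatting.",
    rejected="Just write it however and clean it up later, maybe.",
)
for ex in examples:
    print(ex)
# These sequences would then be used to fine-tune the model with a standard
# next-token objective (optionally masking the loss on the feedback phrases).
```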
arXiv Detail & Related papers (2023-02-06T10:28:16Z) - Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models [11.439430077017635]
We find that pre-trained speech models optimally encode language-discriminating information in their lower layers.
We demonstrate that the embeddings obtained from these layers are highly robust for classifying unseen languages.
We open-source the model through the NVIDIA NeMo toolkit.
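A rough sketch of this kind of layer-wise probing, using random arrays as stand-ins for the frame-level features of a pretrained speech encoder; the layer index, mean pooling, and linear classifier are illustrative choices, not the paper's exact setup.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for per-layer features of a pretrained speech encoder.
# Shapes: num_layers x (num_utterances, num_frames, hidden_dim); random data here.
rng = np.random.default_rng(0)
num_layers, n_utts, n_frames, dim = 12, 200, 50, 64
hidden_states = [rng.normal(size=(n_utts, n_frames, dim)) for _ in range(num_layers)]
languages = rng.integers(0, 5, size=n_utts)         # toy labels for 5 languages

def utterance_embeddings(hidden_states, layer: int) -> np.ndarray:
    """Mean-pool the frame-level features of one layer into utterance embeddings."""
    return hidden_states[layer].mean(axis=1)

# Probe a lower layer: fit a simple linear classifier on its pooled embeddings.
X = utterance_embeddings(hidden_states, layer=3)     # a lower layer, per the paper's finding
clf = LogisticRegression(max_iter=1000).fit(X[:150], languages[:150])
print("held-out accuracy:", clf.score(X[150:], languages[150:]))
```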
arXiv Detail & Related papers (2022-11-09T18:53:59Z) - Nearest Neighbor Language Models for Stylistic Controllable Generation [8.458066281308005]
Recent language modeling performance has been greatly improved by the use of external memory.
This memory encodes the context so that similar contexts can be recalled during decoding.
We construct and evaluate an architecture for this purpose, using corpora annotated for politeness, formality, and toxicity.
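A minimal sketch of the general idea of a style-annotated datastore for kNN-augmented decoding (not this paper's implementation): retrieval is restricted to entries carrying the target style, and the retrieved next tokens form a distribution that can be interpolated with the base LM. Shapes, labels, and the brute-force search are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab_size, n_entries = 16, 50, 1000
keys = rng.normal(size=(n_entries, dim)).astype(np.float32)     # encoded contexts
values = rng.integers(0, vocab_size, size=n_entries)            # next tokens
styles = rng.choice(["polite", "impolite"], size=n_entries)     # style annotation

def knn_style_probs(query, target_style, k=8):
    """Retrieve the k nearest stored contexts with the desired style and turn their
    next tokens into a probability distribution for interpolation with the LM."""
    mask = styles == target_style
    dists = np.linalg.norm(keys[mask] - query, axis=1)
    nearest = np.argsort(dists)[:k]
    probs = np.zeros(vocab_size)
    np.add.at(probs, values[mask][nearest], np.exp(-dists[nearest]))
    return probs / probs.sum()

query = rng.normal(size=dim).astype(np.float32)
print(knn_style_probs(query, target_style="polite").round(3))
```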
arXiv Detail & Related papers (2022-10-27T20:46:12Z) - Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning [53.92465205531759]
Controlled automated story generation seeks to generate natural language stories satisfying constraints from natural language critiques or preferences.
We train a contrastive bi-encoder model to align stories with human critiques, building a general-purpose preference model.
We further fine-tune the contrastive reward model using a prompt-learning technique to increase story generation robustness.
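A minimal contrastive bi-encoder sketch under stated assumptions: tiny bag-of-embeddings encoders stand in for the pretrained encoders, and an InfoNCE loss aligns each story with the critique it satisfies.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in encoder: mean bag-of-embeddings plus a projection, L2-normalised."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids):                        # token_ids: (batch, seq_len)
        return F.normalize(self.proj(self.emb(token_ids)), dim=-1)

story_enc, critique_enc = TinyEncoder(), TinyEncoder()
opt = torch.optim.Adam(list(story_enc.parameters()) + list(critique_enc.parameters()), lr=1e-3)

# Toy batch: story i is paired with critique i (the diagonal is the positive pair).
stories = torch.randint(0, 1000, (8, 32))
critiques = torch.randint(0, 1000, (8, 16))

for _ in range(10):
    s, c = story_enc(stories), critique_enc(critiques)
    logits = s @ c.T / 0.07                              # cosine similarities / temperature
    loss = F.cross_entropy(logits, torch.arange(len(s))) # InfoNCE: match story i to critique i
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference, s @ c.T scores how well a candidate story matches a preference,
# which can serve as a reward signal for generation.
print(loss.item())
```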
arXiv Detail & Related papers (2022-10-14T13:21:33Z) - Reconsidering the Past: Optimizing Hidden States in Language Models [35.7524942657169]
We present Hidden-State Optimization (HSO), a gradient-based method for improving the performance of transformer language models.
HSO computes the gradient of the log-probability the language model assigns to an evaluation text, but uses it to update the cached hidden states rather than the model parameters.
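A conceptual sketch of that idea with a frozen toy model: the evaluation text's log-probability is differentiated with respect to a cached state, which is then updated by gradient steps while the parameters stay fixed. A real implementation would update a transformer's cached key/value states; the single "memory" vector here is a simplifying assumption.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, dim = 100, 32
readout = nn.Linear(dim, vocab_size)                  # frozen LM head
token_emb = nn.Embedding(vocab_size, dim)             # frozen token embeddings
for p in list(readout.parameters()) + list(token_emb.parameters()):
    p.requires_grad_(False)

cached_state = torch.randn(dim, requires_grad=True)   # "hidden state" from earlier context
eval_tokens = torch.randint(0, vocab_size, (20,))     # evaluation text

def log_prob_of_eval_text(state):
    # Toy decoder: predict token t+1 from the cached state plus the embedding of token t.
    inputs, targets = eval_tokens[:-1], eval_tokens[1:]
    logits = readout(state + token_emb(inputs))                     # (T-1, vocab)
    return F.log_softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).sum()

opt = torch.optim.SGD([cached_state], lr=0.1)         # optimize the state, not the model
for step in range(5):
    loss = -log_prob_of_eval_text(cached_state)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: log-prob = {-loss.item():.2f}")
```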
arXiv Detail & Related papers (2021-12-16T06:14:37Z) - Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
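A minimal sketch of the mutual-information side of this recipe, assuming a MINE-style (Donsker-Varadhan) estimator, which the paper may or may not use: a statistics network estimates the MI between a candidate domain-invariant part and a domain-specific part of the representation; in the full model the encoder would be trained to minimize that estimate so the two parts carry disjoint information.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim, half = 64, 32
reps = torch.randn(256, dim)                          # stand-in for pretrained cross-lingual reps
z_inv, z_spec = reps[:, :half], reps[:, half:]        # candidate invariant / specific parts

# MINE-style statistics network: estimates MI via the Donsker-Varadhan lower bound.
mine = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(mine.parameters(), lr=1e-3)

for _ in range(200):
    joint = mine(torch.cat([z_inv, z_spec], dim=1)).mean()
    shuffled = z_spec[torch.randperm(len(z_spec))]    # break pairing to sample the marginals
    marginal = torch.logsumexp(mine(torch.cat([z_inv, shuffled], dim=1)).squeeze(1), dim=0) \
               - torch.log(torch.tensor(float(len(z_inv))))
    mi_lower_bound = joint - marginal
    loss = -mi_lower_bound                            # maximize the bound to estimate MI
    opt.zero_grad()
    loss.backward()
    opt.step()

# The encoder producing z_inv / z_spec would, in turn, be trained to minimize this estimate.
print("estimated MI:", mi_lower_bound.item())
```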
arXiv Detail & Related papers (2020-11-23T16:00:42Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
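A simplified, deterministic sketch of the surface-form blocking idea behind Dynamic Blocking (the paper applies blocking probabilistically and with its own hyperparameters): whenever the last generated token also appears in the source, the token that follows it in the source is banned at the next step, discouraging verbatim copying. The uniform random scorer stands in for a paraphrasing LM.
```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "a", "rug", "rested", "<eos>"]
source = ["the", "cat", "sat", "on", "the", "mat"]

def next_token_scores(prefix):
    """Stand-in for a paraphrasing LM: random scores over the vocabulary."""
    return rng.normal(size=len(vocab))

def decode_with_blocking(max_len=8):
    output = []
    for _ in range(max_len):
        scores = next_token_scores(output)
        if output:                                     # block source successors of last token
            last = output[-1]
            for i, tok in enumerate(source[:-1]):
                if tok == last:
                    scores[vocab.index(source[i + 1])] = -np.inf
        nxt = vocab[int(np.argmax(scores))]
        if nxt == "<eos>":
            break
        output.append(nxt)
    return " ".join(output)

print(decode_with_blocking())
```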
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
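A minimal sketch of a compositional output layer under illustrative assumptions (a character-level GRU composer, which is not necessarily the paper's architecture): each word's output embedding is computed from its characters, so any candidate word, seen or unseen, can be scored against the LM's hidden state without a vocabulary-sized output matrix.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 48
char_emb = nn.Embedding(256, dim)                     # byte-level character embeddings
composer = nn.GRU(dim, dim, batch_first=True)         # composes characters into a word vector

def word_output_embedding(word: str) -> torch.Tensor:
    chars = torch.tensor([[min(ord(c), 255) for c in word]])
    _, h = composer(char_emb(chars))                  # final hidden state = word embedding
    return h.squeeze(0).squeeze(0)

def next_word_logits(hidden_state: torch.Tensor, candidate_words: list[str]) -> torch.Tensor:
    """Score any candidate words, seen or unseen, against the LM's hidden state."""
    E = torch.stack([word_output_embedding(w) for w in candidate_words])   # (|C|, dim)
    return E @ hidden_state                           # one logit per candidate word

hidden = torch.randn(dim)                             # stand-in for the LM's hidden state
print(next_word_logits(hidden, ["cat", "cats", "catamaran"]))
```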
arXiv Detail & Related papers (2020-09-24T07:21:14Z)