Conditional Language Learning with Context
- URL: http://arxiv.org/abs/2406.01976v1
- Date: Tue, 4 Jun 2024 05:22:24 GMT
- Title: Conditional Language Learning with Context
- Authors: Xiao Zhang, Miao Li, Ji Wu
- Abstract summary: We propose a simple modification to causal language modeling called conditional finetuning.
We show that a context can "explain away" certain corpus statistics and make the model avoid learning them.
- Score: 19.708303468664088
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models can learn sophisticated language understanding skills from fitting raw text. They also unselectively learn useless corpus statistics and biases, especially during finetuning on domain-specific corpora. In this paper, we propose a simple modification to causal language modeling called conditional finetuning, which performs language modeling conditioned on a context. We show that a context can "explain away" certain corpus statistics and make the model avoid learning them. In this fashion, conditional finetuning achieves selective learning from a corpus, learning knowledge useful for downstream tasks while avoiding learning useless corpus statistics like topic biases. This selective learning effect leads to less forgetting and better stability-plasticity tradeoff in domain finetuning, potentially benefitting lifelong learning with language models.
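To make the mechanism concrete, here is a minimal sketch of conditional finetuning as described above: a context is prepended to each training document and the causal LM loss is computed only on the document tokens, so the model fits p(document | context) rather than p(document). The model name, context string, and example text below are illustrative placeholders, not the paper's actual setup.

```python
# Minimal sketch of conditional finetuning: prepend a context and compute the
# causal LM loss only on the document tokens, i.e. -log p(document | context).
# Model, context, and document are placeholders for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM could be used
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

context = "The following text is from a biomedical abstract."       # hypothetical context
document = "Aspirin irreversibly inhibits cyclooxygenase enzymes."  # hypothetical domain text

ctx_ids = tokenizer(context, return_tensors="pt").input_ids
doc_ids = tokenizer(document, return_tensors="pt").input_ids

input_ids = torch.cat([ctx_ids, doc_ids], dim=1)
labels = input_ids.clone()
labels[:, : ctx_ids.shape[1]] = -100  # -100 is ignored by the loss: no gradient from context tokens

loss = model(input_ids=input_ids, labels=labels).loss  # conditional LM loss
loss.backward()  # an optimizer step would follow in an actual finetuning loop
```

The intuition from the abstract is that corpus statistics already predictable from the context (e.g. topic biases) are "explained away" and need not be absorbed into the model's weights, while knowledge not captured by the context still has to be learned.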
Related papers
- Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
arXiv Detail & Related papers (2024-05-08T00:18:56Z) - Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution [4.01799362940916]
We present a setup for training, evaluating, and interpreting neural language models that uses artificial, language-like data.
The data is generated using a massive probabilistic grammar that is itself derived from a large natural language corpus.
With access to the underlying true source, our results show striking differences in learning dynamics between different classes of words.
arXiv Detail & Related papers (2023-10-23T12:03:01Z) - Pre-Training to Learn in Context [138.0745138788142]
The in-context learning ability of language models is not fully exploited because they are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x as many parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z) - ALERT: Adapting Language Models to Reasoning Tasks [43.8679673685468]
ALERT is a benchmark and suite of analyses for assessing language models' reasoning ability.
ALERT provides a test bed to assess any language model on fine-grained reasoning skills.
We find that language models learn more reasoning skills during the finetuning stage than during the pretraining stage.
arXiv Detail & Related papers (2022-12-16T05:15:41Z) - Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z) - Linking Emergent and Natural Languages via Corpus Transfer [98.98724497178247]
We propose a novel way to establish a link by corpus transfer between emergent languages and natural languages.
Our approach showcases non-trivial transfer benefits for two different tasks -- language modeling and image captioning.
We also introduce a novel metric to predict the transferability of an emergent language by translating emergent messages to natural language captions grounded on the same images.
arXiv Detail & Related papers (2022-03-24T21:24:54Z) - Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z) - A Visuospatial Dataset for Naturalistic Verb Learning [18.654373173232205]
We introduce a new dataset for training and evaluating grounded language models.
Our data is collected within a virtual reality environment and is designed to emulate the quality of language data to which a pre-verbal child is likely to have access.
We use the collected data to compare several distributional semantics models for verb learning.
arXiv Detail & Related papers (2020-10-28T20:47:13Z) - Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model whose size does not depend on the training vocabulary (an illustrative sketch of this idea follows this entry).
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
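Regarding the last entry above, here is a toy sketch (my own illustration under simplified assumptions, not the paper's actual architecture) of an output layer whose parameter count does not depend on vocabulary size: each word's output embedding is composed on the fly from the embeddings of its characters, so new vocabulary items add no new parameters.

```python
# Toy illustration of a compositional output layer: word output embeddings are
# composed from character embeddings, so parameters are vocabulary-size-independent.
import torch
import torch.nn as nn

class CompositionalOutput(nn.Module):
    def __init__(self, hidden_dim: int, n_chars: int = 128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, hidden_dim)       # fixed-size character table
        self.compose = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

    def word_embeddings(self, spelling_ids: torch.Tensor) -> torch.Tensor:
        # spelling_ids: (vocab_size, max_word_len) character ids of each word's spelling
        _, h = self.compose(self.char_emb(spelling_ids))
        return h.squeeze(0)                                      # (vocab_size, hidden_dim)

    def forward(self, lm_states: torch.Tensor, spelling_ids: torch.Tensor) -> torch.Tensor:
        # lm_states: (batch, hidden_dim) LM hidden states; returns logits over the current vocab
        return lm_states @ self.word_embeddings(spelling_ids).t()

# Usage sketch: logits = CompositionalOutput(768)(lm_states, spelling_ids)
```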
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.