Overcoming Barriers to Skill Injection in Language Modeling: Case Study
in Arithmetic
- URL: http://arxiv.org/abs/2211.02098v1
- Date: Thu, 3 Nov 2022 18:53:30 GMT
- Authors: Mandar Sharma, Nikhil Muralidhar, Naren Ramakrishnan
- Abstract summary: We develop a novel framework that enables language models to be mathematically proficient while retaining their linguistic prowess.
Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic skills that occurs while injecting non-linguistic skills into language models.
- Score: 14.618731441943847
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Through their transfer learning abilities, highly-parameterized large
pre-trained language models have dominated the NLP landscape for a multitude of
downstream language tasks. Though linguistically proficient, the inability of
these models to incorporate the learning of non-linguistic entities (numerals
and arithmetic reasoning) limits their usage for tasks that require numeric
comprehension or strict mathematical reasoning. However, as we illustrate in
this paper, building a general-purpose language model that also happens to be
proficient in mathematical reasoning is not as straightforward as training it
on a numeric dataset. In this work, we develop a novel framework that enables
language models to be mathematically proficient while retaining their
linguistic prowess. Specifically, we offer information-theoretic interventions
to overcome the catastrophic forgetting of linguistic skills that occurs while
injecting non-linguistic skills into language models.
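The abstract does not describe the paper's specific information-theoretic interventions. As a rough illustration of the general class of techniques for mitigating catastrophic forgetting during skill injection, the sketch below implements an elastic-weight-consolidation-style (EWC, Kirkpatrick et al.) penalty that anchors parameters important to the original linguistic task while the model is fine-tuned on a new arithmetic task. The function names and toy setup are hypothetical and are not taken from the paper.

```python
# Illustrative EWC-style regularizer (NOT the paper's method): while
# fine-tuning on a new skill, penalize drift in parameters that are
# estimated to matter for the old skill.

def fisher_diagonal(per_example_grads):
    """Diagonal Fisher estimate: mean of squared per-example gradients
    of the old-task loss, one importance weight per parameter."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]

def ewc_penalized_loss(new_task_loss, params, anchor_params, fisher, lam=10.0):
    """New-task loss plus a quadratic anchor toward the old-task optimum,
    weighted per parameter by its estimated importance to the old skill."""
    penalty = 0.5 * lam * sum(
        f * (p - a) ** 2 for f, p, a in zip(fisher, params, anchor_params)
    )
    return new_task_loss + penalty
```

Under this penalty, parameters with high Fisher importance for the linguistic task resist being moved by arithmetic fine-tuning, while unimportant parameters remain free to adapt, which is one common way to trade off injected skills against linguistic retention.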
Related papers
- Proceedings of the First International Workshop on Next-Generation Language Models for Knowledge Representation and Reasoning (NeLaMKRR 2024) [16.282850445579857]
Reasoning is an essential component of human intelligence as it plays a fundamental role in our ability to think critically.
The recent leap forward in natural language processing, with the emergence of transformer-based language models, hints at the possibility that these models exhibit reasoning abilities.
Despite ongoing discussions about what reasoning is in language models, it is still not easy to pin down to what extent these models are actually capable of reasoning.
arXiv Detail & Related papers (2024-10-07T02:31:47Z)
- Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning [56.03057119008865]
We show that scaling diffusion language models can effectively make them strong language learners.
We build competent diffusion language models at scale by first acquiring knowledge from massive data.
Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks.
arXiv Detail & Related papers (2023-08-23T16:01:12Z)
- Arithmetic with Language Models: from Memorization to Computation [3.077668143048211]
This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data.
We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing.
arXiv Detail & Related papers (2023-08-02T13:58:37Z)
- Learning Non-linguistic Skills without Sacrificing Linguistic Proficiency [14.618731441943847]
Non-linguistic skill injection leads to catastrophic forgetting of core linguistic skills.
Our model outperforms the state-of-the-art both on injected non-linguistic skills and on linguistic knowledge retention.
arXiv Detail & Related papers (2023-05-14T20:57:11Z)
- On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex [48.588772371355816]
This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex.
Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples.
arXiv Detail & Related papers (2023-01-30T13:21:00Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Testing the Ability of Language Models to Interpret Figurative Language [69.59943454934799]
Figurative and metaphorical language are commonplace in discourse.
It remains an open question to what extent modern language models can interpret nonliteral phrases.
We introduce Fig-QA, a Winograd-style nonliteral language understanding task.
arXiv Detail & Related papers (2022-04-26T23:42:22Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Language Models are not Models of Language [0.0]
Transfer learning has enabled large deep learning neural networks trained on the language modeling task to vastly improve performance.
We argue that the term language model is misleading because deep learning models are not theoretical models of language.
arXiv Detail & Related papers (2021-12-13T22:39:46Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.