Overcoming Barriers to Skill Injection in Language Modeling: Case Study
in Arithmetic
- URL: http://arxiv.org/abs/2211.02098v1
- Date: Thu, 3 Nov 2022 18:53:30 GMT
- Authors: Mandar Sharma, Nikhil Muralidhar, Naren Ramakrishnan
- Abstract summary: We develop a novel framework that enables language models to be mathematically proficient while retaining their linguistic prowess.
Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic skills that occurs while injecting non-linguistic skills into language models.
- Score: 14.618731441943847
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Through their transfer learning abilities, highly-parameterized large
pre-trained language models have dominated the NLP landscape for a multitude of
downstream language tasks. Though linguistically proficient, the inability of
these models to incorporate the learning of non-linguistic entities (numerals
and arithmetic reasoning) limits their usage for tasks that require numeric
comprehension or strict mathematical reasoning. However, as we illustrate in
this paper, building a general-purpose language model that also happens to be
proficient in mathematical reasoning is not as straightforward as training it
on a numeric dataset. In this work, we develop a novel framework that enables
language models to be mathematically proficient while retaining their
linguistic prowess. Specifically, we offer information-theoretic interventions
to overcome the catastrophic forgetting of linguistic skills that occurs while
injecting non-linguistic skills into language models.
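The abstract does not describe the paper's specific information-theoretic interventions. As a rough illustration of the general class of techniques for mitigating catastrophic forgetting during skill injection, the sketch below implements an elastic-weight-consolidation-style (EWC, Kirkpatrick et al.) penalty that anchors parameters important to the original linguistic task while the model is fine-tuned on a new arithmetic task. The function names and toy setup are hypothetical and are not taken from the paper.

```python
# Illustrative EWC-style regularizer (NOT the paper's method): while
# fine-tuning on a new skill, penalize drift in parameters that are
# estimated to matter for the old skill.

def fisher_diagonal(per_example_grads):
    """Diagonal Fisher estimate: mean of squared per-example gradients
    of the old-task loss, one importance weight per parameter."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]

def ewc_penalized_loss(new_task_loss, params, anchor_params, fisher, lam=10.0):
    """New-task loss plus a quadratic anchor toward the old-task optimum,
    weighted per parameter by its estimated importance to the old skill."""
    penalty = 0.5 * lam * sum(
        f * (p - a) ** 2 for f, p, a in zip(fisher, params, anchor_params)
    )
    return new_task_loss + penalty
```

Under this penalty, parameters with high Fisher importance for the linguistic task resist being moved by arithmetic fine-tuning, while unimportant parameters remain free to adapt, which is one common way to trade off injected skills against linguistic retention.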
Related papers
- Proceedings of the First International Workshop on Next-Generation Language Models for Knowledge Representation and Reasoning (NeLaMKRR 2024) [16.282850445579857]
Reasoning is an essential component of human intelligence as it plays a fundamental role in our ability to think critically.
The recent leap forward in natural language processing, with the emergence of transformer-based language models, hints at the possibility that these models exhibit reasoning abilities.
Despite ongoing discussions about what reasoning is in language models, it is still not easy to pin down to what extent these models are actually capable of reasoning.
arXiv Detail & Related papers (2024-10-07T02:31:47Z)
- Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning [56.03057119008865]
We show that scaling diffusion language models can effectively make them strong language learners.
We build competent diffusion language models at scale by first acquiring knowledge from massive data.
Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks.
arXiv Detail & Related papers (2023-08-23T16:01:12Z)
- Arithmetic with Language Models: from Memorization to Computation [3.077668143048211]
This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data.
We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing.
arXiv Detail & Related papers (2023-08-02T13:58:37Z)
- Learning Non-linguistic Skills without Sacrificing Linguistic Proficiency [14.618731441943847]
Non-linguistic skill injection leads to catastrophic forgetting of core linguistic skills.
Our model outperforms the state-of-the-art both on injected non-linguistic skills and on linguistic knowledge retention.
arXiv Detail & Related papers (2023-05-14T20:57:11Z)
- On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex [48.588772371355816]
This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex.
Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples.
arXiv Detail & Related papers (2023-01-30T13:21:00Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Testing the Ability of Language Models to Interpret Figurative Language [69.59943454934799]
Figurative and metaphorical language are commonplace in discourse.
It remains an open question to what extent modern language models can interpret nonliteral phrases.
We introduce Fig-QA, a Winograd-style nonliteral language understanding task.
arXiv Detail & Related papers (2022-04-26T23:42:22Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Language Models are not Models of Language [0.0]
Transfer learning has enabled large deep learning neural networks trained on the language modeling task to vastly improve performance.
We argue that the term language model is misleading because deep learning models are not theoretical models of language.
arXiv Detail & Related papers (2021-12-13T22:39:46Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.