Learning Non-linguistic Skills without Sacrificing Linguistic
Proficiency
- URL: http://arxiv.org/abs/2305.08246v1
- Date: Sun, 14 May 2023 20:57:11 GMT
- Title: Learning Non-linguistic Skills without Sacrificing Linguistic
Proficiency
- Authors: Mandar Sharma, Nikhil Muralidhar, Naren Ramakrishnan
- Abstract summary: Non-linguistic skill injection leads to catastrophic forgetting of core linguistic skills.
Our model outperforms the state-of-the-art both on injected non-linguistic skills and on linguistic knowledge retention.
- Score: 14.618731441943847
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The field of Math-NLP has witnessed significant growth in recent years,
motivated by the desire to expand LLM performance to the learning of
non-linguistic notions (numerals and, subsequently, arithmetic reasoning).
However, non-linguistic skill injection typically comes at a cost for LLMs: it
leads to catastrophic forgetting of core linguistic skills, a consequence that
often remains unaddressed in the literature. Although Math-NLP has produced
LLMs that closely approximate the mathematical skills of a grade-schooler or
the arithmetic reasoning of a calculator, the practicality of these models
fails if they concomitantly shed their linguistic capabilities. In this work,
we take a closer look at the phenomenon of catastrophic forgetting as it
pertains to LLMs and subsequently offer a novel framework for non-linguistic
skill injection in LLMs, based on information-theoretic interventions and
skill-specific losses, that enables the learning of strict arithmetic
reasoning. Our model outperforms the state-of-the-art both on injected
non-linguistic skills and on linguistic knowledge retention, and does so with
a fraction (1/4) of the non-linguistic training data and zero additional
synthetic linguistic training data.
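The abstract does not spell out the loss formulation, so the following is only an illustrative sketch, not the authors' actual method: one common way to combine skill injection with linguistic retention is a skill-specific task loss plus a KL-divergence penalty that anchors the fine-tuned model's predictive distribution to that of a frozen base model. All function names and the `lam` weighting parameter below are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, target_idx):
    # Negative log-likelihood of the target class.
    return -math.log(probs[target_idx])

def kl_divergence(p, q):
    # KL(p || q): how far the fine-tuned distribution q drifts from the base p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def skill_injection_loss(task_logits, target_idx, base_logits, lam=0.5):
    """Toy combined objective: arithmetic-skill cross-entropy plus a
    retention penalty keeping predictions close to the frozen base model."""
    q = softmax(task_logits)   # fine-tuned model's distribution
    p = softmax(base_logits)   # frozen base (linguistic) distribution
    return cross_entropy(q, target_idx) + lam * kl_divergence(p, q)

# Example: a 3-way next-token choice where the correct answer is index 2,
# with a uniform base distribution standing in for the frozen model.
loss = skill_injection_loss([0.1, 0.2, 2.0], 2, [1.0, 1.0, 1.0])
```

Setting `lam` to zero recovers plain task fine-tuning (maximal forgetting risk); larger values trade arithmetic skill for linguistic retention.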
Related papers
- Causality for Large Language Models [37.10970529459278]
Large language models (LLMs) with billions or trillions of parameters are trained on vast datasets, achieving unprecedented success across a series of language tasks.
Recent research highlights that LLMs function as causal parrots, capable of reciting causal knowledge without truly understanding or applying it.
This survey aims to explore how causality can enhance LLMs at every stage of their lifecycle.
arXiv Detail & Related papers (2024-10-20T07:22:23Z)
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- Uncertainty Quantification for In-Context Learning of Large Language Models [52.891205009620364]
In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs).
We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties.
The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion.
arXiv Detail & Related papers (2024-02-15T18:46:24Z)
- Is Knowledge All Large Language Models Needed for Causal Reasoning? [11.476877330365664]
This paper explores the causal reasoning of large language models (LLMs) to enhance their interpretability and reliability in advancing artificial intelligence.
We propose a novel causal attribution model that utilizes "do-operators" for constructing counterfactual scenarios.
arXiv Detail & Related papers (2023-12-30T04:51:46Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Democratizing Reasoning Ability: Tailored Learning from Large Language Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability to smaller LMs.
We exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm.
To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z)
- Limits for Learning with Language Models [4.20859414811553]
We show that large language models (LLMs) are unable to learn concepts beyond the first level of the Borel Hierarchy.
LLMs will continue to operate without formal guarantees on tasks that require entailments and deep linguistic understanding.
arXiv Detail & Related papers (2023-06-21T12:11:31Z)
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners [75.85554779782048]
Large Language Models (LLMs) have excited the natural language and machine learning community over recent years.
Despite numerous successful applications, the underlying mechanism of such in-context capabilities remains unclear.
In this work, we hypothesize that the learned semantics of language tokens do most of the heavy lifting during the reasoning process.
arXiv Detail & Related papers (2023-05-24T07:33:34Z)
- Rethinking with Retrieval: Faithful Large Language Model Inference [91.66406351103484]
We propose a novel post-processing approach, rethinking with retrieval (RR).
RR retrieves relevant external knowledge based on the reasoning steps obtained from the chain-of-thought prompting.
We evaluate the effectiveness of RR through extensive experiments with GPT-3 on three complex reasoning tasks.
arXiv Detail & Related papers (2022-12-31T22:35:34Z)
- Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic [14.618731441943847]
We develop a novel framework that enables language models to be mathematically proficient while retaining their linguistic prowess.
Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic skills that occurs while injecting non-linguistic skills into language models.
arXiv Detail & Related papers (2022-11-03T18:53:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.