Knowledge-Augmented Reasoning Distillation for Small Language Models in
Knowledge-Intensive Tasks
- URL: http://arxiv.org/abs/2305.18395v2
- Date: Mon, 30 Oct 2023 08:20:14 GMT
- Title: Knowledge-Augmented Reasoning Distillation for Small Language Models in
Knowledge-Intensive Tasks
- Authors: Minki Kang, Seanie Lee, Jinheon Baek, Kenji Kawaguchi, Sung Ju Hwang
- Abstract summary: Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks.
We propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales from LLMs with augmented knowledge retrieved from an external knowledge base.
We empirically show that KARD significantly improves the performance of small T5 and GPT models on the challenging knowledge-intensive reasoning datasets.
- Score: 90.11273439036455
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have shown promising performance in
knowledge-intensive reasoning tasks that require a compound understanding of
knowledge. However, deploying LLMs in real-world applications can be
challenging due to their high computational requirements and concerns about data
privacy. Previous studies have focused on building task-specific small Language
Models (LMs) by fine-tuning them with labeled data or distilling LLMs. However,
these approaches are ill-suited for knowledge-intensive reasoning tasks due to
the limited capacity of small LMs in memorizing the knowledge required.
Motivated by our theoretical analysis on memorization, we propose
Knowledge-Augmented Reasoning Distillation (KARD), a novel method that
fine-tunes small LMs to generate rationales obtained from LLMs, augmented with
knowledge retrieved from an external knowledge base. We further propose a
neural reranker to obtain documents relevant to rationale generation.
We empirically show that KARD significantly improves the performance of small
T5 and GPT models on the challenging knowledge-intensive reasoning datasets,
namely MedQA-USMLE, StrategyQA, and OpenbookQA. Notably, our method enables
250M-parameter T5 models to outperform fine-tuned 3B models, which have 12
times more parameters, on both the MedQA-USMLE and StrategyQA benchmarks.
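To make the recipe concrete, here is a minimal sketch of how a KARD-style training pair could be assembled. The helper names (teacher_rationale, retrieve, rerank) are illustrative stand-ins rather than the authors' code: a teacher LLM supplies a rationale, passages are retrieved from an external knowledge base and reranked for relevance to that rationale, and the small LM is then fine-tuned to map the question plus passages to the rationale and answer.

```python
# Minimal sketch of KARD-style training-data construction (illustrative stand-ins,
# not the authors' implementation).
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    answer: str


def teacher_rationale(question: str, answer: str) -> str:
    """Stand-in for prompting a large teacher LLM for a rationale."""
    return f"The question asks about {question.rstrip('?')}; the key fact points to {answer}."


def retrieve(question: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy lexical retriever over an external knowledge base (word-overlap score)."""
    q = set(question.lower().split())
    return sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))[:k]


def rerank(rationale: str, docs: list[str]) -> list[str]:
    """Stand-in for the neural reranker: order passages by relevance to the rationale."""
    r = set(rationale.lower().split())
    return sorted(docs, key=lambda d: -len(r & set(d.lower().split())))


def build_training_pair(ex: Example, corpus: list[str]) -> tuple[str, str]:
    rationale = teacher_rationale(ex.question, ex.answer)        # teacher LLM supplies the rationale
    passages = rerank(rationale, retrieve(ex.question, corpus))  # retrieve, then rerank passages
    source = f"question: {ex.question} context: {' '.join(passages)}"
    target = f"{rationale} So the answer is {ex.answer}."
    return source, target  # the small LM is fine-tuned on source -> target pairs


if __name__ == "__main__":
    kb = ["Aspirin irreversibly inhibits cyclooxygenase enzymes.",
          "The mitral valve separates the left atrium and left ventricle."]
    src, tgt = build_training_pair(
        Example("Which enzyme class does aspirin inhibit?", "cyclooxygenase"), kb)
    print(src)
    print(tgt)
```

At inference time, as described in the paper, the teacher is no longer queried: the reranker selects passages for the question and the fine-tuned small LM generates the rationale and answer on its own.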
Related papers
- A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models [71.25225058845324]
Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation.
Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge.
Retrieval-augmented LLMs (RA-LLMs) have emerged to harness external and authoritative knowledge bases rather than relying solely on the model's internal knowledge.
arXiv Detail & Related papers (2024-05-10T02:48:45Z)
- Evolving Knowledge Distillation with Large Language Models and Active Learning [46.85430680828938]
Large language models (LLMs) have demonstrated remarkable capabilities across various NLP tasks.
Previous research has attempted to distill the knowledge of LLMs into smaller models by generating annotated data.
We propose EvoKD: Evolving Knowledge Distillation, which leverages the concept of active learning to interactively enhance the process of data generation using large language models.
arXiv Detail & Related papers (2024-03-11T03:55:24Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs in that it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- Knowledge Editing for Large Language Models: A Survey [51.01368551235289]
One major drawback of large language models (LLMs) is their substantial computational cost for pre-training.
Knowledge-based Model Editing (KME) has attracted increasing attention, which aims to precisely modify the LLMs to incorporate specific knowledge.
arXiv Detail & Related papers (2023-10-24T22:18:13Z)
- Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z)
- Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling [34.59678835272862]
ChatGPT, a representative large language model (LLM), has gained considerable attention due to its powerful emergent abilities.
This paper proposes to enhance LLMs with knowledge graphs, yielding knowledge graph-enhanced large language models (KGLLMs).
KGLLMs provide a way to improve LLMs' factual reasoning ability, opening up new avenues for LLM research.
arXiv Detail & Related papers (2023-06-20T12:21:06Z)
- When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories [58.3421305091187]
This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge.
We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the long tail.
We devise a simple, yet effective, method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary.
arXiv Detail & Related papers (2022-12-20T18:30:15Z)
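The last entry's idea of retrieving non-parametric memories only when necessary can be expressed as a simple gating rule. The sketch below is a hedged illustration: the popularity signal, threshold, and helper functions are assumptions for exposition, not that paper's actual criterion or code.

```python
# Hedged sketch of adaptive retrieval: consult the external corpus only when a
# question looks long-tail. The popularity signal, threshold, and helpers are
# illustrative assumptions.

def popularity(question: str) -> float:
    """Stand-in for a popularity signal (e.g. how widely covered the asked-about entity is)."""
    return 0.1  # pretend this is a long-tail question


def parametric_answer(question: str) -> str:
    """Stand-in for answering from the LM's own (parametric) memory."""
    return "answer from the LM's weights"


def retrieval_augmented_answer(question: str) -> str:
    """Stand-in for answering with retrieved (non-parametric) evidence."""
    return "answer grounded in retrieved documents"


def adaptive_answer(question: str, threshold: float = 0.5) -> str:
    # Popular facts tend to be memorized well; retrieve only for the long tail.
    if popularity(question) >= threshold:
        return parametric_answer(question)
    return retrieval_augmented_answer(question)


if __name__ == "__main__":
    print(adaptive_answer("Who founded the smallest municipality in Friesland?"))
```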