Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks
- URL: http://arxiv.org/abs/2409.13203v1
- Date: Fri, 20 Sep 2024 04:17:13 GMT
- Title: Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks
- Authors: Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Kang Liu, Jun Zhao
- Abstract summary: We propose a novel knowledge distillation method for learning the complex reasoning abilities of Large Language Models (LLMs).
NesyCD distills the general capabilities and the specialized knowledge of LLMs in different ways.
Our experiments show that NesyCD significantly boosts SLMs' complex reasoning performance on in-domain (BBH, GSM8K) and out-of-domain (AGIEval, ARC) datasets.
- Score: 30.572064185770298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose Neural-Symbolic Collaborative Distillation (NesyCD), a novel knowledge distillation method for learning the complex reasoning abilities of Large Language Models (LLMs, e.g., >13B). We argue that complex reasoning tasks are difficult for Small Language Models (SLMs, e.g., ≤7B), as these tasks demand not only general cognitive abilities but also specialized knowledge, which is often sparse and difficult for neural-based SLMs to capture effectively. NesyCD therefore distills the general capabilities and the specialized knowledge of LLMs in different ways. On the one hand, we distill only general abilities from teacher LLMs into student SLMs realized as parameterized neural networks. On the other hand, for the specialized abilities and uncommon knowledge required by a complex reasoning task, we employ symbolic knowledge distillation to obtain and store the specialized knowledge in a symbolic knowledge base (KB). By decoupling general and specialized capabilities, NesyCD achieves superior performance cost-effectively, using smaller models and blending parameterized neural networks with a symbolic KB. Moreover, the specialized KB generalizes well and can be comprehended and manipulated by humans. Our experiments show that NesyCD significantly boosts SLMs' complex reasoning performance on in-domain (BBH, GSM8K) and out-of-domain (AGIEval, ARC) datasets. Notably, our approach enables LLaMA3-8B and Qwen2-7B to surpass GPT-3.5-turbo and come close to matching LLaMA3-70B, despite the latter having nine times more parameters. Our code will be available at https://github.com/Xnhyacinth/NesyCD.
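As a rough illustration of the decoupling described above, the minimal inference-time sketch below has a distilled SLM answer a question after specialized knowledge is retrieved from a symbolic KB and prepended to the prompt. The KB schema, the keyword-overlap retriever, and the `distilled_slm_generate` stub are illustrative assumptions, not the authors' implementation.

```python
"""Minimal sketch of NesyCD-style inference under assumed interfaces:
general reasoning lives in the distilled SLM's parameters, while
specialized knowledge is stored in an external symbolic KB and is
retrieved and prepended to the prompt at inference time."""
from dataclasses import dataclass


@dataclass
class KBEntry:
    topic: str    # short key describing the specialized knowledge
    content: str  # human-readable fact or worked procedure


# Toy symbolic KB; in the paper such entries are distilled from a teacher LLM.
SYMBOLIC_KB = [
    KBEntry("modular arithmetic", "a mod n is the remainder when a is divided by n."),
    KBEntry("date arithmetic", "Count days across month boundaries explicitly."),
]


def retrieve(question: str, kb: list[KBEntry], top_k: int = 2) -> list[KBEntry]:
    """Rank KB entries by naive keyword overlap between question and topic."""
    q_tokens = set(question.lower().split())
    scored = [(len(q_tokens & set(e.topic.lower().split())), e) for e in kb]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:top_k] if score > 0]


def build_prompt(question: str, kb: list[KBEntry]) -> str:
    """Prepend retrieved specialized knowledge to a step-by-step prompt."""
    facts = "\n".join(f"- {e.content}" for e in retrieve(question, kb))
    knowledge_block = f"Specialized knowledge:\n{facts}\n\n" if facts else ""
    return f"{knowledge_block}Question: {question}\nLet's think step by step."


def distilled_slm_generate(prompt: str) -> str:
    """Placeholder for the distilled SLM (e.g., a fine-tuned 7B model)."""
    return f"[SLM reasoning over a prompt of {len(prompt)} characters]"


if __name__ == "__main__":
    question = "What is 17 mod 5 in modular arithmetic"
    print(distilled_slm_generate(build_prompt(question, SYMBOLIC_KB)))
```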
Related papers
- GIVE: Structured Reasoning with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning framework that integrates the parametric and non-parametric memories.
Our method facilitates a more logical and step-wise reasoning approach akin to experts' problem-solving, rather than gold answer retrieval.
arXiv Detail & Related papers (2024-10-11T03:05:06Z)
- $\textit{SKIntern}$: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models [27.07695214182334]
Small Language Models (SLMs) are attracting attention due to the high computational demands and privacy concerns of Large Language Models (LLMs).
We introduce $\textit{SKIntern}$, an innovative approach that empowers SLMs to internalize symbolic knowledge.
arXiv Detail & Related papers (2024-09-20T03:23:20Z)
- CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge [44.59258397967782]
Large language models (LLMs) have demonstrated impressive capabilities across various natural language processing tasks.
We present a systematic evaluation of state-of-the-art LLMs' complex logical reasoning abilities.
We find that LLMs excel at reasoning over general world knowledge but face significant challenges with specialized domain-specific knowledge.
arXiv Detail & Related papers (2024-07-30T05:40:32Z)
- Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs [55.317267269115845]
Chain-of-Knowledge (CoK) is a comprehensive framework for knowledge reasoning.
CoK includes methodologies for both dataset construction and model learning.
We conduct extensive experiments with KnowReason.
arXiv Detail & Related papers (2024-06-30T10:49:32Z)
- TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale [66.01943465390548]
We introduce TriSum, a framework for distilling large language models' text summarization abilities into a compact, local model.
Our method enhances local model performance on various benchmarks.
It also improves interpretability by providing insights into the summarization rationale.
arXiv Detail & Related papers (2024-03-15T14:36:38Z)
- A Knowledge-Injected Curriculum Pretraining Framework for Question Answering [70.13026036388794]
We propose a general Knowledge-Injected Curriculum Pretraining framework (KICP) to achieve comprehensive KG learning and exploitation for knowledge-based question answering tasks.
The KI module first injects knowledge into the LM by generating a KG-centered pretraining corpus, and generalizes the process into three key steps.
The KA module learns knowledge from the generated corpus with an LM equipped with an adapter, while keeping the LM's original natural language understanding ability.
The CR module follows human reasoning patterns to construct three corpora of increasing reasoning difficulty, and further trains the LM from easy to hard in a curriculum manner (see the sketch after this list).
arXiv Detail & Related papers (2024-03-11T03:42:03Z)
- Prompt-Time Symbolic Knowledge Capture with Large Language Models [0.0]
Augmenting large language models (LLMs) with user-specific knowledge is crucial for real-world applications, such as personal AI assistants.
This paper investigates utilizing the existing LLM capabilities to enable prompt-driven knowledge capture.
arXiv Detail & Related papers (2024-02-01T08:15:28Z)
- Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks [90.11273439036455]
Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks.
We propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales from LLMs with augmented knowledge retrieved from an external knowledge base.
We empirically show that KARD significantly improves the performance of small T5 and GPT models on the challenging knowledge-intensive reasoning datasets.
arXiv Detail & Related papers (2023-05-28T13:00:00Z)
- Great Truths are Always Simple: A Rather Simple Knowledge Encoder for Enhancing the Commonsense Reasoning Capacity of Pre-Trained Models [89.98762327725112]
Commonsense reasoning in natural language is a desired ability of artificial intelligent systems.
For solving complex commonsense reasoning tasks, a typical solution is to enhance pre-trained language models (PTMs) with a knowledge-aware graph neural network (GNN) encoder.
Despite their effectiveness, these approaches are built on heavy architectures and cannot clearly explain how external knowledge resources improve the reasoning capacity of PTMs.
arXiv Detail & Related papers (2022-05-04T01:27:36Z)
- High-level Features for Resource Economy and Fast Learning in Skill Transfer [0.8602553195689513]
Deep networks are proven to be effective due to their ability to form increasingly complex abstractions.
Previous work either enforced the formation of abstractions, creating a designer bias, or used a large number of neural units.
We propose to exploit neural response dynamics to form compact representations to use in skill transfer.
arXiv Detail & Related papers (2021-06-18T21:05:21Z)
- DISCOS: Bridging the Gap between Discourse Knowledge and Commonsense Knowledge [42.08569149041291]
We propose an alternative commonsense knowledge acquisition framework, DISCOS.
DISCOS populates expensive commonsense knowledge onto more affordable linguistic knowledge resources.
We can acquire 3.4M ATOMIC-like inferential commonsense knowledge by populating ATOMIC on the core part of ASER.
arXiv Detail & Related papers (2021-01-01T03:30:38Z)
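The curriculum idea in the KICP entry above (train on three corpora ordered from easy to hard) can be sketched very roughly as follows; the corpus format, the three-stage ordering, and the `train_one_pass` stub are illustrative assumptions rather than the KICP implementation.

```python
"""Toy curriculum-training loop in the spirit of KICP's CR module:
the model sees corpora of increasing reasoning difficulty, easiest first."""


def train_one_pass(model_state: dict, corpus: list[str]) -> dict:
    """Stand-in for one fine-tuning pass; here it only records what was seen."""
    model_state.setdefault("seen", []).extend(corpus)
    return model_state


def curriculum_train(model_state: dict, corpora: dict[str, list[str]]) -> dict:
    """Train on corpora ordered from easy to hard, curriculum-style."""
    for stage in ("easy", "medium", "hard"):  # assumed three-stage ordering
        model_state = train_one_pass(model_state, corpora[stage])
    return model_state


if __name__ == "__main__":
    # KG-centered corpora of increasing reasoning difficulty (toy examples).
    corpora = {
        "easy": ["Paris is the capital of France."],
        "medium": ["France borders Spain; Spain's capital is Madrid."],
        "hard": ["If X borders Y and Y's capital is Z, can X reach Z by land?"],
    }
    print(curriculum_train({}, corpora))
```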