Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks
- URL: http://arxiv.org/abs/2409.13203v1
- Date: Fri, 20 Sep 2024 04:17:13 GMT
- Title: Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks
- Authors: Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Kang Liu, Jun Zhao
- Abstract summary: We propose a novel knowledge distillation method for learning the complex reasoning abilities of Large Language Models (LLMs).
NesyCD distills the general capabilities and the specialized knowledge of LLMs in different ways.
Our experiments show that NesyCD significantly boosts SLMs' complex reasoning performance on in-domain (BBH, GSM8K) and out-of-domain (AGIEval, ARC) datasets.
- Score: 30.572064185770298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose Neural-Symbolic Collaborative Distillation (NesyCD), a novel knowledge distillation method for learning the complex reasoning abilities of Large Language Models (LLMs, e.g., >13B). We argue that complex reasoning tasks are difficult for Small Language Models (SLMs, e.g., ≤7B), as these tasks demand not only general cognitive abilities but also specialized knowledge, which is often sparse and difficult for neural-based SLMs to capture effectively. NesyCD therefore distills the general capabilities and the specialized knowledge of LLMs in different ways. On the one hand, we distill only general abilities from teacher LLMs into student SLMs realized as parameterized neural networks. On the other hand, for the specialized abilities and uncommon knowledge required by a complex reasoning task, we employ symbolic knowledge distillation to obtain and store the specialized knowledge in a symbolic knowledge base (KB). By decoupling general and specialized capabilities, NesyCD achieves superior performance cost-effectively, using smaller models and blending parameterized neural networks with a symbolic KB. Moreover, the specialized KB generalizes well and can be comprehended and manipulated by humans. Our experiments show that NesyCD significantly boosts SLMs' complex reasoning performance on in-domain (BBH, GSM8K) and out-of-domain (AGIEval, ARC) datasets. Notably, our approach enables LLaMA3-8B and Qwen2-7B to surpass GPT-3.5-turbo and come close to matching LLaMA3-70B, despite the latter having nine times more parameters. Our code will be available at https://github.com/Xnhyacinth/NesyCD.
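As a rough illustration of the decoupling described above, the minimal inference-time sketch below has a distilled SLM answer a question after specialized knowledge is retrieved from a symbolic KB and prepended to the prompt. The KB schema, the keyword-overlap retriever, and the `distilled_slm_generate` stub are illustrative assumptions, not the authors' implementation.

```python
"""Minimal sketch of NesyCD-style inference under assumed interfaces:
general reasoning lives in the distilled SLM's parameters, while
specialized knowledge is stored in an external symbolic KB and is
retrieved and prepended to the prompt at inference time."""
from dataclasses import dataclass


@dataclass
class KBEntry:
    topic: str    # short key describing the specialized knowledge
    content: str  # human-readable fact or worked procedure


# Toy symbolic KB; in the paper such entries are distilled from a teacher LLM.
SYMBOLIC_KB = [
    KBEntry("modular arithmetic", "a mod n is the remainder when a is divided by n."),
    KBEntry("date arithmetic", "Count days across month boundaries explicitly."),
]


def retrieve(question: str, kb: list[KBEntry], top_k: int = 2) -> list[KBEntry]:
    """Rank KB entries by naive keyword overlap between question and topic."""
    q_tokens = set(question.lower().split())
    scored = [(len(q_tokens & set(e.topic.lower().split())), e) for e in kb]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:top_k] if score > 0]


def build_prompt(question: str, kb: list[KBEntry]) -> str:
    """Prepend retrieved specialized knowledge to a step-by-step prompt."""
    facts = "\n".join(f"- {e.content}" for e in retrieve(question, kb))
    knowledge_block = f"Specialized knowledge:\n{facts}\n\n" if facts else ""
    return f"{knowledge_block}Question: {question}\nLet's think step by step."


def distilled_slm_generate(prompt: str) -> str:
    """Placeholder for the distilled SLM (e.g., a fine-tuned 7B model)."""
    return f"[SLM reasoning over a prompt of {len(prompt)} characters]"


if __name__ == "__main__":
    question = "What is 17 mod 5 in modular arithmetic"
    print(distilled_slm_generate(build_prompt(question, SYMBOLIC_KB)))
```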
Related papers
- GIVE: Structured Reasoning with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning framework that integrates the parametric and non-parametric memories.
Our method facilitates a more logical and step-wise reasoning approach akin to experts' problem-solving, rather than gold answer retrieval.
arXiv Detail & Related papers (2024-10-11T03:05:06Z)
- $\textit{SKIntern}$: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models [27.07695214182334]
Small Language Models (SLMs) are attracting attention due to the high computational demands and privacy concerns of Large Language Models (LLMs).
We introduce $\textit{SKIntern}$, an innovative approach that empowers SLMs to internalize symbolic knowledge.
arXiv Detail & Related papers (2024-09-20T03:23:20Z)
- CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge [44.59258397967782]
Large language models (LLMs) have demonstrated impressive capabilities across various natural language processing tasks.
We present a systematic evaluation of state-of-the-art LLMs' complex logical reasoning abilities.
We find that LLMs excel at reasoning over general world knowledge but face significant challenges with specialized domain-specific knowledge.
arXiv Detail & Related papers (2024-07-30T05:40:32Z)
- Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs [55.317267269115845]
Chain-of-Knowledge (CoK) is a comprehensive framework for knowledge reasoning.
CoK includes methodologies for both dataset construction and model learning.
We conduct extensive experiments with KnowReason.
arXiv Detail & Related papers (2024-06-30T10:49:32Z)
- TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale [66.01943465390548]
We introduce TriSum, a framework for distilling large language models' text summarization abilities into a compact, local model.
Our method enhances local model performance on various benchmarks.
It also improves interpretability by providing insights into the summarization rationale.
arXiv Detail & Related papers (2024-03-15T14:36:38Z)
- A Knowledge-Injected Curriculum Pretraining Framework for Question Answering [70.13026036388794]
We propose a general Knowledge-Injected Curriculum Pretraining framework (KICP) to achieve comprehensive KG learning and exploitation for knowledge-based question answering tasks.
The KI module first injects knowledge into the LM by generating a KG-centered pretraining corpus, and generalizes the process into three key steps.
The KA module learns knowledge from the generated corpus with an LM equipped with an adapter, while keeping the LM's original natural language understanding ability.
The CR module follows human reasoning patterns to construct three corpora of increasing reasoning difficulty, and further trains the LM from easy to hard in a curriculum manner (see the sketch after this list).
arXiv Detail & Related papers (2024-03-11T03:42:03Z)
- Prompt-Time Symbolic Knowledge Capture with Large Language Models [0.0]
Augmenting large language models (LLMs) with user-specific knowledge is crucial for real-world applications, such as personal AI assistants.
This paper investigates utilizing the existing LLM capabilities to enable prompt-driven knowledge capture.
arXiv Detail & Related papers (2024-02-01T08:15:28Z)
- Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks [90.11273439036455]
Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks.
We propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales from LLMs with augmented knowledge retrieved from an external knowledge base.
We empirically show that KARD significantly improves the performance of small T5 and GPT models on the challenging knowledge-intensive reasoning datasets.
arXiv Detail & Related papers (2023-05-28T13:00:00Z)
- Great Truths are Always Simple: A Rather Simple Knowledge Encoder for Enhancing the Commonsense Reasoning Capacity of Pre-Trained Models [89.98762327725112]
Commonsense reasoning in natural language is a desired ability of artificial intelligent systems.
For solving complex commonsense reasoning tasks, a typical solution is to enhance pre-trained language models (PTMs) with a knowledge-aware graph neural network (GNN) encoder.
Despite their effectiveness, these approaches are built on heavy architectures and cannot clearly explain how external knowledge resources improve the reasoning capacity of PTMs.
arXiv Detail & Related papers (2022-05-04T01:27:36Z)
- High-level Features for Resource Economy and Fast Learning in Skill Transfer [0.8602553195689513]
Deep networks are proven to be effective due to their ability to form increasingly complex abstractions.
Previous work either enforced the formation of abstractions, creating a designer bias, or used a large number of neural units.
We propose to exploit neural response dynamics to form compact representations to use in skill transfer.
arXiv Detail & Related papers (2021-06-18T21:05:21Z)
- DISCOS: Bridging the Gap between Discourse Knowledge and Commonsense Knowledge [42.08569149041291]
We propose an alternative commonsense knowledge acquisition framework, DISCOS.
DISCOS populates expensive commonsense knowledge onto more affordable linguistic knowledge resources.
We can acquire 3.4M ATOMIC-like inferential commonsense knowledge by populating ATOMIC on the core part of ASER.
arXiv Detail & Related papers (2021-01-01T03:30:38Z)
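The curriculum idea in the KICP entry above (train on three corpora ordered from easy to hard) can be sketched very roughly as follows; the corpus format, the three-stage ordering, and the `train_one_pass` stub are illustrative assumptions rather than the KICP implementation.

```python
"""Toy curriculum-training loop in the spirit of KICP's CR module:
the model sees corpora of increasing reasoning difficulty, easiest first."""


def train_one_pass(model_state: dict, corpus: list[str]) -> dict:
    """Stand-in for one fine-tuning pass; here it only records what was seen."""
    model_state.setdefault("seen", []).extend(corpus)
    return model_state


def curriculum_train(model_state: dict, corpora: dict[str, list[str]]) -> dict:
    """Train on corpora ordered from easy to hard, curriculum-style."""
    for stage in ("easy", "medium", "hard"):  # assumed three-stage ordering
        model_state = train_one_pass(model_state, corpora[stage])
    return model_state


if __name__ == "__main__":
    # KG-centered corpora of increasing reasoning difficulty (toy examples).
    corpora = {
        "easy": ["Paris is the capital of France."],
        "medium": ["France borders Spain; Spain's capital is Madrid."],
        "hard": ["If X borders Y and Y's capital is Z, can X reach Z by land?"],
    }
    print(curriculum_train({}, corpora))
```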