Related papers: Enabling Large Language Models to Learn from Rules

URL: http://arxiv.org/abs/2311.08883v2
Date: Fri, 16 Feb 2024 14:07:24 GMT
Title: Enabling Large Language Models to Learn from Rules
Authors: Wenkai Yang, Yankai Lin, Jie Zhou, Jirong Wen
Abstract summary: We are inspired by the fact that humans can learn new tasks or knowledge in another way: by learning from rules. We propose rule distillation, which first uses the strong in-context abilities of LLMs to extract knowledge from textual rules, and then encodes that knowledge into the model parameters. Our experiments show that making LLMs learn from rules by our method is much more efficient than example-based learning in both sample size and generalization ability.
Score: 99.16680531261987
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have shown incredible performance in completing various real-world tasks. The current knowledge learning paradigm of LLMs is mainly based on learning from examples, in which LLMs learn the internal rule implicitly from a certain number of supervised examples. However, this learning paradigm may not learn complicated rules well, especially when the training examples are limited. We are inspired by the fact that humans can learn new tasks or knowledge in another way: by learning from rules. That is, humans can learn new tasks or grasp new knowledge quickly and generalize well given only a detailed rule and a few optional examples. Therefore, in this paper, we aim to explore the feasibility of this new learning paradigm, which targets encoding rule-based knowledge into LLMs. We further propose rule distillation, which first uses the strong in-context abilities of LLMs to extract the knowledge from the textual rules, and then explicitly encodes the knowledge into the parameters of LLMs by learning from the above in-context signals produced inside the model. Our experiments show that making LLMs learn from rules by our method is much more efficient than example-based learning in both sample size and generalization ability. Warning: This paper may contain examples with offensive content.
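
Illustrative sketch (not from the paper): the abstract does not spell out the training objective, so the following is only a minimal toy, assuming the "in-context signals" are next-token distributions produced when the rule is in the prompt, and assuming the student (no rule in the prompt) is trained to match them with a KL-divergence loss. All function names and logit values are hypothetical.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits):
    """Mean per-token KL(teacher || student).

    teacher_logits: per-token logits when the textual rule IS in the prompt.
    student_logits: per-token logits when the rule is NOT in the prompt.
    (Assumed setup; the abstract does not specify the exact loss.)
    """
    losses = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(losses) / len(losses)

# Toy values: 2 token positions, vocabulary of 3 tokens (made-up numbers).
teacher_logits = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]
student_logits = [[0.3, 0.2, 0.1], [0.5, 0.4, 0.6]]

loss = distillation_loss(teacher_logits, student_logits)
print(f"distillation loss: {loss:.4f}")
```

In a real setup the student's parameters would be updated by gradient descent on such a loss, so that the model behaves without the rule in its prompt the way it does with it.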

Related papers

Navigating Pitfalls: Evaluating LLMs in Machine Learning Programming Education [2.9248916859490173]
This study examines the use of Large Language Models in supporting learning in machine learning education. It focuses on the ability of LLMs to identify common errors of practice in machine learning code, and their ability to provide feedback that can guide learning.
arXiv Detail & Related papers (2025-05-23T08:39:58Z)
Effective LLM Knowledge Learning via Model Generalization [73.16975077770765]
Large language models (LLMs) are trained on enormous documents that contain extensive world knowledge. It is still not well-understood how knowledge is acquired via autoregressive pre-training. In this paper, we focus on understanding and improving LLM knowledge learning.
arXiv Detail & Related papers (2025-03-05T17:56:20Z)
Refine Knowledge of Large Language Models via Adaptive Contrastive Learning [54.61213933999464]
A mainstream category of methods is to reduce hallucinations by optimizing the knowledge representation of Large Language Models. We believe that the process of models refining knowledge can greatly benefit from the way humans learn. In our work, by imitating the human learning process, we design an Adaptive Contrastive Learning strategy.
arXiv Detail & Related papers (2025-02-11T02:19:13Z)
Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers [16.303681959333883]
We give a general method for detecting semantic concepts in the internal activations of Large Language Models. We show that our methodology can be easily adapted to steer LLMs toward desirable outputs. We highlight the generality of our approach by steering LLMs towards new concepts that, to the best of our knowledge, have not been previously considered.
arXiv Detail & Related papers (2025-02-06T01:41:48Z)
KaLM: Knowledge-aligned Autoregressive Language Modeling via Dual-view Knowledge Graph Contrastive Learning [74.21524111840652]
This paper proposes KaLM, a Knowledge-aligned Language Modeling approach. It fine-tunes autoregressive large language models to align with KG knowledge via the joint objective of explicit knowledge alignment and implicit knowledge alignment. Notably, our method achieves a significant performance boost in evaluations of knowledge-driven tasks.
arXiv Detail & Related papers (2024-12-06T11:08:24Z)
zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning [6.976968804436321]
Large language models (LLMs) have the capability of zero-shot learning, which does not require training or fine-tuning. We propose zsLLMCode, a novel approach that generates functional code embeddings using LLMs.
arXiv Detail & Related papers (2024-09-23T01:03:15Z)
When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp. Specifically, the cunning texts that FLUB focuses on mainly consist of the tricky, humorous, and misleading texts collected from the real internet environment. Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z)
Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation [23.736611338497244]
TinyLLM is a new knowledge distillation paradigm to learn a small student LLM from multiple large teacher LLMs. We introduce an in-context example generator and a teacher-forcing Chain-of-Thought strategy to ensure that the rationales are accurate and grounded in contextually appropriate scenarios. Results show that TinyLLM can outperform large teacher LLMs significantly, despite a considerably smaller model size.
arXiv Detail & Related papers (2024-02-07T06:48:24Z)
See the Unseen: Better Context-Consistent Knowledge-Editing by Noises [73.54237379082795]
Knowledge-editing updates knowledge of large language models (LLMs). Existing works ignore this property and the editing lacks generalization. We empirically find that the effects of different contexts upon LLMs in recalling the same knowledge follow a Gaussian-like distribution.
arXiv Detail & Related papers (2024-01-15T09:09:14Z)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code). Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
Large Language Models can Learn Rules [106.40747309894236]
We present Hypotheses-to-Theories (HtT), a framework that learns a rule library for reasoning with large language models (LLMs). Experiments on relational reasoning, numerical reasoning and concept learning problems show that HtT improves existing prompting methods. The learned rules are also transferable to different models and to different forms of the same problem.
arXiv Detail & Related papers (2023-10-10T23:07:01Z)
Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs [19.0797968186656]
Large language models (LLMs) are versatile and can solve different tasks due to their emergent ability and generalizability. In some previous works, additional modules like graph neural networks (GNNs) are trained on retrieved knowledge from external knowledge bases.
arXiv Detail & Related papers (2023-09-06T15:55:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.