Enhancing LLM Knowledge Learning through Generalization
- URL: http://arxiv.org/abs/2503.03705v2
- Date: Sun, 22 Jun 2025 19:25:04 GMT
- Title: Enhancing LLM Knowledge Learning through Generalization
- Authors: Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia
- Abstract summary: We show that an LLM's ability to continually predict the same factual knowledge tokens given diverse paraphrased contexts is positively correlated with its capacity to extract that knowledge via question-answering. We propose two strategies to enhance LLMs' ability to predict the same knowledge tokens given varied contexts, thereby enhancing knowledge acquisition.
- Score: 73.16975077770765
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) are increasingly deployed in diverse applications, faithfully integrating evolving factual knowledge into these models remains a critical challenge. Continued pre-training on paraphrased data has shown empirical promise for enhancing knowledge acquisition. However, this approach is often costly and unreliable, as it relies on external models or manual effort for rewriting, and may inadvertently alter the factual content. In this work, we hypothesize and empirically show that an LLM's ability to continually predict the same factual knowledge tokens given diverse paraphrased contexts is positively correlated with its capacity to extract that knowledge via question-answering. Based on this view and aiming to improve generalization to diverse paraphrased contexts, we introduce two strategies to enhance LLMs' ability to predict the same knowledge tokens given varied contexts, thereby enhancing knowledge acquisition. First, we propose formatting-based data augmentation, which diversifies documents conveying the same knowledge by altering document formats rather than their content, thereby preserving factual integrity. Second, we adopt sharpness-aware minimization as the optimizer to better improve generalization. Extensive experiments demonstrate our methods' effectiveness in both continued pre-training and instruction tuning, and further gains can be achieved by combining with paraphrased data.
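The second strategy in the abstract, sharpness-aware minimization (SAM), can be illustrated with a minimal sketch. This is the generic SAM update (perturb the weights along the normalized gradient, then descend using the gradient taken at the perturbed point), not the authors' training setup. The names `sam_step`, `loss`, and `grad`, the toy quadratic objective, and the values of `lr` and `rho` are all assumptions made for illustration.

```python
import math


def sam_step(params, grad_fn, lr=0.05, rho=0.05):
    """One generic SAM update on a list of floats (illustrative only)."""
    # 1. Gradient at the current weights.
    g = grad_fn(params)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        # Already at a stationary point; nothing to perturb or update.
        return list(params)
    # 2. Move to the approximate worst-case point within an L2 ball of radius rho.
    perturbed = [p + rho * gi / norm for p, gi in zip(params, g)]
    # 3. Gradient at the perturbed weights.
    g_sharp = grad_fn(perturbed)
    # 4. Apply that gradient to the original (unperturbed) weights.
    return [p - lr * gi for p, gi in zip(params, g_sharp)]


def loss(w):
    # Hypothetical toy objective: a quadratic bowl with unequal curvature.
    return w[0] ** 2 + 10.0 * w[1] ** 2


def grad(w):
    # Analytic gradient of the toy objective above.
    return [2.0 * w[0], 20.0 * w[1]]


w = [1.0, 1.0]
for _ in range(200):
    w = sam_step(w, grad)
print(loss(w))
```

Setting `rho=0.0` reduces the step to plain gradient descent, which makes the effect of the perturbation easy to isolate. In a real training loop the same two-gradient pattern costs two forward-backward passes per update.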
Related papers
- LLM Inference Enhanced by External Knowledge: A Survey [16.319049759753106]
This study explores strategies for using external knowledge to enhance large language models (LLMs). Our comparative analysis highlights the trade-offs among interpretability, scalability, and performance.
arXiv Detail & Related papers (2025-05-30T09:08:51Z)
- Teaching LLMs How to Learn with Contextual Fine-Tuning [9.26781270726841]
We study a novel generalization of instruction tuning, called contextual fine-tuning, to fine-tune LLMs.
We empirically demonstrate that this simple yet effective modification improves the ability of LLMs to be fine-tuned rapidly on new datasets.
arXiv Detail & Related papers (2025-03-12T03:45:53Z)
- Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training [66.48331530995786]
We propose syMmetry-ENhanceD (MEND) Data Augmentation, a data-centric approach that improves the model's ability to extract useful information from context. Unlike existing methods that emphasize reasoning chain augmentation, our approach improves model robustness at the knowledge extraction stage. Experiments on both logical and arithmetic reasoning tasks show that MEND enhances reasoning performance across diverse query variations.
arXiv Detail & Related papers (2025-02-25T03:03:35Z)
- Refine Knowledge of Large Language Models via Adaptive Contrastive Learning [54.61213933999464]
A mainstream category of methods is to reduce hallucinations by optimizing the knowledge representation of Large Language Models. We believe that the process of models refining knowledge can greatly benefit from the way humans learn. In our work, by imitating the human learning process, we design an Adaptive Contrastive Learning strategy.
arXiv Detail & Related papers (2025-02-11T02:19:13Z)
- ConKE: Conceptualization-Augmented Knowledge Editing in Large Language Models for Commonsense Reasoning [47.98788315789392]
ConceptEdit is a framework that integrates conceptualization and instantiation into the Knowledge Editing pipeline. We show that ConceptEdit successfully generates commonsense knowledge with improved plausibility compared to other baselines.
arXiv Detail & Related papers (2024-12-16T03:34:40Z)
- KaLM: Knowledge-aligned Autoregressive Language Modeling via Dual-view Knowledge Graph Contrastive Learning [74.21524111840652]
This paper proposes KaLM, a Knowledge-aligned Language Modeling approach. It fine-tunes autoregressive large language models to align with KG knowledge via the joint objective of explicit knowledge alignment and implicit knowledge alignment. Notably, our method achieves a significant performance boost in evaluations of knowledge-driven tasks.
arXiv Detail & Related papers (2024-12-06T11:08:24Z)
- Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching [67.11497198002165]
Large language models (LLMs) often struggle to provide up-to-date information. Existing approaches typically involve continued pre-training on new documents. Motivated by the success of the Feynman Technique in efficient human learning, we introduce Self-Tuning.
arXiv Detail & Related papers (2024-06-10T14:42:20Z)
- A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
- Distilling Rule-based Knowledge into Large Language Models [90.7765003679106]
We are inspired by the fact that humans can also learn new tasks or knowledge in another way: by learning from rules.
We propose rule distillation, which first uses the strong in-context abilities of LLMs to extract the knowledge from the textual rules.
Our experiments show that making LLMs learn from rules by our method is much more efficient than example-based learning in both the sample size and generalization ability.
arXiv Detail & Related papers (2023-11-15T11:42:41Z)
- Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from the external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)
- LM-CORE: Language Models with Contextually Relevant External Knowledge [13.451001884972033]
We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements.
We present LM-CORE -- a general framework to achieve this -- that allows decoupling of the language model training from the external knowledge source.
Experimental results show that LM-CORE, having access to external knowledge, achieves significant and robust outperformance over state-of-the-art knowledge-enhanced language models on knowledge probing tasks.
arXiv Detail & Related papers (2022-08-12T18:59:37Z)