Effective LLM Knowledge Learning via Model Generalization
- URL: http://arxiv.org/abs/2503.03705v1
- Date: Wed, 05 Mar 2025 17:56:20 GMT
- Title: Effective LLM Knowledge Learning via Model Generalization
- Authors: Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia
- Abstract summary: Large language models (LLMs) are trained on enormous collections of documents that contain extensive world knowledge. It is still not well understood how knowledge is acquired via autoregressive pre-training. In this paper, we focus on understanding and improving LLM knowledge learning.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are trained on enormous collections of documents that contain extensive world knowledge. However, it is still not well understood how knowledge is acquired via autoregressive pre-training. This lack of understanding greatly hinders effective knowledge learning, especially for continued pre-training on up-to-date information, because such evolving information often lacks the diverse repetitions that foundational knowledge has. In this paper, we focus on understanding and improving LLM knowledge learning. We found and verified that knowledge learning for LLMs can be viewed as an implicit supervised task hidden in the autoregressive pre-training objective. Our findings suggest that knowledge learning for LLMs would benefit from methods designed to improve generalization in supervised tasks. Based on our analysis, we propose formatting-based data augmentation to grow the number of in-distribution samples; unlike text paraphrasing, it does not risk altering the facts embedded in documents. We also introduce sharpness-aware minimization as an effective optimization algorithm to further improve generalization. Moreover, our analysis and method can be readily extended to instruction tuning. Extensive experimental results validate our findings and demonstrate the effectiveness of our methods in both continued pre-training and instruction tuning. This paper offers new perspectives and insights to interpret and design effective strategies for LLM knowledge learning.
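The abstract names sharpness-aware minimization (SAM) as an optimizer but does not spell out the update. As a rough illustration, here is a minimal NumPy sketch of the generic SAM step from Foret et al. (2021): take a normalized ascent step of radius `rho` toward higher loss, evaluate the gradient there, and apply that gradient to the original weights. The toy least-squares problem, the names `sam_step`, `mse_grad` and `mse_loss`, and the values of `lr` and `rho` are hypothetical choices for illustration, not the paper's LLM training setup.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic sharpness-aware minimization (SAM) update.

    Illustrative sketch of the update in Foret et al. (2021); not the
    paper's implementation. grad_fn(w) must return dLoss/dw at w.
    """
    g = grad_fn(w)
    # First-order estimate of the worst-case perturbation inside an L2 ball
    # of radius rho; the tiny constant avoids division by zero when g == 0.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Gradient at the perturbed point (the "sharpness-aware" gradient) ...
    g_sam = grad_fn(w + eps)
    # ... applied to the ORIGINAL weights, not to w + eps.
    return w - lr * g_sam


# Hypothetical toy problem: noiseless least squares with a known solution.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true


def mse_loss(w):
    # Half mean squared error of the linear model X @ w.
    return 0.5 * np.mean((X @ w - y) ** 2)


def mse_grad(w):
    # Gradient of mse_loss with respect to w.
    return X.T @ (X @ w - y) / len(y)


w = np.zeros(3)
for _ in range(200):
    w = sam_step(w, mse_grad, lr=0.1, rho=0.05)

print(f"final loss: {mse_loss(w):.2e}")
```

Each SAM step costs two gradient evaluations instead of one, roughly doubling per-step compute compared with plain gradient descent. On this toy problem the iterates end up close to `w_true`; with a fixed `rho` they hover in a small neighborhood of the minimizer rather than converging exactly.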
Related papers
- Teaching LLMs How to Learn with Contextual Fine-Tuning
We study a novel generalization of instruction tuning, called contextual fine-tuning, to fine-tune LLMs.
We empirically demonstrate that this simple yet effective modification improves the ability of LLMs to be fine-tuned rapidly on new datasets.
Date: 2025-03-12T03:45:53Z
- Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
A mainstream category of methods reduces hallucinations by optimizing the knowledge representation of Large Language Models.
We believe that the process of models refining knowledge can greatly benefit from the way humans learn.
In our work, by imitating the human learning process, we design an Adaptive Contrastive Learning strategy.
Date: 2025-02-11T02:19:13Z
- KaLM: Knowledge-aligned Autoregressive Language Modeling via Dual-view Knowledge Graph Contrastive Learning
This paper proposes KaLM, a Knowledge-aligned Language Modeling approach.
It fine-tunes autoregressive large language models to align with KG knowledge via the joint objective of explicit knowledge alignment and implicit knowledge alignment.
Notably, our method achieves a significant performance boost in evaluations of knowledge-driven tasks.
Date: 2024-12-06T11:08:24Z
- Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
Large language models (LLMs) often struggle to provide up-to-date information.
Existing approaches typically involve continued pre-training on new documents.
Motivated by the success of the Feynman Technique in efficient human learning, we introduce Self-Tuning.
Date: 2024-06-10T14:42:20Z
- A Comprehensive Study of Knowledge Editing for Large Language Models
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
Date: 2024-01-02T16:54:58Z
- Distilling Rule-based Knowledge into Large Language Models
We are inspired by the observation that humans can also learn new tasks or knowledge in another way: by learning from rules.
We propose rule distillation, which first uses the strong in-context abilities of LLMs to extract the knowledge from the textual rules.
Our experiments show that making LLMs learn from rules with our method is much more efficient than example-based learning in terms of both sample size and generalization ability.
Date: 2023-11-15T11:42:41Z