Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
- URL: http://arxiv.org/abs/2509.20162v2
- Date: Sun, 28 Sep 2025 02:06:48 GMT
- Title: Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
- Authors: Chaojun Nie, Jun Zhou, Guanxiang Wang, Shisong Wu, Zichen Wang,
- Abstract summary: We propose Reinforcement Learning from Augmented Generation (RLAG) to embed domain knowledge into large language models.<n>Our approach iteratively cycles between sampling generations and optimize the model through calculated rewards.<n> Experimental results across medical, legal, astronomy, and current events datasets demonstrate that our proposed method significantly outperforms baseline approaches.
- Score: 18.99847259801634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) often exhibit limited performance on domain-specific tasks due to the natural disproportionate representation of specialized information in their training data and the static nature of these datasets. Knowledge scarcity and temporal lag create knowledge gaps for domain applications. While post-training on domain datasets can embed knowledge into models, existing approaches have some limitations. Continual Pre-Training (CPT) treats all tokens in domain documents with equal importance, failing to prioritize critical knowledge points, while supervised fine-tuning (SFT) with question-answer pairs struggles to develop the coherent knowledge structures necessary for complex reasoning tasks. To address these challenges, we propose Reinforcement Learning from Augmented Generation (RLAG). Our approach iteratively cycles between sampling generations and optimizing the model through calculated rewards, effectively embedding critical and contextually coherent domain knowledge. We select generated outputs with the highest log probabilities as the sampling result, then compute three tailored reward metrics to guide the optimization process. To comprehensively evaluate domain expertise, we assess answer accuracy and the rationality of explanations generated for correctly answered questions. Experimental results across medical, legal, astronomy, and current events datasets demonstrate that our proposed method significantly outperforms baseline approaches. Our code and data are open sourced at https://github.com/ChaojunNie/RLAG.
Related papers
- Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval [60.25608870901428]
Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs)<n>We propose the task of fact-checking without retrieval, focusing on the verification of arbitrary natural language claims, independent of their source robustness.
arXiv Detail & Related papers (2026-03-05T18:42:51Z) - Multi-hop Reasoning via Early Knowledge Alignment [68.28168992785896]
Early Knowledge Alignment (EKA) aims to align Large Language Models with contextually relevant retrieved knowledge.<n>EKA significantly improves retrieval precision, reduces cascading errors, and enhances both performance and efficiency.<n>EKA proves effective as a versatile, training-free inference strategy that scales seamlessly to large models.
arXiv Detail & Related papers (2025-12-23T08:14:44Z) - SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation [3.5939555573102857]
Supervised Fine-Tuning (SFT) is essential for training large language models (LLMs)<n>We propose SearchInstruct, a method explicitly designed to construct high quality instruction datasets for SFT.<n>Our approach begins with a limited set of domain specific, human generated questions, which are systematically expanded using a large language model.
arXiv Detail & Related papers (2025-09-12T21:50:39Z) - Does This Look Familiar to You? Knowledge Analysis via Model Internal Representations [0.0]
There is no clearly established methodology for effective training data selection.<n>Model Internal Representations (KAMIR) is a novel approach that overcomes these limitations.<n>It can be applied to a wide range of tasks such as machine reading comprehension and summarization.
arXiv Detail & Related papers (2025-09-09T01:08:15Z) - Towards Efficient and Effective Alignment of Large Language Models [7.853945494882636]
Large language models (LLMs) exhibit remarkable capabilities across diverse tasks, yet aligning them efficiently and effectively with human expectations remains a critical challenge.<n>This thesis advances LLM alignment by introducing novel methodologies in data collection, training, and evaluation.
arXiv Detail & Related papers (2025-06-11T02:08:52Z) - Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging [11.377241012645994]
InForage is a reinforcement learning framework that formalizes retrieval-augmented reasoning as a dynamic information-seeking process.<n>We construct a human-guided dataset capturing iterative search and reasoning trajectories for complex, real-world web tasks.<n>These results highlight InForage's effectiveness in building robust, adaptive, and efficient reasoning agents.
arXiv Detail & Related papers (2025-05-14T12:13:38Z) - Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs [11.724887822269528]
Large language models (LLMs) have achieved unprecedented performance by leveraging vast pretraining corpora.<n>Their performance remains suboptimal in knowledge-intensive domains such as medicine and scientific research.<n>We propose a novel Structural Entropy-guided Knowledge Navigator (SENATOR) framework that addresses the intrinsic knowledge deficiencies of LLMs.
arXiv Detail & Related papers (2025-05-12T02:21:36Z) - AdvKT: An Adversarial Multi-Step Training Framework for Knowledge Tracing [64.79967583649407]
Knowledge Tracing (KT) monitors students' knowledge states and simulates their responses to question sequences.<n>Existing KT models typically follow a single-step training paradigm, which leads to significant error accumulation.<n>We propose a novel Adversarial Multi-Step Training Framework for Knowledge Tracing (AdvKT) which focuses on the multi-step KT task.
arXiv Detail & Related papers (2025-04-07T03:31:57Z) - Learning Latent Hardening (LLH): Enhancing Deep Learning with Domain Knowledge for Material Inverse Problems [0.0]
In this study, the incorporation of domain-specific knowledge of the mechanical behavior of material microstructures is investigated.<n>To overcome data limitations, a two-step framework, Learning Latent Hardening (LLH) is proposed.<n>The results from the models with domain-specific information consistently achieved higher $R2$ values compared to models without prior knowledge.
arXiv Detail & Related papers (2025-01-17T03:09:25Z) - KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [73.34893326181046]
We present KBAlign, a self-supervised framework that enhances RAG systems through efficient model adaptation.<n>Our key insight is to leverage the model's intrinsic capabilities for knowledge alignment through two innovative mechanisms.<n> Experiments demonstrate that KBAlign can achieve 90% of the performance gain obtained through GPT-4-supervised adaptation.
arXiv Detail & Related papers (2024-11-22T08:21:03Z) - Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch [54.12139707822201]
We propose ScaleQuest, a novel, scalable, and cost-effective data synthesis method.<n>By generating diverse questions from scratch, we produce a dataset of 1 million problem-solution pairs.<n>Our experiments demonstrate that models trained on our data outperform existing open-source datasets.
arXiv Detail & Related papers (2024-10-24T12:42:04Z) - GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input.<n>GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak)
arXiv Detail & Related papers (2024-10-11T03:05:06Z) - SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph
Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.