Related papers: Educating LLMs like Human Students: Structure-aware Injection of Domain Knowledge

URL: http://arxiv.org/abs/2407.16724v1
Date: Tue, 23 Jul 2024 12:38:48 GMT
Title: Educating LLMs like Human Students: Structure-aware Injection of Domain Knowledge
Authors: Kai Liu, Ze Chen, Zhihang Fu, Rongxin Jiang, Fan Zhou, Yaowu Chen, Yue Wu, Jieping Ye
Abstract summary: StructTuning is a methodology to transform Large Language Models (LLMs) into domain specialists. It significantly minimizes the training corpus requirement to a mere 0.3%. Our method matches 50% of the improvement displayed by the state-of-the-art MMedLM2 on MMedBench.
Score: 37.089378357827826
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents a pioneering methodology, termed StructTuning, to efficiently transform foundation Large Language Models (LLMs) into domain specialists. It significantly minimizes the training corpus requirement to a mere 0.3% while achieving an impressive 50% of traditional knowledge injection performance. Our method is inspired by the educational processes for human students, particularly how structured domain knowledge from textbooks is absorbed and then applied to tackle real-world challenges through specific exercises. Based on this, we propose a novel two-stage knowledge injection strategy: Structure-aware Continual Pre-Training (SCPT) and Structure-aware Supervised Fine-Tuning (SSFT). In the SCPT phase, we organize the training data into an auto-generated taxonomy of domain knowledge, enabling LLMs to effectively memorize textual segments linked to specific expertise within the taxonomy's architecture. Subsequently, in the SSFT phase, we explicitly prompt models to reveal the underlying knowledge structure in their outputs, leveraging this structured domain insight to address practical problems adeptly. Our ultimate method has undergone extensive evaluations across model architectures and scales, using closed-book question-answering tasks on LongBench and MMedBench datasets. Remarkably, our method matches 50% of the improvement displayed by the state-of-the-art MMedLM2 on MMedBench, but with only 0.3% quantity of the training corpus. This breakthrough showcases the potential to scale up our StructTuning for stronger domain-specific LLMs. Code will be made public soon.
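As a rough illustration of the SCPT idea described in the abstract (training text organized under a taxonomy of domain knowledge), the following is a minimal sketch, not the authors' implementation. The `TaxonomyNode` class, the `iter_training_samples` helper, the "[Knowledge path]" prefix, and the placeholder passages are all hypothetical; the abstract does not specify the actual data format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class TaxonomyNode:
    """Hypothetical taxonomy node: a topic title, sub-topics, and attached text chunks."""
    title: str
    children: list[TaxonomyNode] = field(default_factory=list)
    chunks: list[str] = field(default_factory=list)


def iter_training_samples(node: TaxonomyNode, path: tuple[str, ...] = ()) -> Iterator[str]:
    """Yield one sample per chunk, prefixed with its path in the taxonomy.

    The prefix format is an assumption made for illustration only.
    """
    here = path + (node.title,)
    for chunk in node.chunks:
        yield f"[Knowledge path] {' > '.join(here)}\n{chunk}"
    for child in node.children:
        yield from iter_training_samples(child, here)


# Placeholder taxonomy and passages, not taken from the paper.
taxonomy = TaxonomyNode(
    "Medicine",
    children=[
        TaxonomyNode("Cardiology", chunks=["Placeholder passage about heart anatomy."]),
        TaxonomyNode("Neurology", chunks=["Placeholder passage about neurons."]),
    ],
)

samples = list(iter_training_samples(taxonomy))
for sample in samples:
    print(sample, end="\n\n")
```

Each sample carries its position in the taxonomy alongside the text, which is one plausible way to link a textual segment to a specific area of expertise before continual pre-training.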

Related papers

Localizing Knowledge in Diffusion Transformers [44.27817967554535]
We propose a model- and knowledge-agnostic method to localize where specific types of knowledge are encoded within the Diffusion Transformer blocks. We show that the identified blocks are both interpretable and causally linked to the expression of knowledge in generated outputs. Our findings offer new insights into the internal structure of DiTs and introduce a practical pathway for more interpretable, efficient, and controllable model editing.
arXiv Detail & Related papers (2025-05-24T19:02:20Z)
LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications. Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z)
FANformer: Improving Large Language Models Through Effective Periodicity Modeling [30.84203256282429]
We introduce FANformer, which integrates the Fourier Analysis Network (FAN) into the attention mechanism to achieve efficient periodicity modeling. Experiments show that FANformer consistently outperforms Transformer when scaling up model size and training tokens. To further validate the effectiveness of FANformer, we pretrain a FANformer-1B on 1 trillion tokens.
arXiv Detail & Related papers (2025-02-28T18:52:24Z)
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization [94.31508613367296]
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs). We propose StructRAG, which can identify the optimal structure type for the task at hand, reconstruct original documents into this structured format, and infer answers based on the resulting structure. Experiments show that StructRAG achieves state-of-the-art performance, particularly excelling in challenging scenarios.
arXiv Detail & Related papers (2024-10-11T13:52:44Z)
HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling [39.14392943549792]
We propose a novel approach called Hierarchical Prompt Tuning (HPT), enabling simultaneous modeling of both structured and conventional linguistic knowledge. We introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning. By incorporating high-level and global-level prompts modeling overall semantics, the proposed hierarchical structure forges cross-level interlinks and empowers the model to handle more complex and long-term relationships.
arXiv Detail & Related papers (2024-08-27T06:50:28Z)
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers [16.253898272659242]
State-of-the-art results in large language models (LLMs) often rely on scale, which becomes computationally expensive. Our study focuses on transformer-based LLMs, specifically targeting the computationally intensive feedforward networks (FFNs). We show that wide and structured networks can utilize training FLOPs more efficiently, with fewer parameters and lower loss than dense models at their optimal trade-off.
arXiv Detail & Related papers (2024-06-24T08:43:21Z)
Large Language Model Agent as a Mechanical Designer [7.136205674624813]
In this study, we present a novel approach that integrates pre-trained LLMs with a FEM module. The FEM module evaluates each design and provides essential feedback, guiding the LLMs to continuously learn, plan, generate, and optimize designs without the need for domain-specific training. Our results reveal that these LLM-based agents can successfully generate truss designs that comply with natural language specifications with a success rate of up to 90%, which varies according to the applied constraints.
arXiv Detail & Related papers (2024-04-26T16:41:24Z)
Structure-aware Fine-tuning for Code Pre-trained Models [30.989863310409568]
We present Structure-aware Fine-tuning (SAT), a structure-enhanced and plug-and-play fine-tuning method for CodePTMs. We first propose a structure loss to quantify the difference between the information learned by CodePTMs and the knowledge extracted from code structure. We then introduce multi-task learning to improve the performance of fine-tuning.
arXiv Detail & Related papers (2024-04-11T04:24:48Z)
Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning [13.371405067535814]
This paper investigates the effectiveness of Supervised Fine-Tuning (SFT) as a method for knowledge injection in Large Language Models (LLMs). We compare different dataset generation strategies -- token-based and fact-based scaling -- to create training data that helps the model learn new information. Our results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge.
arXiv Detail & Related papers (2024-03-30T01:56:07Z)
Unifying Structure and Language Semantic for Efficient Contrastive Knowledge Graph Completion with Structured Entity Anchors [0.3913403111891026]
The goal of knowledge graph completion (KGC) is to predict missing links in a KG using trained facts that are already known. We propose a novel method to effectively unify structure information and language semantics without losing the power of inductive reasoning.
arXiv Detail & Related papers (2023-11-07T11:17:55Z)
StructGPT: A General Framework for Large Language Model to Reason over Structured Data [117.13986738340027]
We develop an Iterative Reading-then-Reasoning (IRR) approach for solving question answering tasks based on structured data. Our approach can significantly boost the performance of ChatGPT and achieve comparable performance against the full-data supervised-tuning baselines.
arXiv Detail & Related papers (2023-05-16T17:45:23Z)
PDSketch: Integrated Planning Domain Programming and Learning [86.07442931141637]
We present a new domain definition language, named PDSketch. It allows users to flexibly define high-level structures in the transition models. Details of the transition model will be filled in by trainable neural networks.
arXiv Detail & Related papers (2023-03-09T18:54:12Z)
Learning the Finer Things: Bayesian Structure Learning at the Instantiation Level [0.0]
Successful machine learning methods require a trade-off between memorization and generalization. We present a novel probabilistic graphical model structure learning approach that can learn, generalize and explain in elusive domains.
arXiv Detail & Related papers (2023-03-08T02:31:49Z)
Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect. There always exists severe domain shift between the pretraining and downstream ABSA datasets. We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z)
Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn. We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology in regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.