Fugu-MT Paper Translation (Summary): KA2L: A Knowledge-Aware Active Learning Framework for LLMs

Paper summary: KA2L: A Knowledge-Aware Active Learning Framework for LLMs

arxiv url: http://arxiv.org/abs/2603.17566v1
Date: Wed, 18 Mar 2026 10:16:07 GMT
Status: Translation complete
Last updated (system): 2026-03-19 18:32:57.639096
Title: KA2L: A Knowledge-Aware Active Learning Framework for LLMs
Authors: Haoxuan Yin, Bojian Liu, Chen Tang, Yangfan Wang, Lian Yan, Jingchi Jiang,
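Abstract summary: This study examines how deeply large language models (LLMs) understand domain-specific knowledge. We propose a knowledge-aware active learning framework that uses latent space analysis to construct questions the model does not yet know how to answer. The results suggest that KA2L not only reduces annotation and computation costs by 50% across two open-domain datasets and one vertical-domain dataset but also improves performance.
Reference score (attention score computed in-house): 11.702131167960923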
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fine-tuning large language models (LLMs) with high-quality knowledge has been shown to enhance their performance effectively. However, there is a paucity of research on the depth of domain-specific knowledge comprehension by LLMs and the application of targeted active learning to improve their expertise. To address this gap, we introduce the Knowledge-Aware Active Learning (KA2L) framework. This framework assesses LLMs' mastery of specific knowledge points to aid in constructing unanswerable or unknowable questions through latent space analysis. This active learning strategy enhances training efficiency by focusing on knowledge the model has yet to master, thereby minimizing redundancy in learning already acquired information. This study innovatively employs a knowledge distribution probing technique to examine the hidden states of specific Transformer layers and identify the distribution of known and unknown knowledge within the LLM. Additionally, a hidden-state decoding method is proposed to generate numerous unknown questions in natural language from the latent knowledge space. In our experiments, we selected nine open-source LLMs to validate the effectiveness of the proposed framework. Results indicate that KA2L not only reduces annotation and computation costs by 50% across two open-domain datasets and one vertical-domain dataset but also achieves better performance, offering valuable insights into active learning strategies for LLMs. The code is available at https://anonymous.4open.science/r/KA2L-F15C.
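The abstract does not specify the probe's architecture or the selection rule. As a rough illustration of the general idea (probing hidden states to separate known from unknown knowledge, then prioritizing unknown items for annotation), here is a minimal sketch assuming a linear logistic-regression probe over hidden-state vectors taken from a single Transformer layer, with label 1 meaning the model answered an item correctly and 0 meaning it did not. The function names `train_probe`, `p_known`, and `select_unknown`, the toy 2-D vectors, and the budget-based selection rule are all hypothetical; this is not the KA2L implementation, and it leaves out the hidden-state decoding step that generates natural-language questions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_known(w: Vector, b: float, h: Vector) -> float:
    # Probe's estimated probability that the model "knows" the item behind hidden state h.
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def train_probe(states: List[Vector], known: List[int],
                lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    # ASSUMPTION: a linear logistic-regression probe, fit by full-batch gradient
    # descent on cross-entropy loss. Labels: 1 = answered correctly, 0 = not.
    dim, n = len(states[0]), len(states)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for h, y in zip(states, known):
            err = p_known(w, b, h) - y  # d(loss)/d(logit) for cross-entropy
            for i in range(dim):
                grad_w[i] += err * h[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def select_unknown(pool: List[Vector], w: Vector, b: float, budget: int) -> List[int]:
    # Hypothetical selection rule: pick the `budget` unlabeled items the probe is
    # least confident the model already knows (lowest p_known first).
    ranked = sorted(range(len(pool)), key=lambda i: p_known(w, b, pool[i]))
    return ranked[:budget]


if __name__ == "__main__":
    # Toy 2-D "hidden states": known items cluster near (1, 1), unknown near (-1, -1).
    states = [[1.0, 0.9], [0.8, 1.1], [1.2, 1.0],
              [-1.0, -0.9], [-0.8, -1.2], [-1.1, -1.0]]
    known = [1, 1, 1, 0, 0, 0]
    w, b = train_probe(states, known)
    pool = [[0.9, 1.0], [-1.0, -1.0], [0.1, -0.2], [1.1, 0.8]]
    print(select_unknown(pool, w, b, budget=2))  # prints [1, 2]
```

In a real setting the vectors would be per-layer hidden states extracted from the LLM, and the lowest-scoring items would be the ones sent for annotation or fine-tuning, so that training effort goes to knowledge the model has not yet mastered. A linear probe is chosen here only because it is the simplest probe to show; the paper may use something different.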


Related papers