Fugu-MT Paper Translation (Summary): K12-KGraph: A Curriculum-Aligned Knowledge Graph for Benchmarking and Training Educational LLMs

Paper summary: K12-KGraph: A Curriculum-Aligned Knowledge Graph for Benchmarking and Training Educational LLMs

arxiv url: http://arxiv.org/abs/2605.09635v1
Date: Sun, 10 May 2026 16:24:26 GMT
Status: Translation complete
Last updated in system: 2026-05-13 02:24:05.53882
Title: K12-KGraph: A Curriculum-Aligned Knowledge Graph for Benchmarking and Training Educational LLMs
Title (reference translation): K12-KGraph: A Curriculum-Aligned Knowledge Graph for Benchmarking and Educational LLMs
Authors: Hao Liang, Qihan Lin, Zhaoyang Han, Xiaochen Ma, Zhen Hao Wong, Meiyi Qiang, Linzhuang Sun, Wentao Zhang
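Abstract summary: K12-KGraph is a curriculum-aligned knowledge graph extracted from official People's Education Press textbooks. The graph contains seven node types (Concept, Skill, Experiment, Exercise, Section, Chapter, Book) and nine relation types covering taxonomy, prerequisite, association, verification, assessment, location, and order. On K12-Bench, Gemini-3-Flash achieves only 57% exact match, while the best open-source model, Gemma-4-31B-IT, reaches 46%.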
Reference score (independently computed attention score): 13.794369477415293
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models (LLMs) are increasingly used in K-12 education, yet existing benchmarks such as C-Eval, CMMLU, GaokaoBench, and EduEval mainly evaluate factual recall through exam-style question answering. Effective educational AI additionally requires curriculum cognition: understanding how knowledge is structured through prerequisite chains, concept taxonomies, experiment-concept links, and pedagogical sequencing. To address this gap, we introduce K12-KGraph, a curriculum-aligned knowledge graph extracted from official People's Education Press textbooks across mathematics, physics, chemistry, and biology from primary to high school. The graph contains seven node types (Concept, Skill, Experiment, Exercise, Section, Chapter, Book) and nine relation types covering taxonomy, prerequisite, association, verification, assessment, location, and order. Based on this graph, we construct two resources: (1) K12-Bench, a 23,640-question multi-select benchmark spanning five graph-derived task families (Ground, Prereq, Neighbor, Evidence, and Locate); and (2) K12-Train, a KG-guided supervised fine-tuning corpus of approximately 2,300 QA pairs synthesized from graph structure and node attributes. Experiments reveal substantial deficiencies in curriculum cognition: on K12-Bench, Gemini-3-Flash achieves only 57% exact match, while the best open-source model, Gemma-4-31B-IT, reaches 46%. Under a strictly matched 2,300-sample SFT budget on Qwen3-4B-Base and Llama-3.1-8B-Base, K12-Train consistently outperforms equally sized subsets from eight mainstream instruction-tuning corpora on both GaokaoBench and EduEval, demonstrating that curriculum-structured supervision is highly sample-efficient for educational tuning. We release the graph, benchmark, training data, and full construction pipeline.
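Abstract (reference translation): Large language models (LLMs) are increasingly used in K-12 education, but existing benchmarks such as C-Eval, CMMLU, GaokaoBench, and EduEval mainly evaluate factual recall through exam-style question answering. Effective educational AI also requires curriculum cognition: understanding how knowledge is organized through prerequisite chains, concept taxonomies, experiment-concept links, and pedagogical sequencing. To address this gap, the authors introduce K12-KGraph, a curriculum-aligned knowledge graph extracted from official People's Education Press textbooks covering mathematics, physics, chemistry, and biology from primary to high school. The graph has seven node types (Concept, Skill, Experiment, Exercise, Section, Chapter, Book) and nine relation types covering taxonomy, prerequisite, association, verification, assessment, location, and order. Two resources are built on this graph: (1) K12-Bench, a 23,640-question multi-select benchmark spanning five graph-derived task families (Ground, Prereq, Neighbor, Evidence, and Locate); and (2) K12-Train, a KG-guided supervised fine-tuning corpus of approximately 2,300 QA pairs synthesized from graph structure and node attributes. On K12-Bench, Gemini-3-Flash achieves only 57% exact match, while the best open-source model, Gemma-4-31B-IT, reaches 46%. Under a strictly matched 2,300-sample SFT budget on Qwen3-4B-Base and Llama-3.1-8B-Base, K12-Train consistently outperforms equally sized subsets from eight mainstream instruction-tuning corpora on both GaokaoBench and EduEval, showing that curriculum-structured supervision is highly sample-efficient for educational tuning. The graph, benchmark, training data, and full construction pipeline are released.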
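
The abstract reports K12-Bench results as exact match on multi-select questions but does not spell out the scoring rule. The sketch below shows one common reading, stated here as an assumption: a question counts as correct only when the set of selected options equals the gold set, with no partial credit. The function names and the toy data are hypothetical and are not taken from the released pipeline.

```python
from typing import Iterable, Sequence


def exact_match(predicted: Iterable[str], gold: Iterable[str]) -> bool:
    """Assumed rule: correct only if the selected set equals the gold set."""
    return set(predicted) == set(gold)


def exact_match_rate(
    predictions: Sequence[Iterable[str]],
    golds: Sequence[Iterable[str]],
) -> float:
    """Fraction of questions whose selected options match the gold set exactly."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(exact_match(p, g) for p, g in zip(predictions, golds))
    return hits / len(golds)


# Hypothetical toy data: four multi-select questions with option labels A-D.
predictions = [["A", "C"], ["B"], ["A", "B", "D"], ["C"]]
golds = [["C", "A"], ["B", "D"], ["A", "B", "D"], ["D"]]

score = exact_match_rate(predictions, golds)
print(f"exact match: {score:.2%}")
```

Under this rule, selecting only B when the gold answer is {B, D} scores zero. Multi-select exact match is therefore stricter than single-answer accuracy, which is worth keeping in mind when reading the 57% and 46% figures.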

Related papers