Fugu-MT Paper Translation (Abstract): C$^2$DLM: Causal Concept-Guided Diffusion Large Language Models

Paper summary: C$^2$DLM: Causal Concept-Guided Diffusion Large Language Models

arxiv url: http://arxiv.org/abs/2511.22146v1
Date: Thu, 27 Nov 2025 06:33:33 GMT
Status: Translation complete
Last updated in system: 2025-12-01 19:47:55.424171
Title: C$^2$DLM: Causal Concept-Guided Diffusion Large Language Models
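Title (reference translation): C$^2$DLM: Causal Concept-Guided Diffusion Large Language Models
Authors: Kairong Han, Nuanqiao Shan, Ziyu Zhao, Zijing Hu, Xinpeng Dong, Junjian Ye, Lujia Pan, Fei Wu, Kun Kuang
Abstract summary: Autoregressive (AR) language models and Diffusion Language Models (DLMs) constitute the two principal paradigms of large language models. This paper proposes a Causal Concept-Guided Diffusion Language Model (C$^2$DLM). C$^2$DLM first obtains a concept-level causal graph from the teacher model and then guides attention to learn the causal relationships between concepts.
Reference score (independently computed attention score): 43.03880420745772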
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autoregressive (AR) language models and Diffusion Language Models (DLMs) constitute the two principal paradigms of large language models. However, both paradigms suffer from insufficient reasoning capabilities. Human reasoning inherently relies on causal knowledge and thought, which are reflected in natural language. But in the AR paradigm, language is modeled as next-token prediction (a strictly left-to-right, token-by-token order), whereas natural language itself exhibits more flexible causal structures. In the DLM paradigm, the attention mechanism is fully connected, which entirely disregards causal order. To fill this gap, we propose a Causal Concept-Guided Diffusion Language Model (C$^2$DLM). Starting from DLM's fully connected attention, C$^2$DLM first obtains a concept-level causal graph from the teacher model, and then explicitly guides attention to learn causal relationships between concepts. By focusing on causal relationships and avoiding interference from difficult subgoals involving causal inversion, C$^2$DLM improves by 12% with about a 3.2 times training speedup on the COT-OrderPerturb task, and achieves an average gain of 1.31% across six downstream reasoning tasks. More details are available in the repository: https://github.com/Kairong-Han/C-2-DLM
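Abstract (reference translation): Autoregressive (AR) language models and diffusion language models (DLMs) are the two main paradigms of large language models. However, both paradigms suffer from insufficient reasoning ability. Human reasoning fundamentally depends on causal knowledge and thinking, which are reflected in natural language. In the AR paradigm, however, language is modeled as next-token prediction (a strictly left-to-right, token-by-token order), whereas natural language itself has more flexible causal structures. In the DLM paradigm, the attention mechanism is fully connected and ignores causal order entirely. To fill this gap, we propose the Causal Concept-Guided Diffusion Language Model (C$^2$DLM). Starting from the fully connected attention of a DLM, C$^2$DLM first obtains a concept-level causal graph from the teacher model and then explicitly guides attention to learn the causal relationships between concepts. By focusing on causal relationships and avoiding interference from difficult subgoals that involve causal inversion, C$^2$DLM improves by 12% with about a 3.2 times training speedup on the COT-OrderPerturb task and achieves an average gain of 1.31% across six downstream reasoning tasks. Details are available in the repository: https://github.com/Kairong-Han/C-2-DLM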
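
The abstract describes guiding attention with a concept-level causal graph but does not give the mechanism in detail. The following is a minimal illustrative sketch under stated assumptions, not the authors' implementation: it assumes each token is tagged with at most one concept, expands the concept graph into a token-level 0/1 guidance matrix, and scores an attention matrix by how much mass each effect token places on its cause tokens. The function names `causal_guidance_mask` and `guidance_loss`, the negative-log-mass penalty, and the toy concepts are all hypothetical.

```python
import math


def causal_guidance_mask(token_concepts, causal_edges):
    """Expand a concept-level causal graph into a token-level 0/1 matrix.

    Entry [i][j] is 1 when the concept of token j is a direct cause of the
    concept of token i. Tokens with concept None are left unguided.
    (Assumed representation; the abstract does not specify one.)
    """
    n = len(token_concepts)
    mask = [[0] * n for _ in range(n)]
    for i, effect in enumerate(token_concepts):
        for j, cause in enumerate(token_concepts):
            if effect is None or cause is None:
                continue
            if (cause, effect) in causal_edges:
                mask[i][j] = 1
    return mask


def guidance_loss(attention, mask, eps=1e-9):
    """Mean negative log of the attention mass placed on causal parents.

    Only rows that have at least one guided entry contribute. This penalty
    is an assumption chosen for illustration, not the loss from the paper.
    """
    losses = []
    for attn_row, mask_row in zip(attention, mask):
        if any(mask_row):
            mass = sum(a * m for a, m in zip(attn_row, mask_row))
            losses.append(-math.log(mass + eps))
    return sum(losses) / len(losses) if losses else 0.0


# Toy example: four tokens, three of them tagged with a concept.
token_concepts = ["rain", None, "wet_road", "accident"]
causal_edges = {("rain", "wet_road"), ("wet_road", "accident")}
mask = causal_guidance_mask(token_concepts, causal_edges)

# Fully connected, uniform attention versus attention focused on causes.
uniform = [[0.25] * 4 for _ in range(4)]
focused = [
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.85, 0.05, 0.05, 0.05],  # "wet_road" attends mostly to "rain"
    [0.05, 0.05, 0.85, 0.05],  # "accident" attends mostly to "wet_road"
]

print(guidance_loss(uniform, mask), guidance_loss(focused, mask))
```

In this toy setup the focused attention pattern gets a lower penalty than the uniform one, which is the direction such guidance would push a fully connected attention map. In a real model the penalty would be added to the training objective with some weighting, which the abstract does not specify.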
