Fugu-MT Paper Translation (Summary): Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning

Paper summary: Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning

arxiv url: http://arxiv.org/abs/2506.07851v1
Date: Mon, 09 Jun 2025 15:16:39 GMT
Status: Translation complete
Last updated in system: 2025-06-10 16:33:11.017942
Title: Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning
Authors: Yiju Guo, Wenkai Yang, Zexu Sun, Ning Ding, Zhiyuan Liu, Yankai Lin
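Abstract summary: Large language models (LLMs) have improved significantly in contextual understanding. However, their ability to attend to truly critical information during long-context reasoning and generation still lags behind. This paper introduces Learning to Focus (LeaF), a two-stage framework that mitigates confounding factors.
Reference score (attention score computed by this site): 47.764552063499046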
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have demonstrated significant improvements in contextual understanding. However, their ability to attend to truly critical information during long-context reasoning and generation still falls behind the pace. Specifically, our preliminary experiments reveal that certain distracting patterns can misdirect the model's attention during inference, and removing these patterns substantially improves reasoning accuracy and generation quality. We attribute this phenomenon to spurious correlations in the training data, which obstruct the model's capacity to infer authentic causal instruction-response relationships. This phenomenon may induce redundant reasoning processes, potentially resulting in significant inference overhead and, more critically, the generation of erroneous or suboptimal responses. To mitigate this, we introduce a two-stage framework called Learning to Focus (LeaF) leveraging intervention-based inference to disentangle confounding factors. In the first stage, LeaF employs gradient-based comparisons with an advanced teacher to automatically identify confounding tokens based on causal relationships in the training corpus. Then, in the second stage, it prunes these tokens during distillation to enact intervention, aligning the student's attention with the teacher's focus distribution on truly critical context tokens. Experimental results demonstrate that LeaF not only achieves an absolute improvement in various mathematical reasoning and code generation benchmarks but also effectively suppresses attention to confounding tokens during inference, yielding a more interpretable and reliable reasoning model.
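The abstract describes the first stage only at a high level (gradient-based comparison between student and teacher, followed by pruning of the flagged tokens). The sketch below illustrates that idea on a toy problem and is not the authors' implementation. Everything in it is an assumption made for illustration: the logistic scorer, the per-token weight dictionaries, the normalized gradient-magnitude sensitivity, the "student minus teacher" criterion, the threshold of 0.2, and the helper names `token_sensitivities`, `find_confounders`, and `prune`. The second stage (distilling on the pruned input) is not shown.

```python
import math


def token_sensitivities(weights, tokens, label):
    """Normalized |d loss / d x_i| per token for a toy logistic scorer.

    Assumed toy model: logit = sum_i weights[token_i] * x_i with x_i = 1,
    loss = log(1 + exp(-label * logit)), label in {-1, +1}.
    """
    logit = sum(weights.get(tok, 0.0) for tok in tokens)
    # d loss / d logit = -label * sigmoid(-label * logit)
    dloss_dlogit = -label / (1.0 + math.exp(label * logit))
    # Chain rule: d loss / d x_i = dloss_dlogit * weights[token_i]
    grads = [abs(dloss_dlogit * weights.get(tok, 0.0)) for tok in tokens]
    total = sum(grads)
    if total == 0.0:
        return [0.0] * len(tokens)
    return [g / total for g in grads]


def find_confounders(teacher_w, student_w, tokens, label, threshold=0.2):
    """Indices of tokens the student is far more sensitive to than the teacher.

    Hypothetical criterion: student share minus teacher share > threshold.
    """
    teacher = token_sensitivities(teacher_w, tokens, label)
    student = token_sensitivities(student_w, tokens, label)
    return [i for i, (s, t) in enumerate(zip(student, teacher))
            if s - t > threshold]


def prune(tokens, indices):
    """Return the token list with the flagged positions removed."""
    drop = set(indices)
    return [tok for i, tok in enumerate(tokens) if i not in drop]


# Toy example: the student over-relies on an irrelevant word.
tokens = ["solve", "2", "+", "3", "urgently"]
teacher_w = {"solve": 0.5, "2": 1.0, "+": 1.0, "3": 1.0, "urgently": 0.05}
student_w = {"solve": 0.5, "2": 0.6, "+": 0.6, "3": 0.6, "urgently": 1.5}

confounders = find_confounders(teacher_w, student_w, tokens, label=1)
pruned = prune(tokens, confounders)
print(confounders, pruned)
```

In a real setting the sensitivities would come from backpropagation through the student and teacher LLMs rather than from a closed-form gradient, and the pruned sequence would then serve as the distillation input.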

Related papers

A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models [53.18562650350898]
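Chain-of-thought (CoT) reasoning enhances the performance of large language models. We present the first comprehensive study of CoT faithfulness in large vision-language models.
Paper reference translation (metadata) (2025-05-29T18:55:05Z)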
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs [16.659986373052217]
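Chain-of-thought reasoning significantly degrades instruction-following accuracy. This is the first work to systematically expose reasoning-induced failures in instruction following.
Paper reference translation (metadata) (2025-05-16T16:36:00Z)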
Don't Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models [32.71672086718058]
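Few-shot Chain-of-Thought (CoT) significantly enhances the reasoning capabilities of large language models (LLMs). We observe that isolated segments, words, or tokens in CoT demonstrations can unexpectedly disrupt the generation process of LLMs. We propose the Few-shot Attention Intervention method (FAI), which dynamically analyzes the attention patterns of the demonstrations to accurately identify these tokens.
Paper reference translation (metadata) (2025-03-14T07:46:33Z)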
Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
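In-context learning (ICL) has become a primary method for natural language processing with large language models (LLMs). This study examines whether aggregation that combines low-agreement, disparate annotations leads to annotation artifacts that introduce harmful noise into the prompt. The results suggest that aggregation is a confounding factor in modeling subjective tasks and advocate focusing on modeling individuals instead.
Paper reference translation (metadata) (2024-10-17T17:16:00Z)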
Towards Context-Aware Emotion Recognition Debiasing from a Causal Demystification Perspective via De-confounded Training [14.450673163785094]
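Context-aware emotion recognition (CAER) provides valuable semantic cues for recognizing the emotions of target persons. Current approaches focus on designing sophisticated structures to extract perceptually critical representations from context. We propose a Contextual Causal Intervention Module (CCIM) to de-confound the confounder.
Paper reference translation (metadata) (2024-07-06T05:29:02Z)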
Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
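We study how well large language models (LLMs) explain their generations through rationales. We show that prompting-based methods are less "faithful" than attribution-based explanations.
Paper reference translation (metadata) (2024-06-28T20:06:30Z)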
Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers [49.80959223722325]
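We study the distinction between feed-forward and attention layers in large language models. Feed-forward layers tend to learn simple distributional associations such as bigrams, while attention layers focus on in-context reasoning.
Paper reference translation (metadata) (2024-06-05T08:51:08Z)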
Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning [11.13665894783481]
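Causal opacity refers to the difficulty of understanding the "hidden" causal structure underlying the decisions of deep neural network (DNN) models. This work introduces Causal Concept Graph Models (Causal CGMs). Experiments show that Causal CGMs can (i) match the generalization performance of causally opaque models, (ii) enable human-in-the-loop corrections to mispredicted intermediate reasoning steps, and (iii) support the analysis of interventional and counterfactual scenarios.
Paper reference translation (metadata) (2024-05-26T10:15:20Z)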
Identifying Semantic Induction Heads to Understand In-Context Learning [103.00463655766066]
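We investigate whether attention heads encode two types of relationships between tokens present in natural language. Certain attention heads exhibit a pattern in which, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens.
Paper reference translation (metadata) (2024-02-20T14:43:39Z)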
Unveiling the Magic: Investigating Attention Distillation in Retrieval-augmented Generation [8.363702038073814]
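Retrieval-augmented generation frameworks can address the limitations of large language models by enabling real-time knowledge updates for more accurate answers. An efficient approach in the training phase of retrieval-augmented models is attention distillation, which uses attention scores as a supervision signal instead of manually annotated query-document pairs.
Paper reference translation (metadata) (2024-02-19T02:48:44Z)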
Using Early Readouts to Mediate Featural Bias in Distillation [30.5299408494168]
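Deep networks tend to learn spurious feature-label correlations in real-world supervised learning tasks. We propose a novel early readout mechanism that attempts to predict the label using representations from earlier network layers.
Paper reference translation (metadata) (2023-10-28T04:58:15Z)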

The related papers list is generated automatically from the titles and abstracts of papers on this site.
