Fugu-MT paper translation (abstract): From Insight to Action: A Novel Framework for Interpretability-Guided Data Selection in Large Language Models

Paper overview: From Insight to Action: A Novel Framework for Interpretability-Guided Data Selection in Large Language Models

arxiv url: http://arxiv.org/abs/2604.25167v1
Date: Tue, 28 Apr 2026 03:16:24 GMT
Status: translation complete
Last updated in system: 2026-04-29 16:49:17.692131
Title: From Insight to Action: A Novel Framework for Interpretability-Guided Data Selection in Large Language Models
Title (reference translation): From Insight to Action: A New Framework for Interpretability-Based Data Selection in Large Language Models
Authors: Ling Shi, Xinwei Wu, Xiaohu Zhao, Hao Wang, Heng Liu, Yangyang Liu, Linlong Xu, Longyue Wang, Deyi Xiong, Weihua Luo
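Abstract summary: Interpretability-Guided Data Selection (IGDS) is a framework that first identifies causal task features through frequency recall and interventional filtering. We validate IGDS on mathematical reasoning, summarization, and translation tasks with Gemma-2, LLaMA-3.1, and Qwen3 models.
Reference score (independently computed attention score): 73.72877445629383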
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While mechanistic interpretability tools like Sparse Autoencoders (SAEs) can uncover meaningful features within Large Language Models (LLMs), a critical gap remains in transforming these insights into practical actions for model optimization. We bridge this gap with the hypothesis that data selection guided by a model's internal task features is an effective training strategy. Building on this hypothesis, we propose Interpretability-Guided Data Selection (IGDS), a framework that first identifies these causal task features through frequency recall and interventional filtering, then selects "Feature-Resonant Data" that maximally activates task features for fine-tuning. We validate IGDS on mathematical reasoning, summarization, and translation tasks with Gemma-2, LLaMA-3.1, and Qwen3 models. Our experiments demonstrate exceptional data efficiency: on the Math task, IGDS surpasses full-dataset fine-tuning by a remarkable 17.4% on Gemma-2-2B while using only 50% of the data, and outperforms established baselines focused on data quality and diversity. Analysis confirms a strong positive correlation between feature amplification and task performance improvement. IGDS thus provides a direct and effective framework to enhance LLMs by leveraging their internal mechanisms, validating our core hypothesis.
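Abstract (reference translation): Mechanistic interpretability tools such as Sparse Autoencoders (SAEs) can uncover meaningful features within Large Language Models (LLMs), but an important gap remains in turning these insights into practical actions for model optimization. We bridge this gap with the hypothesis that data selection guided by a model's internal task features is an effective training strategy. We therefore propose Interpretability-Guided Data Selection (IGDS), a framework that first identifies these causal task features through frequency recall and interventional filtering, and then selects "Feature-Resonant Data" that maximally activates the task features for fine-tuning. We validate IGDS on mathematical reasoning, summarization, and translation tasks with Gemma-2, LLaMA-3.1, and Qwen3 models. In the experiments, on the Math task IGDS surpasses full-dataset fine-tuning by 17.4% on Gemma-2-2B while using only 50% of the data, and it outperforms established baselines focused on data quality and diversity. Analysis confirms a strong positive correlation between feature amplification and task performance improvement. IGDS thus provides a direct and effective framework for enhancing LLMs by leveraging their internal mechanisms, validating the core hypothesis.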
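
Illustrative sketch (not from the paper): the abstract describes IGDS only at a high level, so the Python below is a minimal toy reading of two of its steps, under loudly stated assumptions. It assumes each example has already been reduced to one non-negative activation value per SAE feature, that "frequency recall" can be approximated by keeping features that fire on at least a given fraction of task examples, and that "Feature-Resonant Data" can be approximated by ranking candidate examples by their summed activation on those features. The function names `recall_frequent_features` and `select_feature_resonant`, the `min_fraction` threshold, and the summed-activation score are all hypothetical. Interventional filtering is not sketched, since it requires interventions on a model that the abstract does not specify.

```python
def recall_frequent_features(task_activations, min_fraction=0.5):
    """Return indices of features that are active (> 0) on at least
    `min_fraction` of the task examples.

    task_activations: list of rows, one per task example, each row a list
    of per-feature activation values (assumed already pooled per example).
    This thresholding rule is an assumption, not the paper's definition.
    """
    n_examples = len(task_activations)
    n_features = len(task_activations[0])
    counts = [0] * n_features
    for row in task_activations:
        for j, value in enumerate(row):
            if value > 0:
                counts[j] += 1
    return [j for j, c in enumerate(counts) if c / n_examples >= min_fraction]


def select_feature_resonant(pool_activations, task_features, keep_fraction=0.5):
    """Rank candidate examples by summed activation on `task_features`
    and return the indices of the top `keep_fraction` of them.

    The summed-activation score is a stand-in chosen for illustration.
    """
    scores = [sum(row[j] for j in task_features) for row in pool_activations]
    n_keep = max(1, int(len(scores) * keep_fraction))
    # sorted() is stable, so tied examples keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]


if __name__ == "__main__":
    # Toy activations: 4 task examples x 3 features.
    task_acts = [[0.9, 0.0, 0.2], [0.7, 0.0, 0.0], [0.8, 0.1, 0.0], [0.6, 0.0, 0.0]]
    # Toy candidate pool: 4 examples x 3 features.
    pool_acts = [[0.1, 0.9, 0.0], [0.8, 0.0, 0.0], [0.5, 0.2, 0.3], [0.0, 0.0, 0.9]]
    features = recall_frequent_features(task_acts)
    print(features, select_feature_resonant(pool_acts, features, keep_fraction=0.5))
```

With `keep_fraction=0.5` the sketch mirrors the 50% data budget mentioned in the abstract; in a real pipeline the activations would come from an SAE attached to the model being fine-tuned, and the candidate features would first pass a causal (interventional) check.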

Related papers