Fugu-MT Paper Translation (Abstract): From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision-Language Models

Paper overview: From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision-Language Models

arxiv url: http://arxiv.org/abs/2604.17941v1
Date: Mon, 20 Apr 2026 08:21:06 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.762025
Title: From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision-Language Models
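Authors: Qidong Wang, Junjie Hu, Ming Jiang
Abstract summary: HONES is a gradient-free framework for task-aware neuron attribution and steering in vision-language models. The authors show that HONES outperforms existing methods at identifying task-critical neurons and improves model performance after steering.
Reference score (independently computed attention metric): 10.052877942432783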
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent work has increasingly explored neuron-level interpretation in vision-language models (VLMs) to identify neurons critical to final predictions. However, existing neuron analyses generally focus on single tasks, limiting the comparability of neuron importance across tasks. Moreover, ranking strategies tend to score neurons in isolation, overlooking how task-dependent information pathways shape the write-in effects of feed-forward network (FFN) neurons. This oversight can exacerbate neuron polysemanticity in multi-task settings, introducing noise into the identification and intervention of task-critical neurons. In this study, we propose HONES (Head-Oriented Neuron Explanation & Steering), a gradient-free framework for task-aware neuron attribution and steering in multi-task VLMs. HONES ranks FFN neurons by their causal write-in contributions conditioned on task-relevant attention heads, and further modulates salient neurons via lightweight scaling. Experiments on four diverse multimodal tasks and two popular VLMs show that HONES outperforms existing methods in identifying task-critical neurons and improves model performance after steering. Our source code is released at: https://github.com/petergit1/HONES.
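
The two ideas named in the abstract, gradient-free neuron scoring and steering by lightweight scaling, can be illustrated on a toy feed-forward block. The sketch below is not the HONES implementation: it scores each neuron by a plain zero-ablation effect on a scalar readout and does not model the conditioning on task-relevant attention heads. All function names, weights, and the readout vector are hypothetical and chosen only for illustration.

```python
# Toy illustration only; NOT the HONES method. All names and values are hypothetical.

def relu(v):
    return v if v > 0.0 else 0.0

def ffn_forward(x, w_in, w_out, scale=None):
    """Toy FFN: each neuron i reads x through w_in[i] and writes w_out[i].

    scale maps neuron index -> multiplicative factor on its activation
    (1.0 = unchanged, 0.0 = ablated, >1.0 = amplified).
    """
    scale = scale or {}
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    hidden = [h * scale.get(i, 1.0) for i, h in enumerate(hidden)]
    d_out = len(w_out[0])
    # Output is the sum of each neuron's activation times its write-in vector.
    return [sum(hidden[i] * w_out[i][j] for i in range(len(hidden)))
            for j in range(d_out)]

def readout_value(x, w_in, w_out, readout, scale=None):
    """Scalar score: dot product of the FFN output with a readout direction."""
    out = ffn_forward(x, w_in, w_out, scale)
    return sum(o * r for o, r in zip(out, readout))

def ablation_scores(x, w_in, w_out, readout):
    """Gradient-free importance: drop in the readout when one neuron is zeroed."""
    base = readout_value(x, w_in, w_out, readout)
    return [base - readout_value(x, w_in, w_out, readout, {i: 0.0})
            for i in range(len(w_in))]

# Hypothetical toy weights: 3 neurons, 2-dimensional input and output.
x = [1.0, 2.0]
w_in = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
w_out = [[1.0, 0.0], [0.0, 0.5], [2.0, 2.0]]
readout = [0.0, 1.0]  # stands in for a task-relevant output direction

scores = ablation_scores(x, w_in, w_out, readout)
top = max(range(len(scores)), key=lambda i: scores[i])

# "Steering" here means amplifying the top-scored neuron by a fixed factor.
steered = readout_value(x, w_in, w_out, readout, {top: 1.5})
print(scores, top, steered)
```

In this toy setting the score needs only forward passes, which is what makes it gradient-free; the actual ranking criterion and scaling rule used by HONES are described in the paper and its released source code.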
