Fugu-MT Paper Translation (Abstract): DPN-LE: Dual Personality Neuron Localization and Editing for Large Language Models

Paper overview: DPN-LE: Dual Personality Neuron Localization and Editing for Large Language Models

arxiv url: http://arxiv.org/abs/2604.27929v1
Date: Thu, 30 Apr 2026 14:31:46 GMT
Status: Translation complete
Last updated in system: 2026-05-01 16:31:54.138178
Title: DPN-LE: Dual Personality Neuron Localization and Editing for Large Language Models
Title (reference translation): DPN-LE: Localization and Editing of Dual Personality Neurons for Large Language Models
Authors: Lifan Zheng, Xue Yang, Jiawei Chen, Chenyan Wu, Jingyuan Zhang, Fanheng Kong, Xinyi Zeng, Xiang Chen, Yu Tian
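Abstract summary: Current methods can change personalities but reduce overall performance. Neurons are multifunctional, connecting personality traits and general knowledge. This work proposes DPN-LE, which identifies personality-specific neurons by contrasting MLP activations between high-trait and low-trait samples.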
Reference score (independently computed attention metric): 25.763216553110386
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the widespread adoption of large language models (LLMs), understanding their personality representation mechanisms has become critical. As a novel paradigm in Personality Editing, most existing methods employ neuron-editing to locate and modify LLM neurons, requiring changes to numerous neurons and leading to significant performance degradation. This raises a fundamental question: Are all modified neurons directly related to personality representation? In this work, we investigate and quantify this specificity through assessments of general capability impact and representation-level patterns. We find that: 1) Current methods can change personalities but reduce overall performance. 2) Neurons are multifunctional, connecting personality traits and general knowledge. 3) Opposing personality traits demonstrate distinctly mutually exclusive representation patterns. Motivated by these findings, we propose DPN-LE (Dual Personality Neuron Localization and Editing), which identifies personality-specific neurons by contrasting MLP activations between high-trait and low-trait samples. DPN-LE constructs layer-wise steering vectors and applies dual-criterion filtering based on Cohen's $d$ effect size and activation magnitude to isolate mutually exclusive neuron subsets. Sparse linear intervention on these neurons enables precise personality control at inference time. Using only 1,000 contrastive sample pairs per trait, DPN-LE intervenes on $\sim$0.5\% of neurons while achieving competitive personality control and substantially better capability preservation across reasoning tasks. Experiments on LLaMA-3-8B-Instruct and Qwen2.5-7B-Instruct demonstrate the effectiveness and generalizability of our approach.
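Abstract (reference translation): With the spread of large language models (LLMs), understanding their personality representation mechanisms has become important. As a new paradigm in personality editing, most existing methods use neuron editing to locate and modify LLM neurons, which requires changing many neurons and significantly degrades performance. Are all modified neurons directly related to personality representation? This work examines and quantifies this specificity through assessments of general capability impact and representation-level patterns. The findings are as follows. 1) Current methods can change personality but reduce overall performance. 2) Neurons are multifunctional, linking personality traits and general knowledge. 3) Opposing personality traits show mutually exclusive representation patterns. Based on these findings, we propose DPN-LE (Dual Personality Neuron Localization and Editing). DPN-LE constructs layer-wise steering vectors and applies dual-criterion filtering based on Cohen's $d$ effect size and activation magnitude to isolate mutually exclusive neuron subsets. Sparse linear intervention on these neurons enables precise personality control at inference time. Using only 1,000 contrastive sample pairs per trait, DPN-LE intervenes on $\sim$0.5\% of neurons while achieving competitive personality control and substantially better capability preservation across reasoning tasks. Experiments on LLaMA-3-8B-Instruct and Qwen2.5-7B-Instruct demonstrate the effectiveness and generalizability of the approach.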
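
The pipeline described in the abstract (contrast activations, filter by Cohen's $d$ and magnitude, then intervene sparsely) can be illustrated with a small NumPy sketch for a single layer. This is not the authors' implementation: the function names, the thresholds `d_min` and `top_frac`, the top-k form of the magnitude criterion, and the additive form of the intervention are all assumptions made for illustration, and the activations are plain arrays rather than values captured from a real model.

```python
import numpy as np


def cohens_d(high, low, eps=1e-8):
    """Per-neuron Cohen's d between two (samples x neurons) activation arrays."""
    n_h, n_l = high.shape[0], low.shape[0]
    var_h = high.var(axis=0, ddof=1)
    var_l = low.var(axis=0, ddof=1)
    # Pooled standard deviation across the two sample groups.
    pooled = np.sqrt(((n_h - 1) * var_h + (n_l - 1) * var_l) / (n_h + n_l - 2))
    return (high.mean(axis=0) - low.mean(axis=0)) / (pooled + eps)


def select_neurons(high, low, d_min=0.8, top_frac=0.005):
    """Hypothetical dual-criterion filter for one layer.

    Returns the steering vector (mean high-trait minus mean low-trait
    activation) and two disjoint boolean masks: neurons tied to the
    high-trait pole and neurons tied to the low-trait pole.
    """
    steer = high.mean(axis=0) - low.mean(axis=0)
    d = cohens_d(high, low)
    # Magnitude criterion (assumed): keep the top fraction by |steer|.
    k = max(1, int(round(top_frac * steer.size)))
    mag_mask = np.zeros(steer.size, dtype=bool)
    mag_mask[np.argsort(-np.abs(steer))[:k]] = True
    # Effect-size criterion (assumed): the sign of d assigns the pole,
    # so the two subsets cannot overlap.
    high_mask = mag_mask & (d >= d_min)
    low_mask = mag_mask & (d <= -d_min)
    return steer, high_mask, low_mask


def intervene(act, steer, mask, alpha=1.0):
    """Sparse linear intervention: shift only the masked neurons along steer."""
    return act + alpha * steer * mask


# Synthetic demo: 1,000 neurons, a handful planted as trait-specific.
rng = np.random.default_rng(0)
high = rng.normal(size=(200, 1000))
low = rng.normal(size=(200, 1000))
high[:, :3] += 3.0   # neurons 0-2 respond to high-trait samples
low[:, 3:5] += 3.0   # neurons 3-4 respond to low-trait samples

steer, high_mask, low_mask = select_neurons(high, low)
edited = intervene(np.zeros(1000), steer, high_mask, alpha=2.0)
print(np.flatnonzero(high_mask), np.flatnonzero(low_mask))
```

With `top_frac=0.005`, only five of the 1,000 synthetic neurons pass the magnitude criterion, which mirrors the roughly 0.5% intervention budget quoted in the abstract. In a real setting, `high` and `low` would be MLP activations collected per layer from contrastive prompts.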

Related papers