Fugu-MT paper translation (abstract): Interpret the Internal States of Recommendation Model with Sparse Autoencoder

Paper overview: Interpret the Internal States of Recommendation Model with Sparse Autoencoder

arxiv url: http://arxiv.org/abs/2411.06112v1
Date: Sat, 09 Nov 2024 08:22:31 GMT
Last updated in system: 2024-11-28 17:07:46.065012
Title: Interpret the Internal States of Recommendation Model with Sparse Autoencoder
Authors: Jiayin Wang, Xiaoyu Zhang, Weizhi Ma, Min Zhang
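Abstract summary: RecSAE is an automatic, generalizable probing method for interpreting the internal states of recommendation models. We train an autoencoder with sparsity constraints to reconstruct the internal activations of a recommendation model. We automate the construction of concept dictionaries based on the relationship between latent activations and input item sequences.
Reference score (attention score computed by this site): 26.021277330699963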
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Explainable recommendation systems are important to enhance transparency, accuracy, and fairness. Beyond result-level explanations, model-level interpretations can provide valuable insights that allow developers to optimize system designs and implement targeted improvements. However, most current approaches depend on specialized model designs, which often lack generalization capabilities. Given the various kinds of recommendation models, existing methods have limited ability to effectively interpret them. To address this issue, we propose RecSAE, an automatic, generalizable probing method for interpreting the internal states of Recommendation models with Sparse AutoEncoder. RecSAE serves as a plug-in module that does not affect original models during interpretations, while also enabling predictable modifications to their behaviors based on interpretation results. First, we train an autoencoder with sparsity constraints to reconstruct internal activations of recommendation models, making the RecSAE latents more interpretable and monosemantic than the original neuron activations. Second, we automate the construction of concept dictionaries based on the relationship between latent activations and input item sequences. Third, RecSAE validates these interpretations by predicting latent activations on new item sequences using the concept dictionary and deriving interpretation confidence scores from precision and recall. We demonstrate RecSAE's effectiveness on two datasets, identifying hundreds of highly interpretable concepts from pure ID-based models. Latent ablation studies further confirm that manipulating latent concepts produces corresponding changes in model output behavior, underscoring RecSAE's utility for both understanding and targeted tuning of recommendation models. Code and data are publicly available at https://github.com/Alice1998/RecSAE.
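
The first step in the abstract, training an autoencoder with sparsity constraints to reconstruct internal activations, can be illustrated with a minimal sketch of the forward pass and loss. This is not the RecSAE implementation: the abstract does not say which sparsity constraint is used, so the ReLU encoder, the L1 penalty, the names `sae_forward` and `sae_loss`, the layer sizes, and the coefficient `l1_coeff` are all assumptions chosen for illustration.

```python
import random

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode an activation vector x into latents z, then decode to x_hat."""
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    z = [max(0.0, v) for v in pre]  # ReLU keeps latents non-negative (assumed)
    x_hat = [p + b for p, b in zip(matvec(W_dec, z), b_dec)]
    return z, x_hat

def sae_loss(x, z, x_hat, l1_coeff=0.01):
    """Mean squared reconstruction error plus an L1 penalty on the latents.

    The L1 term is one common sparsity constraint; it is an assumption here.
    """
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    l1 = sum(abs(v) for v in z)
    return mse + l1_coeff * l1

random.seed(0)
d_model, d_latent = 4, 16  # hypothetical sizes; the latent layer is wider
W_enc = [[random.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_latent)]
W_dec = [[random.gauss(0, 0.1) for _ in range(d_latent)] for _ in range(d_model)]
b_enc = [0.0] * d_latent
b_dec = [0.0] * d_model

x = [0.5, -1.0, 0.25, 2.0]  # stand-in for one internal activation vector
z, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, z, x_hat)
```

In a real setup the weights would be trained by gradient descent on this loss over many recorded activations, while the recommendation model itself stays frozen, which matches the plug-in role described in the abstract.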
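
The validation step, predicting latent activations on new item sequences from the concept dictionary and scoring the prediction with precision and recall, might look like the sketch below. The abstract does not specify the prediction rule or how precision and recall are combined into a confidence score, so the "sequence contains any concept item" rule, the harmonic mean (F1) in `confidence`, and the item IDs are hypothetical.

```python
def predict_activation(sequence, concept_items):
    """Hypothetical rule: the latent is predicted to fire if the sequence
    contains at least one item from the concept's dictionary entry."""
    return any(item in concept_items for item in sequence)

def precision_recall(predicted, actual):
    """Precision and recall of predicted activations against observed ones."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def confidence(precision, recall):
    # Assumption: combine the two with the harmonic mean (F1 score).
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

concept_items = {"item_17", "item_42"}  # made-up dictionary entry for one latent
sequences = [
    ["item_3", "item_17"],
    ["item_42", "item_8"],
    ["item_5", "item_6"],
    ["item_17", "item_9"],
]
actual = [True, True, False, False]  # made-up observed latent activations

predicted = [predict_activation(s, concept_items) for s in sequences]
p, r = precision_recall(predicted, actual)
score = confidence(p, r)
```

With this toy data the rule fires on three of the four sequences, two of them correctly, so precision is 2/3, recall is 1.0, and the combined score is 0.8.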

Related papers

LatentQA: Teaching LLMs to Decode Activations Into Natural Language [72.87064562349742]
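We introduce LatentQA, the task of answering open-ended questions about model activations in natural language. We propose Latent Interpretation Tuning (LIT), which fine-tunes a decoder LLM on a dataset of activations and associated question-answer pairs. Our decoder also specifies a differentiable loss that we use to control models, such as debiasing models on stereotyped sentences and controlling the sentiment of generations.
Paper reference translation (metadata) (2024-12-11T18:59:33Z)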
Self-supervised Interpretable Concept-based Models for Text Classification [9.340843984411137]
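We propose self-supervised Interpretable Concept Embedding Models (ICEMs). We leverage the generalization abilities of large language models to predict concept labels in a self-supervised way. ICEMs can be trained in a self-supervised way to achieve performance similar to fully supervised concept-based models and end-to-end black-box models.
Paper reference translation (metadata) (2024-06-20T14:04:53Z)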
RecExplainer: Aligning Large Language Models for Explaining Recommendation Models [50.74181089742969]
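Large language models (LLMs) have demonstrated remarkable intelligence in understanding, reasoning, and instruction following. This paper investigates using LLMs as surrogate models to explain black-box recommender models. To facilitate effective alignment, we introduce three methods: behavior alignment, intention alignment, and hybrid alignment.
Paper reference translation (metadata) (2023-11-18T03:05:43Z)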
Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
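This paper introduces ASTxplainer, an explainability method specific to large language models for code. At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes. We perform an empirical evaluation of 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
Paper reference translation (metadata) (2023-08-07T18:50:57Z)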
Disentanglement via Latent Quantization [60.37109712033694]
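In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space. We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
Paper reference translation (metadata) (2023-05-28T06:30:29Z)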
Interpretable Sentence Representation with Variational Autoencoders and Attention [0.685316573653194]
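We develop methods to enhance the interpretability of recent representation learning techniques in natural language processing (NLP). Variational autoencoders (VAEs) are effective for observing latent generative factors. We build two models with inductive biases that separate the information in latent representations into understandable concepts without annotated data.
Paper reference translation (metadata) (2023-05-04T13:16:15Z)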
Explaining Language Models' Predictions with High-Impact Concepts [11.47612457613113]
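We propose a complete framework for extending concept-based interpretability methods to NLP. It optimizes for features that cause the output prediction to change significantly. Compared with baselines, our method achieves superior results on predictive impact, usability, and faithfulness.
Paper reference translation (metadata) (2023-05-03T14:48:27Z)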
ProtoVAE: A Trustworthy Self-Explainable Prototypical Variational Model [18.537838366377915]
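ProtoVAE is a variational autoencoder-based framework that learns class-specific prototypes in an end-to-end manner. It enforces trustworthiness and diversity by regularizing the representation space and introducing an orthonormality constraint.
Paper reference translation (metadata) (2022-10-15T00:42:13Z)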
Combining Discrete Choice Models and Neural Networks through Embeddings: Formulation, Interpretability and Performance [10.57079240576682]
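This study proposes a novel approach that combines theory-driven and data-driven choice models using artificial neural networks (ANNs). In particular, we use continuous vector representations, called embeddings, to encode categorical or discrete explanatory variables. Our models deliver state-of-the-art predictive performance, outperforming existing ANN-based models while drastically reducing the number of required network parameters.
Paper reference translation (metadata) (2021-09-24T15:55:31Z)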
InteL-VAEs: Adding Inductive Biases to Variational Auto-Encoders via Intermediary Latents [60.785317191131284]
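We propose a simple and effective method for learning VAEs with controllable inductive biases by using an intermediary set of latent variables. In particular, it allows us to impose desired properties such as sparsity or clustering on learned representations. This enables InteL-VAEs to learn both better generative models and better representations.
Paper reference translation (metadata) (2021-06-25T16:34:05Z)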
Autoencoding Variational Autoencoder [56.05008520271406]
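We study the impact of this behavior on the learned representations, and the consequences of fixing it by introducing a notion of self-consistency. We show that encoders trained with the self-consistency approach lead to representations that are robust (insensitive) to perturbations of the input introduced by adversarial attacks.
Paper reference translation (metadata) (2020-12-07T14:16:14Z)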
Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
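We propose using k nearest neighbor (kNN) representations to identify the training examples responsible for a model's predictions. We show that kNN representations are effective at uncovering learned spurious associations. Our results suggest that the kNN approach makes the model more robust to adversarial inputs.
Paper reference translation (metadata) (2020-10-18T16:55:25Z)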

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
