Fugu-MT paper translation (abstract): What Hard Tokens Reveal: Exploiting Low-confidence Tokens for Membership Inference Attacks against Large Language Models

Paper overview: What Hard Tokens Reveal: Exploiting Low-confidence Tokens for Membership Inference Attacks against Large Language Models

arxiv url: http://arxiv.org/abs/2601.20885v1
Date: Tue, 27 Jan 2026 22:31:10 GMT
Status: translation complete
Last updated in system: 2026-01-30 16:22:49.349889
Title: What Hard Tokens Reveal: Exploiting Low-confidence Tokens for Membership Inference Attacks against Large Language Models
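Title (reference translation): Exploiting Low-confidence Tokens for Membership Inference Attacks against Large Language Models
Authors: Md Tasnim Jawad, Mingyan Xiao, Yanzhao Wu
Abstract summary: Membership Inference Attacks (MIAs) attempt to determine whether a specific data sample was included in a model's training or fine-tuning dataset. The paper proposes a new membership inference approach that captures token-level probabilities for low-confidence (hard) tokens. Experiments on both domain-specific medical datasets and general-purpose benchmarks show that HT-MIA consistently outperforms seven state-of-the-art MIA baselines.
Reference score (independently computed attention measure): 2.621142288968429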
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the widespread adoption of Large Language Models (LLMs) and increasingly stringent privacy regulations, protecting data privacy in LLMs has become essential, especially for privacy-sensitive applications. Membership Inference Attacks (MIAs) attempt to determine whether a specific data sample was included in the model training/fine-tuning dataset, posing serious privacy risks. However, most existing MIA techniques against LLMs rely on sequence-level aggregated prediction statistics, which fail to distinguish prediction improvements caused by generalization from those caused by memorization, leading to low attack effectiveness. To address this limitation, we propose a novel membership inference approach that captures the token-level probabilities for low-confidence (hard) tokens, where membership signals are more pronounced. By comparing token-level probability improvements at hard tokens between a fine-tuned target model and a pre-trained reference model, HT-MIA isolates strong and robust membership signals that are obscured by prior MIA approaches. Extensive experiments on both domain-specific medical datasets and general-purpose benchmarks demonstrate that HT-MIA consistently outperforms seven state-of-the-art MIA baselines. We further investigate differentially private training as an effective defense mechanism against MIAs in LLMs. Overall, our HT-MIA framework establishes hard-token based analysis as a state-of-the-art foundation for advancing membership inference attacks and defenses for LLMs.
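Abstract (reference translation): As Large Language Models (LLMs) spread and privacy regulations tighten, protecting data privacy in LLMs has become essential, particularly in privacy-sensitive applications. Membership Inference Attacks (MIAs) determine whether a specific data sample is contained in a model's training or fine-tuning dataset, which creates serious privacy risks. However, most existing MIA techniques against LLMs rely on aggregated sequence-level prediction statistics and cannot separate prediction improvements due to memorization from those due to generalization, so their attack effectiveness is low. To address this limitation, the authors propose a new membership inference approach that captures token-level probabilities for low-confidence (hard) tokens. HT-MIA compares token-level probability improvements at hard tokens between a fine-tuned target model and a pre-trained reference model, and thereby isolates strong, robust membership signals that prior MIA approaches obscure. Extensive experiments on both domain-specific medical datasets and general-purpose benchmarks show that HT-MIA consistently outperforms seven state-of-the-art MIA baselines. The authors also examine differentially private training as an effective defense against MIAs in LLMs. Overall, the HT-MIA framework establishes hard-token-based analysis as a state-of-the-art foundation for advancing membership inference attacks and defenses for LLMs.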
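
The hard-token comparison described in the abstract can be illustrated with a small sketch. The abstract does not give the exact scoring rule, so everything below is an assumption made for illustration: hard tokens are taken to be the `n_hard` positions where the reference model assigns the lowest log-probability, the score is the mean log-probability gain of the target model over the reference model at those positions, and the function names and the threshold value are hypothetical. Per-token log-probabilities are supplied as plain lists, so no model is loaded.

```python
from typing import List, Sequence


def hard_token_indices(reference_logprobs: Sequence[float], n_hard: int) -> List[int]:
    """Return the positions the reference model finds hardest.

    Assumption (not from the paper): "hard" means lowest reference log-probability.
    """
    if not reference_logprobs:
        raise ValueError("empty token sequence")
    if n_hard < 1:
        raise ValueError("n_hard must be at least 1")
    n_hard = min(n_hard, len(reference_logprobs))
    # Sort positions by reference log-probability, lowest (hardest) first.
    order = sorted(range(len(reference_logprobs)), key=lambda i: reference_logprobs[i])
    return order[:n_hard]


def hard_token_score(
    target_logprobs: Sequence[float],
    reference_logprobs: Sequence[float],
    n_hard: int,
) -> float:
    """Mean log-probability gain (target minus reference) at the hard tokens."""
    if len(target_logprobs) != len(reference_logprobs):
        raise ValueError("both models must score the same token sequence")
    hard = hard_token_indices(reference_logprobs, n_hard)
    gains = [target_logprobs[i] - reference_logprobs[i] for i in hard]
    return sum(gains) / len(gains)


def infer_membership(score: float, threshold: float) -> bool:
    """Flag a sample as a likely training member when the gain exceeds a threshold.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    return score > threshold


if __name__ == "__main__":
    # Toy per-token log-probabilities for a five-token sample.
    reference = [-0.1, -5.0, -0.2, -4.0, -0.3]
    seen_by_target = [-0.1, -0.5, -0.2, -0.4, -0.3]    # large gains at hard tokens
    unseen_by_target = [-0.1, -4.5, -0.2, -3.8, -0.3]  # small gains at hard tokens

    for name, target in [("seen", seen_by_target), ("unseen", unseen_by_target)]:
        score = hard_token_score(target, reference, n_hard=2)
        print(name, round(score, 3), infer_membership(score, threshold=1.0))
```

In this toy data the easy tokens are identical under both models, so the whole difference between the two samples sits at the two hard positions. Averaging over all five tokens would dilute that difference, which is the limitation of sequence-level statistics that the abstract points to.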
