Fugu-MT Paper Translation (Abstract): CoLA: A Choice Leakage Attack Framework to Expose Privacy Risks in Subset Training

Paper overview: CoLA: A Choice Leakage Attack Framework to Expose Privacy Risks in Subset Training

arxiv url: http://arxiv.org/abs/2604.12342v1
Date: Tue, 14 Apr 2026 06:26:04 GMT
Status: Translation complete
Last updated in system: 2026-04-15 19:11:32.283134
Title: CoLA: A Choice Leakage Attack Framework to Expose Privacy Risks in Subset Training
Authors: Qi Li, Cheng-Long Wang, Yinzhi Cao, Di Wang
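Abstract summary: Training models on carefully selected data rather than the full dataset has become a standard preprocessing step in modern ML. The paper shows that subset training is not privacy-free: the choice of which data are included or excluded introduces a new privacy surface. It proposes CoLA, a unified framework for analyzing privacy leakage in subset selection.
Reference score (attention score computed independently by this site): 40.28755876624292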
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Training models on a carefully chosen portion of data rather than the full dataset is now a standard preprocess for modern ML. From vision coreset selection to large-scale filtering in language models, it enables scalability with minimal utility loss. A common intuition is that training on fewer samples should also reduce privacy risks. In this paper, we challenge this assumption. We show that subset training is not privacy free: the very choices of which data are included or excluded can introduce new privacy surface and leak more sensitive information. Such information can be captured by adversaries either through side-channel metadata from the subset selection process or via the outputs of the target model. To systematically study this phenomenon, we propose CoLA (Choice Leakage Attack), a unified framework for analyzing privacy leakage in subset selection. In CoLA, depending on the adversary's knowledge of the side-channel information, we define two practical attack scenarios: Subset-aware Side-channel Attacks and Black-box Attacks. Under both scenarios, we investigate two privacy surfaces unique to subset training: (1) Training-membership MIA (TM-MIA), which concerns only the privacy of training data membership, and (2) Selection-participation MIA (SP-MIA), which concerns the privacy of all samples that participated in the subset selection process. Notably, SP-MIA enlarges the notion of membership from model training to the entire data-model supply chain. Experiments on vision and language models show that existing threat models underestimate subset-training privacy risks: the expanded privacy surface leaks both training and selection membership, extending risks from individual models to the broader ML ecosystem.
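The two privacy surfaces named in the abstract differ only in which set a sample is tested against: the pool that entered subset selection (SP-MIA) or the subset actually used for training (TM-MIA). The sketch below is a minimal illustration of those ground-truth labels, not the paper's method. The top-k selection rule, the random scores, and the names `select_subset` and `membership_labels` are all assumptions made for this example.

```python
import random


def select_subset(pool, scores, k):
    """Hypothetical selection rule: keep the k highest-scoring candidates."""
    ranked = sorted(pool, key=lambda s: scores[s], reverse=True)
    return set(ranked[:k])


def membership_labels(sample, pool, subset):
    """Return (sp, tm) ground-truth labels for one sample.

    sp: the sample took part in the subset selection process (SP-MIA target).
    tm: the sample ended up in the training subset (TM-MIA target).
    """
    sp = sample in pool
    tm = sample in subset
    return sp, tm


# Toy data: ten candidates that enter selection, one outsider that does not.
rng = random.Random(0)
pool = [f"x{i}" for i in range(10)]
outsider = "y0"
scores = {s: rng.random() for s in pool}  # assumed stand-in for a selection score
subset = select_subset(pool, scores, k=3)

for sample in pool + [outsider]:
    sp, tm = membership_labels(sample, pool, subset)
    print(f"{sample}: selection-participation={sp}, training-membership={tm}")
```

In this toy setup, a candidate that was scored but discarded has `sp=True` and `tm=False`. That is the case a training-membership attack alone would treat as a non-member, and the abstract's description of SP-MIA suggests it is the case the enlarged notion of membership is meant to cover.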


Related papers