Fugu-MT Paper Translation (Summary): PRIVET: Privacy Metric Based on Extreme Value Theory

Paper summary: PRIVET: Privacy Metric Based on Extreme Value Theory

arxiv url: http://arxiv.org/abs/2510.24233v1
Date: Tue, 28 Oct 2025 09:42:03 GMT
Status: Translation complete
Last updated in system: 2025-10-29 15:35:36.995873
Title: PRIVET: Privacy Metric Based on Extreme Value Theory
Title (reference translation): PRIVET: Privacy Metric Based on Extreme Value Theory
Authors: Antoine Szatkownik, Aurélien Decelle, Beatriz Seoane, Nicolas Bereux, Léo Planche, Guillaume Charpiat, Burak Yelmen, Flora Jay, Cyril Furtlehner
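Abstract summary: Deep generative models are often trained on sensitive data, such as genetic sequences, health data, or, more broadly, copyrighted, licensed, or protected content. This raises critical concerns about privacy-preserving synthetic data, and more specifically about privacy leakage. In this paper, we propose PRIVET, a generic sample-based, modality-agnostic algorithm that assigns an individual privacy leak score to each synthetic sample.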
Reference score (attention score computed in-house): 8.447463478355845
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep generative models are often trained on sensitive data, such as genetic sequences, health data, or more broadly, any copyrighted, licensed or protected content. This raises critical concerns around privacy-preserving synthetic data, and more specifically around privacy leakage, an issue closely tied to overfitting. Existing methods almost exclusively rely on global criteria to estimate the risk of privacy failure associated with a model, offering only quantitative, non-interpretable insights. The absence of rigorous evaluation methods for data privacy at the sample level may hinder the practical deployment of synthetic data in real-world applications. Using extreme value statistics on nearest-neighbor distances, we propose PRIVET, a generic sample-based, modality-agnostic algorithm that assigns an individual privacy leak score to each synthetic sample. We empirically demonstrate that PRIVET reliably detects instances of memorization and privacy leakage across diverse data modalities, including settings with very high dimensionality or limited sample sizes, such as genetic data, and even under underfitting regimes. We compare our method to existing approaches under controlled settings and show its advantage in providing both dataset-level and sample-level assessments through qualitative and quantitative outputs. Additionally, our analysis reveals limitations of existing computer vision embeddings in yielding perceptually meaningful distances when identifying near-duplicate samples.
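Abstract (reference translation): Deep generative models are often trained on sensitive data such as genetic sequences, health data, and, more broadly, copyrighted, licensed, or protected content. This raises critical concerns about privacy-preserving synthetic data, particularly privacy leakage. Existing methods rely almost exclusively on global criteria to estimate the risk of privacy failure associated with a model, providing only quantitative, non-interpretable insights. The lack of rigorous sample-level evaluation methods for data privacy may hinder the practical deployment of synthetic data in real-world applications. Using extreme value statistics on nearest-neighbor distances, we propose PRIVET, a generic sample-based, modality-agnostic algorithm that assigns an individual privacy leak score to each synthetic sample. We empirically demonstrate that PRIVET reliably detects instances of memorization and privacy leakage across diverse data modalities, including very high-dimensional settings, limited sample sizes such as genetic data, and even underfitting regimes. We compare our method with existing approaches in controlled settings and show its advantage for both dataset-level and sample-level assessment through qualitative and quantitative outputs. Furthermore, we reveal limitations of existing computer vision embeddings in yielding perceptually meaningful distances when identifying near-duplicate samples.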
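The abstract says PRIVET scores each synthetic sample using extreme value statistics on nearest-neighbor distances, but it does not spell out the procedure. The sketch below illustrates the general idea under our own assumptions and is not the authors' algorithm. Assumptions: Euclidean distances in the raw feature space; a held-out set of real samples supplies the reference distribution of nearest-neighbor distances to the training set; and that distribution is modeled by a two-parameter Weibull law (the extreme value limit for minima of non-negative quantities), fitted by regression on the Weibull probability plot. The names `nn_distances`, `fit_weibull` and `leak_scores` are hypothetical.

```python
import numpy as np


def nn_distances(queries: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Euclidean distance from each row of `queries` to its nearest row of `reference`."""
    # Brute force via broadcasting: (n_queries, n_reference, dim) differences.
    diff = queries[:, None, :] - reference[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def fit_weibull(samples: np.ndarray) -> tuple[float, float]:
    """Fit a two-parameter Weibull (shape k, scale lam) by least squares on the
    Weibull probability plot: ln(-ln(1 - F)) = k * ln(x) - k * ln(lam)."""
    x = np.sort(samples)
    n = len(x)
    # Median-rank plotting positions (Bernard's approximation) for the empirical CDF.
    F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)
    y = np.log(-np.log(1.0 - F))
    k, intercept = np.polyfit(np.log(x), y, 1)
    lam = np.exp(-intercept / k)
    return float(k), float(lam)


def leak_scores(synthetic: np.ndarray, train: np.ndarray, holdout: np.ndarray) -> np.ndarray:
    """Hypothetical per-sample score: fitted-Weibull CDF of each synthetic sample's
    nearest-training-neighbor distance. Values near 0 mean the sample sits closer
    to the training set than unseen real data plausibly would."""
    # Reference distribution: how close real samples the model never saw get to the training set.
    k, lam = fit_weibull(nn_distances(holdout, train))
    d = nn_distances(synthetic, train)
    return 1.0 - np.exp(-((d / lam) ** k))


# Toy demo: 50 fresh samples from the data distribution plus 50 near-copies of training points.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 10))
holdout = rng.normal(size=(200, 10))
fresh = rng.normal(size=(50, 10))
copies = train[:50] + 1e-3 * rng.normal(size=(50, 10))
scores = leak_scores(np.vstack([fresh, copies]), train, holdout)
print("median score, fresh:", np.median(scores[:50]))
print("max score, copies:  ", scores[50:].max())
```

In this toy setting, the near-copies should score close to 0 (flagged as possible memorization), while fresh samples drawn from the same distribution as the holdout set should score roughly uniformly on [0, 1]. The brute-force distance computation needs memory proportional to the product of the two set sizes and only suits small examples.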

