Fugu-MT Paper Translation (Abstract): Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis

Paper overview: Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis

arxiv url: http://arxiv.org/abs/2509.18014v1
Date: Mon, 22 Sep 2025 16:53:38 GMT
Status: Translation complete
Last updated in system: 2025-09-23 18:58:16.51806
Title: Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis
Authors: Joshua Ward, Xiaofeng Lin, Chi-Hua Wang, Guang Cheng
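Abstract summary: Tabular generative models are often claimed to preserve privacy by creating synthetic datasets that resemble the training data. Membership Inference Attacks (MIAs) have recently emerged as a method for evaluating privacy leakage in synthetic data. We propose a unified, model-agnostic threat framework that deploys a collection of attacks to estimate the maximum privacy leakage of a synthetic dataset.
Reference score (independently calculated attention score): 8.4361320391543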
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Tabular Generative Models are often argued to preserve privacy by creating synthetic datasets that resemble training data. However, auditing their empirical privacy remains challenging, as commonly used similarity metrics fail to effectively characterize privacy risk. Membership Inference Attacks (MIAs) have recently emerged as a method for evaluating privacy leakage in synthetic data, but their practical effectiveness is limited. Numerous attacks exist across different threat models, each with distinct implementations targeting various sources of privacy leakage, making them difficult to apply consistently. Moreover, no single attack consistently outperforms the others, leading to a routine underestimation of privacy risk. To address these issues, we propose a unified, model-agnostic threat framework that deploys a collection of attacks to estimate the maximum empirical privacy leakage in synthetic datasets. We introduce Synth-MIA, an open-source Python library that streamlines this auditing process through a novel testbed that integrates seamlessly into existing synthetic data evaluation pipelines through a Scikit-Learn-like API. Our software implements 13 attack methods through a Scikit-Learn-like API, designed to enable fast systematic estimation of privacy leakage for practitioners as well as facilitate the development of new attacks and experiments for researchers. We demonstrate our framework's utility in the largest tabular synthesis privacy benchmark to date, revealing that higher synthetic data quality corresponds to greater privacy leakage, that similarity-based privacy metrics show weak correlation with MIA results, and that the differentially private generator PATEGAN can fail to preserve privacy under such attacks. This underscores the necessity of MIA-based auditing when designing and deploying Tabular Generative Models.
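The idea of deploying a collection of attacks and reporting the strongest one can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not Synth-MIA's actual API: the class names `ClosestRecordAttack` and `KNearestAttack`, the `fit`/`score` method names, the `audit` helper, the use of ROC AUC as the leakage measure, and the toy "overfit generator" are all hypothetical choices made for this example.

```python
import math
import random


class ClosestRecordAttack:
    """Hypothetical attack: score = negative distance to the closest synthetic row."""

    def fit(self, synth):
        self.synth_ = [tuple(row) for row in synth]
        return self

    def score(self, candidates):
        # Higher score means "more likely a training member".
        return [-min(math.dist(c, s) for s in self.synth_) for c in candidates]


class KNearestAttack:
    """Hypothetical attack: score = negative mean distance to the k nearest synthetic rows."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, synth):
        self.synth_ = [tuple(row) for row in synth]
        return self

    def score(self, candidates):
        scores = []
        for c in candidates:
            nearest = sorted(math.dist(c, s) for s in self.synth_)[: self.k]
            scores.append(-sum(nearest) / len(nearest))
        return scores


def roc_auc(labels, scores):
    """AUC as the probability that a member outscores a non-member (ties count 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def audit(attacks, synth, candidates, labels):
    """Run every attack and return per-attack AUCs plus the name of the strongest."""
    results = {}
    for name, attack in attacks.items():
        results[name] = roc_auc(labels, attack.fit(synth).score(candidates))
    strongest = max(results, key=results.get)
    return results, strongest


# Toy data (assumption): the "generator" leaks by emitting noisy copies of its training rows.
rng = random.Random(0)
members = [(rng.random(), rng.random()) for _ in range(30)]
non_members = [(rng.random(), rng.random()) for _ in range(30)]
synth = [(x + rng.gauss(0, 0.01), y + rng.gauss(0, 0.01)) for x, y in members]

candidates = members + non_members
labels = [1] * len(members) + [0] * len(non_members)

attacks = {"closest_record": ClosestRecordAttack(), "k_nearest": KNearestAttack(k=3)}
results, strongest = audit(attacks, synth, candidates, labels)
print(results, "strongest attack:", strongest)
```

Taking the maximum over attacks reflects the abstract's observation that no single attack consistently outperforms the others, so any one attack on its own may understate the leakage.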

Related papers