Fugu-MT paper translation (summary): DPDSyn: Improving Differentially Private Dataset Synthesis for Model Training by Downstream Task Guidance

Paper summary: DPDSyn: Improving Differentially Private Dataset Synthesis for Model Training by Downstream Task Guidance

arxiv url: http://arxiv.org/abs/2604.15660v1
Date: Fri, 17 Apr 2026 03:27:27 GMT
Status: translation complete
Last updated in system: 2026-04-20 22:00:19.721432
Title: DPDSyn: Improving Differentially Private Dataset Synthesis for Model Training by Downstream Task Guidance
Authors: Mingxuan Jia, Wen Huang, Weixin Zhao, Xingyi Wang, Jian Peng, Zhishuo Zhang,
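Abstract summary: We train a differentially private AI model for downstream tasks on the original private dataset and use the trained model to synthesize datasets. The proposed DPDSyn consistently outperforms eight state-of-the-art baselines, with a maximum improvement of 2.40x in accuracy and 333.73x in synthesis efficiency.
Reference score (independently computed attention measure): 6.939613890822898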
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: How to synthesize a dataset for AI model training while achieving differential privacy is a meaningful but challenging problem. To address it, state-of-the-art methods first select multiple low-dimensional distributions of the original private dataset that have the potential to approximate its full distribution with high precision, and then synthesize a dataset that obeys all of the selected low-dimensional distributions. However, suitable low-dimensional distributions are difficult to select, which in turn degrades the data utility of the resulting synthetic dataset. To improve differentially private dataset synthesis, we propose to train a differentially private AI model for downstream tasks on the original private dataset and to use the trained model to synthesize datasets. On the one hand, the AI model satisfies differential privacy, so no use of the model discloses private information about the original private dataset. On the other hand, the AI model is trained to complete the downstream task, so it preserves the information that is critical for completing downstream tasks. Synthesizing datasets with this model therefore improves data utility while preserving privacy. Empirical evaluations on four benchmark datasets demonstrate that our proposed DPDSyn consistently outperforms eight state-of-the-art baselines, with a maximum improvement of 2.40x in accuracy and 333.73x in synthesis efficiency. Further experiments also validate that DPDSyn scales well across varying data sizes.
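
To make the marginal-based baseline approach described in the abstract concrete, the following is a minimal sketch and not the DPDSyn method itself. It measures a single one-way marginal (the simplest low-dimensional distribution) with the Laplace mechanism and then samples synthetic values from it. The function names, the toy data, and the choice of epsilon = 1.0 are assumptions made purely for illustration; real synthesizers select and combine many marginals.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(values, domain, epsilon, rng):
    """Estimate a one-way marginal under epsilon-differential privacy.

    Assumption: neighbouring datasets differ by adding or removing one
    record, so the histogram has L1 sensitivity 1 and Laplace noise of
    scale 1/epsilon per cell is sufficient.
    """
    counts = Counter(values)
    # Clipping at zero is post-processing and does not weaken the guarantee.
    noisy = [max(0.0, counts[v] + laplace_noise(1.0 / epsilon, rng)) for v in domain]
    total = sum(noisy)
    if total == 0.0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(domain)] * len(domain)
    return [c / total for c in noisy]


def synthesize(domain, marginal, n_rows, rng):
    # Sampling from the noisy marginal is also post-processing, so the
    # synthetic values inherit the same privacy guarantee.
    return rng.choices(domain, weights=marginal, k=n_rows)


rng = random.Random(0)
domain = [0, 1, 2, 3]
private = [rng.choice(domain) for _ in range(1000)]  # toy stand-in for private data
marginal = noisy_marginal(private, domain, epsilon=1.0, rng=rng)
synthetic = synthesize(domain, marginal, n_rows=500, rng=rng)
```

The same post-processing argument is what the abstract relies on for the model-based approach: once a released object (here a noisy marginal, there a trained model) satisfies differential privacy, anything computed from it alone does too.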

Related papers