Fugu-MT Paper Translation (Summary): DP^2-VL: Private Photo Dataset Protection by Data Poisoning for Vision-Language Models

Paper overview: DP^2-VL: Private Photo Dataset Protection by Data Poisoning for Vision-Language Models

arxiv url: http://arxiv.org/abs/2603.23925v1
Date: Wed, 25 Mar 2026 04:30:42 GMT
Status: Translation complete
System update date: 2026-03-26 21:06:11.128143
Title: DP^2-VL: Private Photo Dataset Protection by Data Poisoning for Vision-Language Models
Authors: Hongyi Miao, Jun Jia, Xincheng Wang, Qianli Ma, Wei Sun, Wangqiu Zhou, Dandan Zhu, Yewen Cao, Zhi Liu, Guangtao Zhai
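Abstract summary: We propose a privacy threat model called identity-affiliation learning, in which an attacker fine-tunes a VLM using a few private photos of a target individual. The resulting model can expose the target user's private information without authorization when their photos are given as input. To mitigate this privacy risk, we propose DP2-VL, the first dataset protection framework for private photos.
Reference score (in-house attention score): 47.98028812152569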
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in visual-language alignment have endowed vision-language models (VLMs) with fine-grained image understanding capabilities. However, this progress also introduces new privacy risks. This paper first proposes a novel privacy threat model named identity-affiliation learning: an attacker fine-tunes a VLM using only a few private photos of a target individual, thereby embedding associations between the target's facial identity and their private property and social relationships into the model's internal representations. Once deployed via public APIs, this model enables unauthorized exposure of the target user's private information upon input of their photos. To benchmark VLMs' susceptibility to such identity-affiliation leakage, we introduce the first identity-affiliation dataset comprising seven typical scenarios appearing in private photos. Each scenario is instantiated with multiple identity-centered photo-description pairs. Experimental results demonstrate that mainstream VLMs such as LLaVA, Qwen-VL, and MiniGPT-v2 can recognize facial identities and infer identity-affiliation relationships after fine-tuning on small-scale private photographic datasets, and even on synthetically generated datasets. To mitigate this privacy risk, we propose DP2-VL, the first Dataset Protection framework for private photos that leverages Data Poisoning. By optimizing imperceptible perturbations that push the original representations toward an antithetical region, DP2-VL induces a dataset-level shift in the embedding space of VLMs' encoders. This shift separates protected images from clean inference images, causing a model fine-tuned on the protected set to overfit to it. Extensive experiments demonstrate that DP2-VL achieves strong generalization across models, robustness to diverse post-processing operations, and consistent effectiveness across varying protection ratios.
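The abstract describes the DP2-VL mechanism only at a high level. The following is a minimal toy sketch of the general idea it names (optimizing small, bounded perturbations so that image embeddings move toward a shared "antithetical" target, shifting the whole protected set in embedding space); it is not the authors' implementation. Assumptions, all hypothetical: the VLM image encoder is replaced by a fixed linear map `W`; the "antithetical region" is taken to be the negated mean clean embedding; imperceptibility is modeled as an L-infinity budget of 8/255 enforced with PGD-style sign-gradient steps. The helper names `toy_encoder` and `protect_image` are illustrative only.

```python
import numpy as np


def toy_encoder(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a VLM image encoder: a fixed linear map."""
    return W @ x


def protect_image(W, x, target, eps=8 / 255, step=1 / 255, iters=50):
    """PGD-style sign-gradient update that pulls the embedding of x toward target
    while keeping the perturbation inside an L-infinity ball of radius eps."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        residual = toy_encoder(W, x + delta) - target
        grad = 2.0 * W.T @ residual               # gradient of ||W(x + delta) - target||^2
        delta = delta - step * np.sign(grad)      # step toward the target region
        delta = np.clip(delta, -eps, eps)         # imperceptibility budget
        delta = np.clip(x + delta, 0.0, 1.0) - x  # keep pixels in [0, 1]
    return np.clip(x + delta, 0.0, 1.0)


rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))          # toy encoder weights (assumption)
images = rng.uniform(size=(10, 64))    # ten toy "private photos", flattened, in [0, 1]

clean_emb = images @ W.T               # row i equals toy_encoder(W, images[i])
# Assumption: the "antithetical region" is the negated mean clean embedding,
# shared by every image so that the whole protected set moves together.
target = -clean_emb.mean(axis=0)

protected = np.stack([protect_image(W, x, target) for x in images])
prot_emb = protected @ W.T

shift = np.linalg.norm(prot_emb.mean(axis=0) - clean_emb.mean(axis=0))
print("max |delta|:", np.abs(protected - images).max())
print("dataset-level mean-embedding shift:", shift)
```

Because every image is pulled toward the same target, the mean embedding of the protected set moves away from that of the clean images, a toy analogue of the dataset-level shift the abstract describes. A real system would instead backpropagate through an actual VLM vision encoder, and the choice of target region is exactly the part the abstract leaves unspecified.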


List of related papers