Fugu-MT paper translation (abstract): CertDW: Towards Certified Dataset Ownership Verification via Conformal Prediction

Paper overview: CertDW: Towards Certified Dataset Ownership Verification via Conformal Prediction

arxiv url: http://arxiv.org/abs/2506.13160v1
Date: Mon, 16 Jun 2025 07:17:23 GMT
Status: Translation complete
Last updated in system: 2025-06-17 17:28:47.59987
Title: CertDW: Towards Certified Dataset Ownership Verification via Conformal Prediction
Title (reference translation): CertDW: Toward Certified Dataset Ownership Verification via Conformal Prediction
Authors: Ting Qiao, Yiming Li, Jianbin Li, Yingjia Wang, Leyi Qi, Junfeng Guo, Ruili Feng, Dacheng Tao
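Abstract summary: This paper proposes the first certified dataset watermark (CertDW) and a CertDW-based certified dataset ownership verification method. Inspired by conformal prediction, it introduces two statistical measures, principal probability (PP) and watermark robustness (WR). The authors prove that a provable lower bound exists between PP and WR, which enables ownership verification when a suspicious model's WR value significantly exceeds the PP values of benign models trained on watermark-free datasets.
Reference score (independently computed attention score): 48.82467166657901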
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Deep neural networks (DNNs) rely heavily on high-quality open-source datasets (e.g., ImageNet) for their success, making dataset ownership verification (DOV) crucial for protecting public dataset copyrights. In this paper, we find existing DOV methods (implicitly) assume that the verification process is faithful, where the suspicious model will directly verify ownership by using the verification samples as input and returning their results. However, this assumption may not necessarily hold in practice and their performance may degrade sharply when subjected to intentional or unintentional perturbations. To address this limitation, we propose the first certified dataset watermark (i.e., CertDW) and CertDW-based certified dataset ownership verification method that ensures reliable verification even under malicious attacks, under certain conditions (e.g., constrained pixel-level perturbation). Specifically, inspired by conformal prediction, we introduce two statistical measures, including principal probability (PP) and watermark robustness (WR), to assess model prediction stability on benign and watermarked samples under noise perturbations. We prove there exists a provable lower bound between PP and WR, enabling ownership verification when a suspicious model's WR value significantly exceeds the PP values of multiple benign models trained on watermark-free datasets. If the number of PP values smaller than WR exceeds a threshold, the suspicious model is regarded as having been trained on the protected dataset. Extensive experiments on benchmark datasets verify the effectiveness of our CertDW method and its resistance to potential adaptive attacks. Our code is available on GitHub at https://github.com/NcepuQiaoTing/CertDW.
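Abstract (reference translation): Deep neural networks (DNNs) depend heavily on high-quality open-source datasets (such as ImageNet) for their success, so dataset ownership verification (DOV) is essential for protecting the copyright of public datasets. This paper finds that existing DOV methods assume the verification process is faithful, meaning the suspicious model verifies ownership directly by taking the verification samples as input and returning its results. In practice this assumption does not necessarily hold, and the performance of these methods can degrade sharply under intentional or unintentional perturbations. To address this limitation, the paper proposes the first certified dataset watermark (i.e., CertDW) and a CertDW-based certified dataset ownership verification method that guarantees reliable verification even under malicious attacks, under certain conditions (for example, constrained pixel-level perturbation). Specifically, it introduces two statistical measures, principal probability (PP) and watermark robustness (WR), to assess the stability of model predictions on benign and watermarked samples under noise perturbations. The authors prove that a provable lower bound exists between PP and WR, which enables ownership verification when the suspicious model's WR value significantly exceeds the PP values of multiple benign models trained on watermark-free datasets. If the number of PP values smaller than WR exceeds a threshold, the suspicious model is regarded as having been trained on the protected dataset. Extensive experiments on benchmark datasets verify the effectiveness of the CertDW method and its resistance to potential adaptive attacks. The code is available on GitHub at https://github.com/NcepuQiaoTing/CertDW.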
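
The final decision rule described in the abstract (count how many benign PP values fall below the suspicious model's WR, then compare that count with a threshold) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the function name `verify_ownership` and the parameters `wr_suspicious`, `pp_benign`, and `count_threshold` are hypothetical, and the sketch assumes the PP and WR values have already been computed, since the abstract does not define how they are obtained.

```python
from typing import Sequence


def verify_ownership(
    wr_suspicious: float,
    pp_benign: Sequence[float],
    count_threshold: int,
) -> bool:
    """Hypothetical sketch of the counting rule stated in the abstract.

    wr_suspicious:   WR value of the suspicious model (assumed precomputed).
    pp_benign:       PP values of benign models trained on watermark-free
                     data (assumed precomputed).
    count_threshold: the count that must be exceeded (assumed given).
    """
    # Count the benign PP values that are strictly smaller than the WR value.
    num_below = sum(1 for pp in pp_benign if pp < wr_suspicious)
    # The model is flagged only if that count exceeds the threshold.
    return num_below > count_threshold
```

For instance, with made-up values `wr_suspicious=0.9`, `pp_benign=[0.1, 0.2, 0.3, 0.95]`, and `count_threshold=2`, three PP values fall below the WR value, so the sketch would flag the model. How the threshold is chosen, and the guarantee attached to it, are specified in the paper rather than here.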

Related papers