Fugu-MT Paper Translation (Summary): Evaluating Concept Filtering Defenses against Child Sexual Abuse Material Generation by Text-to-Image Models

Paper summary: Evaluating Concept Filtering Defenses against Child Sexual Abuse Material Generation by Text-to-Image Models

arxiv url: http://arxiv.org/abs/2512.05707v1
Date: Fri, 05 Dec 2025 13:34:05 GMT
Status: Translation complete
Last updated in system: 2025-12-13 22:40:57.037035
Title: Evaluating Concept Filtering Defenses against Child Sexual Abuse Material Generation by Text-to-Image Models
Title (reference translation): Evaluation of concept filtering countermeasures against child sexual abuse material generation by text-to-image models
Authors: Ana-Maria Cretu, Klim Kireev, Amro Abdalla, Wisdom Obinna, Raphael Meier, Sarah Adel Bargal, Elissa M. Redmiles, Carmela Troncoso
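Abstract summary: We evaluate the effectiveness of child filtering in preventing the misuse of text-to-image (T2I) models to create child sexual abuse material (CSAM). First, we use a game-based security definition to capture the complexity of preventing CSAM generation. Second, we show that current detection methods cannot remove all children from a dataset.
Reference score (independently computed attention score): 22.804759834225468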
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We evaluate the effectiveness of child filtering to prevent the misuse of text-to-image (T2I) models to create child sexual abuse material (CSAM). First, we capture the complexity of preventing CSAM generation using a game-based security definition. Second, we show that current detection methods cannot remove all children from a dataset. Third, using an ethical proxy for CSAM (a child wearing glasses, hereafter, CWG), we show that even when only a small percentage of child images are left in the training dataset, there exist prompting strategies that generate CWG from a child-filtered T2I model using only a few more queries than when the model is trained on the unfiltered data. Fine-tuning the filtered model on child images further reduces the additional query overhead. We also show that reintroducing a concept is possible via fine-tuning even if filtering is perfect. Our results demonstrate that current filtering methods offer limited protection to closed-weight models and no protection to open-weight models, while reducing the generality of the model by hindering the generation of child-related concepts or changing their representation. We conclude by outlining challenges in conducting evaluations that establish robust evidence on the impact of AI safety mitigations for CSAM.
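Abstract (reference translation): We evaluated the effectiveness of child filtering in preventing the misuse of text-to-image (T2I) models to create child sexual abuse material (CSAM). First, we capture the complexity of preventing CSAM generation using a game-based security definition. Second, we show that current detection methods cannot remove all children from a dataset. Third, using an ethical proxy for CSAM (a child wearing glasses, hereafter CWG), we show that even when only a small number of child images remain in the training dataset, there exist strategies that generate CWG from a child-filtered T2I model with only a few more queries than when the model is trained on unfiltered data. Fine-tuning the filtered model on child images reduces the additional query overhead. We also show that, even if filtering is perfect, a concept can be reintroduced via fine-tuning. These results show that current filtering methods offer limited protection for closed-weight models and no protection for open-weight models, while reducing the model's generality by hindering the generation of child-related concepts or changing their representation. We conclude by outlining the challenges in conducting evaluations that establish robust evidence of the effect of AI safety measures against CSAM.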

