Fugu-MT Paper Translation (Abstract): TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models

Paper summary: TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models

arxiv url: http://arxiv.org/abs/2511.21145v1
Date: Wed, 26 Nov 2025 07:58:42 GMT
Status: Translation complete
Last updated in system: 2025-11-27 18:37:59.018871
Title: TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models
Title (reference translation): TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models
Authors: Jiaming He, Guanyu Hou, Hongwei Li, Zhicong Huang, Kangjie Chen, Yi Yu, Wenbo Jiang, Guowen Xu, Tianwei Zhang
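Abstract summary: Text-to-Video (T2V) models can synthesize high-quality, temporally coherent dynamic video content. Existing safety evaluation methods, which focus on static image and text generation, are insufficient to capture the complex temporal dynamics of video generation. This paper proposes TEAR, an automated framework for uncovering safety risks linked to the dynamic temporal sequencing of T2V models.
Reference score (independently computed attention score): 36.61440730824693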
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-Video (T2V) models are capable of synthesizing high-quality, temporally coherent dynamic video content, but this generative diversity also inherently introduces critical safety challenges. Existing safety evaluation methods, which focus on static image and text generation, are insufficient to capture the complex temporal dynamics in video generation. To address this, we propose TEAR, a TEmporal-aware Automated Red-teaming framework designed to uncover safety risks specifically linked to the dynamic temporal sequencing of T2V models. TEAR employs a temporal-aware test generator optimized via a two-stage approach, initial generator training followed by temporal-aware online preference learning, to craft textually innocuous prompts that exploit temporal dynamics to elicit policy-violating video output. A refine model is then applied cyclically to improve prompt stealthiness and adversarial effectiveness. Extensive experimental evaluation demonstrates the effectiveness of TEAR across open-source and commercial T2V systems, with over 80% attack success rate, a significant improvement over the prior best result of 57%.
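Abstract (reference translation): Text-to-Video (T2V) models can synthesize high-quality, temporally coherent dynamic video content, but this diverse generation inherently introduces critical safety challenges. Existing safety evaluation methods, which focus on static image and text generation, are insufficient to capture the complex temporal dynamics of video generation. To address this, we propose TEAR, a TEmporal-aware Automated Red-teaming framework, with the aim of uncovering safety risks linked to the dynamic temporal sequencing of T2V models. TEAR employs a temporal-aware test generator optimized with a two-stage approach: initial generator training and temporal-aware online preference learning, which together craft textually innocuous prompts that exploit temporal dynamics to elicit policy-violating video output. A refine model is also adopted to cyclically improve prompt stealthiness and adversarial effectiveness. Extensive experimental evaluation demonstrates the effectiveness of TEAR across open-source and commercial T2V systems, with an attack success rate of over 80%.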

Related papers