Fugu-MT Paper Translation (Abstract): SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening

Paper overview: SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening

arxiv url: http://arxiv.org/abs/2605.17610v1
Date: Sun, 17 May 2026 19:10:36 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:48.228492
Title: SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening
Title (reference translation): SafeLens: Video Guardrails with Fast-and-Slow Screening
Authors: Shahriar Kabir Nahin, Hadi Askari, Muhao Chen, Anshuman Chhabra
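Abstract summary: This paper proposes SafeLens, a video guardrail framework with a fast-and-slow inference architecture for efficient and accurate content moderation. A high-quality dataset is constructed by applying influence-guided filtering to the SafeWatch Dataset, retaining only 2.4% of the original data. Across real-world and AI-generated video benchmarks, SafeLens achieves state-of-the-art performance, outperforming strong open-source video guardrails.
Reference score (attention score computed by this site): 29.597821689288963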
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid growth of online video platforms and AI-generated content has made reliable video guardrails a key challenge for safety and real-world deployment. While most videos can be screened through fast pattern recognition, a small subset requires deeper reasoning over temporally complex content and nuanced policy constraints. Existing approaches typically rely on large vision-language models applied uniformly across all inputs, resulting in high inference costs and inefficient allocation of computation. We propose SafeLens, a video guardrail framework that introduces a fast-and-slow inference architecture for efficient and accurate content moderation with variable computational cost across inputs. Additionally, we construct a high-quality dataset by applying influence-guided filtering to the SafeWatch Dataset, retaining only 2.4% of the original data. To further address limitations of training-time scaling, we enable test-time reasoning by augmenting the filtered data with structured Chain-of-Thought traces. Across real-world and AI-generated video benchmarks, SafeLens achieves state-of-the-art performance, outperforming strong open-source video guardrails (e.g., SafeWatch-8B, OmniGuard-7B) and closed-source models (e.g., GPT-5.4, Gemini-3.1-pro) while significantly reducing inference cost, demonstrating that efficient design serves to be more effective than scaling data or model size alone.
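Abstract (reference translation): With the rapid growth of online video platforms and AI-generated content, reliable video guardrails have become a key challenge for safety and real-world deployment. Most videos can be screened through fast pattern recognition, but a small subset requires deeper reasoning over temporally complex content and nuanced policy constraints. Existing approaches typically rely on large vision-language models applied uniformly to all inputs, which leads to high inference costs and inefficient allocation of computation. This paper proposes SafeLens, a video guardrail framework that introduces a fast-and-slow inference architecture to achieve efficient and accurate content moderation with computational cost that varies across inputs. In addition, a high-quality dataset is constructed by applying influence-guided filtering to the SafeWatch Dataset, retaining only 2.4% of the original data. To further address the limitations of training-time scaling, test-time reasoning is enabled by augmenting the filtered data with structured Chain-of-Thought traces. Across real-world and AI-generated video benchmarks, SafeLens achieves state-of-the-art performance, outperforming strong open-source video guardrails (e.g., SafeWatch-8B, OmniGuard-7B) and closed-source models (e.g., GPT-5.4, Gemini-3.1-pro) while significantly reducing inference cost, showing that efficient design can be more effective than scaling data or model size alone.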
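
The fast-and-slow idea in the abstract can be illustrated with a small sketch. The abstract does not say how SafeLens decides which videos need deeper reasoning, so the confidence-threshold routing below is an assumption made only for illustration, not the authors' method. All names (`screen`, `Verdict`, `toy_fast`, `toy_slow`, the `risk` field, the 0.9 threshold) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interface: a screener maps a video to (label, confidence in [0, 1]).
Screener = Callable[[dict], Tuple[str, float]]


@dataclass
class Verdict:
    label: str         # "safe" or "unsafe"
    confidence: float  # confidence reported by the screener that decided
    path: str          # "fast" or "slow": which stage produced the verdict


def screen(video: dict, fast: Screener, slow: Screener,
           threshold: float = 0.9) -> Verdict:
    """Run the cheap screener first; escalate only when it is unsure.

    ASSUMPTION: escalation is triggered by a confidence threshold.
    The source abstract does not specify the routing rule.
    """
    label, conf = fast(video)
    if conf >= threshold:
        return Verdict(label, conf, "fast")
    # Low confidence: pay for the expensive, deliberate pass.
    label, conf = slow(video)
    return Verdict(label, conf, "slow")


# Toy stand-ins for real models, used only to exercise the routing logic.
def toy_fast(video: dict) -> Tuple[str, float]:
    risk = video["risk"]  # hypothetical precomputed risk score in [0, 1]
    label = "unsafe" if risk >= 0.5 else "safe"
    return label, max(risk, 1.0 - risk)  # unsure when risk is near 0.5


def toy_slow(video: dict) -> Tuple[str, float]:
    label = "unsafe" if video["risk"] >= 0.5 else "safe"
    return label, 0.99  # pretend the slow model is always confident


if __name__ == "__main__":
    for risk in (0.05, 0.55, 0.98):
        print(risk, screen({"risk": risk}, toy_fast, toy_slow))
```

In this sketch the clear-cut inputs are resolved by the fast path alone and only the borderline input incurs the slow pass, which is the sense in which computational cost varies across inputs.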

Related papers