Fugu-MT Paper Translation (Summary): CoCoVideo: The High-Quality Commercial-Model-Based Contrastive Benchmark for AI-Generated Video Detection

Paper Summary: CoCoVideo: The High-Quality Commercial-Model-Based Contrastive Benchmark for AI-Generated Video Detection

arxiv url: http://arxiv.org/abs/2606.00101v1
Date: Tue, 26 May 2026 03:18:44 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:27.892137
Title: CoCoVideo: The High-Quality Commercial-Model-Based Contrastive Benchmark for AI-Generated Video Detection
Authors: Huidong Feng, Wentao Chen, Jie Chen, Xinqi Cai, Ruolong Ma, Yinglin Zheng, Yuxin Lin, Ming Zeng,
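Abstract summary: CoCoVideo-26K is a contrastive, commercial-model-based AIGC video dataset covering 13 mainstream commercial generators. Building on this dataset, we propose CoCoDetect, a framework that integrates contrastive learning with MLLM inference.
Reference score (independently computed attention score): 12.128022166557754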
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the rapid advancement of artificial intelligence generated content (AIGC) technologies, video forgery has become increasingly prevalent, posing new challenges to public discourse and societal security. Despite remarkable progress in existing deepfake detection methods, AIGC forgery detection remains challenging, as existing datasets mainly rely on open-source video generation models with quality far below that of commercial AIGC systems. Even datasets containing a few commercial samples often retain visible watermarks, compromising authenticity and hindering model generalization to high-fidelity AIGC videos. To address these issues, we introduce CoCoVideo-26K, a contrastive, commercial-model-based AIGC video dataset covering 13 mainstream commercial generators and providing semantically aligned real-fake video pairs. This dataset enables deeper exploration of the differences between authentic and high-quality synthetic videos and establishes a new benchmark for highly realistic video forgery detection. Building on this dataset, we propose CoCoDetect, a detection framework integrating contrastive learning with confidence-gated multimodal large language model (MLLM) inference. An R3D-18 backbone extracts spatio-temporal representations, while a confidence gate routes uncertain cases to an MLLM for reasoning about physical plausibility and scene consistency. Extensive experiments on CoCoVideo-26K and public benchmarks demonstrate state-of-the-art performance, validating the framework's robustness and generalizability. Our code and dataset are available at https://github.com/DonoToT/CoCoVideo.
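The abstract describes the confidence gate only at a high level: the backbone scores each video, and uncertain cases are handed to an MLLM. The following is a minimal Python sketch of that routing step. It assumes the R3D-18 detector already yields a fake probability per video and that "confidence" means max(p, 1 - p). The threshold `tau`, the names `gated_predict` and `route_batch`, and the `mllm_judge` callable are hypothetical stand-ins, not the CoCoDetect implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class GatedDecision:
    label: str            # "fake" or "real"
    p_fake: float         # detector's fake probability for this video
    routed_to_mllm: bool  # True if the gate deferred to the MLLM


def gated_predict(
    p_fake: float,
    video_id: str,
    mllm_judge: Callable[[str], str],
    tau: float = 0.8,  # hypothetical threshold; not given in the abstract
) -> GatedDecision:
    """Route one video using an assumed max-probability confidence rule."""
    if not 0.0 <= p_fake <= 1.0:
        raise ValueError("p_fake must be a probability in [0, 1]")
    confidence = max(p_fake, 1.0 - p_fake)
    if confidence >= tau:
        # Confident case: trust the backbone's decision directly.
        label = "fake" if p_fake >= 0.5 else "real"
        return GatedDecision(label, p_fake, routed_to_mllm=False)
    # Uncertain case: defer to the (stubbed) MLLM reasoning step.
    return GatedDecision(mllm_judge(video_id), p_fake, routed_to_mllm=True)


def route_batch(
    scores: Sequence[Tuple[str, float]],
    mllm_judge: Callable[[str], str],
    tau: float = 0.8,
) -> List[GatedDecision]:
    """Apply the gate to (video_id, p_fake) pairs."""
    return [gated_predict(p, vid, mllm_judge, tau) for vid, p in scores]


def stub_mllm(video_id: str) -> str:
    # Placeholder: a real system would prompt an MLLM about physical
    # plausibility and scene consistency; here it always answers "fake".
    return "fake"


if __name__ == "__main__":
    decisions = route_batch([("a", 0.95), ("b", 0.10), ("c", 0.55)], stub_mllm)
    print([d.label for d in decisions])           # ['fake', 'real', 'fake']
    print([d.routed_to_mllm for d in decisions])  # [False, False, True]
```

Raising `tau` sends more videos to the slower MLLM step, while lowering it trusts the backbone more often; the actual gating rule and threshold used by CoCoDetect are not stated in the abstract.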


Related Papers