Fugu-MT Paper Translation (Abstract): ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation

Paper overview: ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation

arxiv url: http://arxiv.org/abs/2511.14259v1
Date: Tue, 18 Nov 2025 08:50:17 GMT
Status: Translation complete
Last updated in system: 2025-11-19 16:23:53.018593
Title: ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation
Authors: Zitong Xu, Huiyu Duan, Xiaoyu Wang, Zhaolin Cai, Kaiwei Zhang, Qiang Hu, Jing Liu, Xiongkuo Min, Guangtao Zhai
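Abstract summary: The authors propose ManipBench, a large-scale benchmark for image manipulation detection and localization. They also propose ManipShield, an all-in-one model based on a Multimodal Large Language Model (MLLM).
Reference score (independently computed attention measure): 81.52606410224136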
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid advancement of generative models, powerful image editing methods now enable diverse and highly realistic image manipulations that far surpass traditional deepfake techniques, posing new challenges for manipulation detection. Existing image manipulation detection and localization (IMDL) benchmarks suffer from limited content diversity, narrow generative-model coverage, and insufficient interpretability, which hinders the generalization and explanation capabilities of current manipulation detection methods. To address these limitations, we introduce ManipBench, a large-scale benchmark for image manipulation detection and localization focusing on AI-edited images. ManipBench contains over 450K manipulated images produced by 25 state-of-the-art image editing models across 12 manipulation categories, among which 100K images are further annotated with bounding boxes, judgment cues, and textual explanations to support interpretable detection. Building upon ManipBench, we propose ManipShield, an all-in-one model based on a Multimodal Large Language Model (MLLM) that leverages contrastive LoRA fine-tuning and task-specific decoders to achieve unified image manipulation detection, localization, and explanation. Extensive experiments on ManipBench and several public datasets demonstrate that ManipShield achieves state-of-the-art performance and exhibits strong generality to unseen manipulation models. Both ManipBench and ManipShield will be released upon publication.

