Fugu-MT Paper Translation (Abstract): Impostor: An Agent-Curated Benchmark for Realistic AIGC Manipulation Localization

Paper overview: Impostor: An Agent-Curated Benchmark for Realistic AIGC Manipulation Localization

arxiv url: http://arxiv.org/abs/2606.04545v1
Date: Wed, 03 Jun 2026 07:27:45 GMT
Status: Translation complete
Last updated in system: 2026-06-04 20:44:18.604776
Title: Impostor: An Agent-Curated Benchmark for Realistic AIGC Manipulation Localization
Title (reference translation): Impostor: An Agent-Curated Benchmark for AIGC Manipulation Localization
Authors: Zhenliang Li, Yutao Hu, Qixiong Wang, Wenpeng Du, Hongxiang Jiang, Jiasong Wu, Xiaolong Jiang, Jungong Han
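Abstract summary: This work introduces Impostor, a high-quality AI-edited image manipulation localization dataset. Impostor is constructed by CraftAgent, a closed-loop agent framework that integrates scene perception, editing planning, manipulation execution, quality validation, and iterative reflection. The work also proposes PhaseAware-Net (PANet), a semantic-forensic framework that introduces local phase modeling and semantic-forensic consistency learning to better localize semantically plausible yet forensically disrupted regions.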
Reference score (independently computed attention score): 45.52738316793005
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in generative image editing have improved the realism and controllability of localized image manipulation, raising new challenges for image manipulation detection and localization (IMDL). However, existing IMDL benchmarks still have limitations in visual realism, manipulation diversity, and generator coverage, making it difficult to reflect recent trends in image manipulation. To address these limitations, we introduce Impostor, a high-quality AI-edited image manipulation localization dataset containing 100K manipulated images. Impostor is constructed by CraftAgent, a closed-loop agent framework that integrates scene perception, editing planning, manipulation execution, quality validation, and iterative reflection to automatically generate diverse and visually realistic manipulated images. Moreover, Impostor contains images generated by seven recent AIGC models across three manipulation types and includes multiple manipulated regions, providing a more comprehensive benchmark for AIGC-based IMDL. Furthermore, we propose PhaseAware-Net (PANet), a semantic-forensic framework that introduces local phase modeling and semantic-forensic consistency learning to better localize semantically plausible yet forensically disrupted manipulated regions. Extensive experiments show that Impostor poses significant challenges to existing large vision-language models (LVLMs) and specialized IMDL methods, while PANet achieves superior performance on Impostor and multiple public benchmarks.
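Abstract (reference translation): In recent years, advances in image editing have improved the realism and controllability of localized image manipulation, raising new challenges for image manipulation detection and localization (IMDL). However, existing IMDL benchmarks are limited in visual realism, manipulation diversity, and generator coverage, which makes it difficult for them to reflect recent trends in image manipulation. To address these limitations, the paper introduces Impostor, a high-quality AI-edited image manipulation localization dataset containing 100K manipulated images. Impostor is constructed by CraftAgent, a closed-loop agent framework that integrates scene perception, editing planning, manipulation execution, quality validation, and iterative reflection, and that automatically generates diverse, visually realistic manipulated images. In addition, Impostor contains images generated by seven recent AIGC models across three manipulation types and includes multiple manipulated regions, providing a more comprehensive benchmark for AIGC-based IMDL. The paper further proposes PhaseAware-Net (PANet), a semantic-forensic framework that introduces local phase modeling and semantic-forensic consistency learning to better localize manipulated regions that are semantically plausible yet forensically disrupted. Extensive experiments show that Impostor poses significant challenges to existing large vision-language models (LVLMs) and specialized IMDL methods, while PANet achieves superior performance on Impostor and on multiple public benchmarks.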

Related papers