Fugu-MT Paper Translation (Abstract): ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection

Paper overview: ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection

arxiv url: http://arxiv.org/abs/2310.20208v4
Date: Sun, 14 Jul 2024 09:02:22 GMT
Status: Translation complete
Last updated in system: 2024-07-17 02:54:11.580029
Title: ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection
Title (reference translation): ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection
Authors: Youwei Pang, Xiaoqi Zhao, Tian-Zhu Xiang, Lihe Zhang, Huchuan Lu
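Abstract summary: Recent camouflaged object detection (COD) attempts to segment objects that blend visually into their surroundings, which is extremely complex and difficult in real-world scenarios. This work proposes an effective unified collaborative pyramid network that mimics human behavior when observing vague images, i.e., zooming in and out. The framework consistently outperforms existing state-of-the-art methods on image and video COD benchmarks.
Reference score (independently computed attention score): 70.11264880907652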
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent camouflaged object detection (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios. Apart from the high intrinsic similarity between camouflaged objects and their background, objects are usually diverse in scale, fuzzy in appearance, and even severely occluded. To this end, we propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images and videos, i.e., zooming in and out. Specifically, our approach employs the zooming strategy to learn discriminative mixed-scale semantics by the multi-head scale integration and rich granularity perception units, which are designed to fully explore imperceptible clues between candidate objects and background surroundings. The former's intrinsic multi-head aggregation provides more diverse visual patterns. The latter's routing mechanism can effectively propagate inter-frame differences in spatiotemporal scenarios and be adaptively deactivated and output all-zero results for static representations. They provide a solid foundation for realizing a unified architecture for static and dynamic COD. Moreover, considering the uncertainty and ambiguity derived from indistinguishable textures, we construct a simple yet effective regularization, uncertainty awareness loss, to encourage predictions with higher confidence in candidate regions. Our highly task-friendly framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks. Our code can be found at https://github.com/lartpang/ZoomNeXt.
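Abstract (reference translation): Recent camouflaged object detection (COD) attempts to segment objects that blend visually into their surroundings, a task that is extremely complex and difficult in real-world scenarios. Apart from the high intrinsic similarity between camouflaged objects and their backgrounds, the objects are usually diverse in scale, fuzzy in appearance, and even severely occluded. To address this, we propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images and videos, i.e., zooming in and out. Specifically, the approach uses a zooming strategy to learn discriminative mixed-scale semantics through multi-head scale integration and rich granularity perception units. The former's intrinsic multi-head aggregation provides more diverse visual patterns. The latter's routing mechanism effectively propagates inter-frame differences in spatiotemporal scenarios, and for static representations it is adaptively deactivated and outputs all-zero results. Together they provide a solid foundation for a unified architecture for static and dynamic COD. Moreover, considering the uncertainty and ambiguity arising from indistinguishable textures, we construct a simple yet effective regularization, the uncertainty awareness loss, to encourage higher-confidence predictions in candidate regions. Our task-friendly framework consistently outperforms existing state-of-the-art methods on image and video COD benchmarks. Our code is available at https://github.com/lartpang/ZoomNeXt.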

Related papers