Fugu-MT Paper Translation (Abstract): ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection

Paper overview: ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection

arxiv url: http://arxiv.org/abs/2310.20208v2
Date: Wed, 29 Nov 2023 08:33:30 GMT
Status: Translation complete
System update date: 2023-12-01 03:15:19.035352
Title: ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection
Title (reference translation): ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection
Authors: Youwei Pang, Xiaoqi Zhao, Tian-Zhu Xiang, Lihe Zhang, Huchuan Lu
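Abstract summary: This paper proposes an effective unified collaborative pyramid network that mimics human behavior when observing vague images and videos. Specifically, it employs a zooming strategy to learn discriminative mixed-scale semantics. The task-friendly framework consistently outperforms existing state-of-the-art methods on image and video COD benchmarks.
Reference score (independently computed attention score): 75.22007160699948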
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent camouflaged object detection (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios. Apart from the high intrinsic similarity between camouflaged objects and their background, objects are usually diverse in scale, fuzzy in appearance, and even severely occluded. To this end, we propose an effective unified collaborative pyramid network which mimics human behavior when observing vague images and videos, i.e., zooming in and out. Specifically, our approach employs the zooming strategy to learn discriminative mixed-scale semantics by the multi-head scale integration and rich granularity perception units, which are designed to fully explore imperceptible clues between candidate objects and background surroundings. The former's intrinsic multi-head aggregation provides more diverse visual patterns. The latter's routing mechanism can effectively propagate inter-frame difference in spatiotemporal scenarios and adaptively ignore static representations. They provide a solid foundation for realizing a unified architecture for static and dynamic COD. Moreover, considering the uncertainty and ambiguity derived from indistinguishable textures, we construct a simple yet effective regularization, uncertainty awareness loss, to encourage predictions with higher confidence in candidate regions. Our highly task-friendly framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks. The code will be available at https://github.com/lartpang/ZoomNeXt.
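Abstract (reference translation): Recent camouflaged object detection (COD) attempts to segment objects that blend visually into their surroundings, a task that is extremely complex and difficult in real-world scenarios. Apart from the intrinsic similarity between camouflaged objects and their background, the objects are usually diverse in scale, fuzzy in appearance, and even occluded. This work therefore proposes an effective collaborative pyramid network that mimics human behavior when observing vague images and videos, namely zooming in and out. Specifically, it uses a zooming strategy to learn discriminative mixed-scale semantics through multi-head scale integration and rich granularity perception units, which are designed to fully explore imperceptible clues between candidate objects and the background. The former's intrinsic multi-head aggregation provides more diverse visual patterns. The latter's routing mechanism can effectively propagate inter-frame differences in spatiotemporal scenarios and adaptively ignore static representations. Together they provide a solid foundation for a unified architecture for static and dynamic COD. In addition, considering the uncertainty and ambiguity that arise from indistinguishable textures, a simple yet effective regularization, the uncertainty awareness loss, is constructed to encourage higher-confidence predictions in candidate regions. The task-friendly framework consistently outperforms existing state-of-the-art methods on image and video COD benchmarks. The code is available at https://github.com/lartpang/ZoomNeXt.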
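
The abstract names an uncertainty awareness loss but does not give its formula. The following is only a rough sketch of the general idea, assuming a regularizer that is largest when a pixel's predicted foreground probability is 0.5 (most ambiguous) and falls to zero at 0 or 1 (most confident). The quadratic form and the names sigmoid, ambiguity_penalty, and mean_ambiguity_penalty are illustrative assumptions, not the paper's definition or code.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def ambiguity_penalty(prob: float) -> float:
    """Hypothetical per-pixel penalty: 1 at prob = 0.5, 0 at prob = 0 or 1."""
    return 1.0 - abs(2.0 * prob - 1.0) ** 2


def mean_ambiguity_penalty(logits: Sequence[Sequence[float]]) -> float:
    """Average the assumed penalty over a 2-D map of logits."""
    values = [ambiguity_penalty(sigmoid(z)) for row in logits for z in row]
    return sum(values) / len(values)


# A map of near-zero logits (ambiguous) is penalized more than a map of
# large-magnitude logits (confident foreground or background).
ambiguous_map = [[0.0, 0.1], [-0.1, 0.0]]
confident_map = [[6.0, -6.0], [5.0, -5.0]]
ambiguous_score = mean_ambiguity_penalty(ambiguous_map)
confident_score = mean_ambiguity_penalty(confident_map)
```

Adding a term of this kind to a segmentation loss would push predictions away from 0.5, which matches the stated goal of encouraging higher-confidence predictions in candidate regions.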

Related papers