Fugu-MT Paper Translation (Summary): iSeg: An Iterative Refinement-based Framework for Training-free Segmentation

Paper summary: iSeg: An Iterative Refinement-based Framework for Training-free Segmentation

arxiv url: http://arxiv.org/abs/2409.03209v2
Date: Fri, 6 Sep 2024 14:15:29 GMT
Status: Translation complete
Last updated in system: 2024-09-09 13:05:05.334760
Title: iSeg: An Iterative Refinement-based Framework for Training-free Segmentation
Authors: Lin Sun, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang,
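Abstract summary: Stable Diffusion has demonstrated strong image synthesis ability given text descriptions, suggesting that it contains strong semantic clues for grouping objects. Most existing approaches either simply use the cross-attention map or refine it with the self-attention map to generate segmentation masks. We propose iSeg, an iterative refinement framework for training-free segmentation with an entropy-reduced self-attention module.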
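To make "refining the cross-attention map with the self-attention map" concrete, here is a minimal NumPy sketch, assuming a flattened cross-attention map for a single text token and a row-stochastic self-attention map over the same image tokens. The function names and the 4-token toy matrix are hypothetical illustrations, not taken from the paper. Each refinement step replaces a token's score with the attention-weighted average of all tokens' scores (a ← S·a). Repeating the step lets the weak cross-group attention gradually wash out the object/background contrast, which is the failure mode the abstract attributes to irrelevant global information.

```python
import numpy as np


def refine_cross_attention(cross: np.ndarray, self_attn: np.ndarray, iterations: int) -> np.ndarray:
    """Iteratively propagate a cross-attention map through a self-attention map.

    cross:     shape (N,), response of N image tokens to one text token.
    self_attn: shape (N, N), row-stochastic; row i is how token i attends to all tokens.
    Each step sets token i's score to the attention-weighted average of all
    tokens' scores, i.e. a <- S @ a.
    """
    a = cross.astype(float)
    for _ in range(iterations):
        a = self_attn @ a
    return a


def contrast(a: np.ndarray) -> float:
    """Gap between the strongest and the weakest token response."""
    return float(a.max() - a.min())


# Hypothetical toy input: tokens 0-1 are the object, tokens 2-3 are background.
# Each row puts 0.4 on each token of its own group and 0.1 on each token of the
# other group; the 0.1 entries play the role of weak, irrelevant global responses.
S = np.array([
    [0.4, 0.4, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
    [0.1, 0.1, 0.4, 0.4],
])
cross = np.array([1.0, 0.6, 0.2, 0.0])

for k in (1, 3, 10):
    print(k, round(contrast(refine_cross_attention(cross, S, k)), 4))
```

On this toy input the printed contrast shrinks from 0.42 after one step to 0.1512 after three and about 0.0042 after ten, while the mean response stays at 0.45: every token drifts toward the same value, so more iterations make the mask worse rather than better.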
Reference score (attention score computed in-house): 85.58324416386375
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Stable Diffusion has demonstrated strong image synthesis ability given text descriptions, suggesting that it contains strong semantic clues for grouping objects. Inspired by this, researchers have explored employing Stable Diffusion for training-free segmentation. Most existing approaches either simply employ the cross-attention map or refine it with the self-attention map to generate segmentation masks. We believe that iterative refinement with the self-attention map would lead to better results. However, we empirically demonstrate that such a refinement is sub-optimal, likely because the self-attention map contains irrelevant global information, which hampers accurate refinement of the cross-attention map over multiple iterations. To address this, we propose an iterative refinement framework for training-free segmentation, named iSeg, with an entropy-reduced self-attention module that uses a gradient descent scheme to reduce the entropy of the self-attention map, thereby suppressing the weak responses corresponding to irrelevant global information. Leveraging the entropy-reduced self-attention module, iSeg stably improves the refined cross-attention map over successive iterations. Further, we design a category-enhanced cross-attention module to generate an accurate cross-attention map, providing a better initial input for iterative refinement. Extensive experiments across different datasets and diverse segmentation tasks demonstrate the merits of the proposed contributions, with promising performance on all of these tasks. For unsupervised semantic segmentation on Cityscapes, iSeg achieves an absolute gain of 3.8% in mIoU over the best existing training-free approach in the literature. Moreover, iSeg supports segmentation with different kinds of images and interactions.
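The abstract says iSeg lowers the entropy of the self-attention map with a gradient descent scheme but does not give the formulation. The sketch below is one plausible reading under stated assumptions, not the authors' implementation: it treats the log of each self-attention row as logits z, takes gradient-descent steps on the Shannon entropy H of p = softmax(z), using dH/dz_k = -p_k (log p_k + H), and then runs the plain a ← S·a refinement. The helpers `reduce_entropy` and `refine`, the step count, the learning rate and the toy matrix are all hypothetical choices for illustration.

```python
import numpy as np


def softmax_rows(z: np.ndarray) -> np.ndarray:
    """Numerically stable row-wise softmax."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def row_entropy(p: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of each row of a row-stochastic matrix."""
    return -(p * np.log(np.clip(p, 1e-12, None))).sum(axis=1)


def reduce_entropy(self_attn: np.ndarray, steps: int = 20, lr: float = 1.0) -> np.ndarray:
    """Hypothetical entropy reduction: gradient descent on each row's entropy.

    Assumption: the optimised variable is the row logits z = log(S). For
    p = softmax(z), dH/dz_k = -p_k * (log p_k + H), so a descent step raises
    entries with p_k > exp(-H) and lowers the weak entries below that level.
    """
    z = np.log(np.clip(self_attn, 1e-12, None))
    for _ in range(steps):
        p = softmax_rows(z)
        h = row_entropy(p)[:, None]
        grad = -p * (np.log(np.clip(p, 1e-12, None)) + h)
        z = z - lr * grad
    return softmax_rows(z)


def refine(cross: np.ndarray, self_attn: np.ndarray, iterations: int) -> np.ndarray:
    """Plain iterative refinement a <- S @ a."""
    a = cross.astype(float)
    for _ in range(iterations):
        a = self_attn @ a
    return a


# Hypothetical toy input: tokens 0-1 are the object, tokens 2-3 are background;
# the 0.1 entries stand in for weak, irrelevant global responses.
S = np.array([
    [0.4, 0.4, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
    [0.1, 0.1, 0.4, 0.4],
])
cross = np.array([1.0, 0.6, 0.2, 0.0])
S_sharp = reduce_entropy(S)

for name, attn in (("original", S), ("entropy-reduced", S_sharp)):
    a = refine(cross, attn, 10)
    print(name, round(float(row_entropy(attn).mean()), 3), round(float(a.max() - a.min()), 4))
```

On this toy input, entropy reduction pushes the weak cross-group entries toward zero while the two in-group entries of each row stay equal, so after 10 refinement steps the entropy-reduced map keeps a much larger object/background gap than the original one. A fixed step budget matters in this sketch: unconstrained entropy minimisation eventually concentrates each row on its largest entries, discarding useful in-group context as well.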


Related papers