Fugu-MT Paper Translation (Abstract): Through the Looking Glass: A Dual Perspective on Weakly-Supervised Few-Shot Segmentation

Paper overview: Through the Looking Glass: A Dual Perspective on Weakly-Supervised Few-Shot Segmentation

arxiv url: http://arxiv.org/abs/2508.16159v1
Date: Fri, 22 Aug 2025 07:29:30 GMT
Status: Translation complete
Last updated in system: 2025-08-25 16:42:36.292901
Title: Through the Looking Glass: A Dual Perspective on Weakly-Supervised Few-Shot Segmentation
Title (reference translation): Through the Looking Glass: Two Perspectives on Weakly Supervised Few-Shot Segmentation
Authors: Jiaqi Ma, Guo-Sen Xie, Fang Zhao, Zechao Li
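Abstract summary: Meta-learning aims to uniformly sample homogeneous support-query pairs characterized by the same categories and similar attributes. Processing these pairs with identical network architectures results in over-semantic homogenization. This paper proposes a novel heterogeneous network that enhances complementarity while preserving semantic commonality. On the weakly-supervised few-shot semantic segmentation (WFSS) task, TLG achieves a 13.2% improvement on Pascal-5^i and a 9.7% improvement on COCO-20^i.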
Reference score (independently computed attention score): 46.635612270422655
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Meta-learning aims to uniformly sample homogeneous support-query pairs, characterized by the same categories and similar attributes, and extract useful inductive biases through identical network architectures. However, this identical network design results in over-semantic homogenization. To address this, we propose a novel homologous but heterogeneous network. By treating support-query pairs as dual perspectives, we introduce heterogeneous visual aggregation (HA) modules to enhance complementarity while preserving semantic commonality. To further reduce semantic noise and amplify the uniqueness of heterogeneous semantics, we design a heterogeneous transfer (HT) module. Finally, we propose heterogeneous CLIP (HC) textual information to enhance the generalization capability of multimodal models. In the weakly-supervised few-shot semantic segmentation (WFSS) task, with only 1/24 of the parameters of existing state-of-the-art models, TLG achieves a 13.2% improvement on Pascal-5^i and a 9.7% improvement on COCO-20^i. To the best of our knowledge, TLG is also the first weakly supervised (image-level) model that outperforms fully supervised (pixel-level) models under the same backbone architectures. The code is available at https://github.com/jarch-ma/TLG.
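Abstract (reference translation): The goal of meta-learning is to uniformly sample homogeneous support-query pairs, characterized by the same categories and similar attributes, and to extract useful inductive biases through identical network architectures. However, this identical network design leads to excessive semantic homogenization. To address this, the paper proposes a novel homologous but heterogeneous network. By treating support-query pairs as dual perspectives, heterogeneous visual aggregation (HA) modules are introduced to enhance complementarity while preserving semantic commonality. To further reduce semantic noise and amplify the uniqueness of heterogeneous semantics, a heterogeneous transfer (HT) module is designed. Finally, heterogeneous CLIP (HC) textual information is proposed to enhance the generalization capability of multimodal models. On the weakly-supervised few-shot semantic segmentation (WFSS) task, with only 1/24 of the parameters of existing state-of-the-art models, TLG achieves improvements of 13.2% on Pascal-5^i and 9.7% on COCO-20^i. To the authors' knowledge, TLG is also the first weakly supervised (image-level) model to outperform fully supervised (pixel-level) models under the same backbone architectures. The code is available at https://github.com/jarch-ma/TLG.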
