Fugu-MT Paper Translation (Summary): Training-Free Generalized Few-Shot Segmentation through Open-Vocabulary Semantic Arbitration

Paper summary: Training-Free Generalized Few-Shot Segmentation through Open-Vocabulary Semantic Arbitration

arxiv url: http://arxiv.org/abs/2606.09474v1
Date: Mon, 08 Jun 2026 13:30:22 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:07.091207
Title: Training-Free Generalized Few-Shot Segmentation through Open-Vocabulary Semantic Arbitration
Authors: Silas Kwabla Gah, Ebenezer Owusu
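Abstract summary: Generalized Few-Shot Semantic Segmentation (GFSS) has traditionally been approached as a representation-learning problem. However, recent foundation models already have strong open-vocabulary recognition and segmentation capabilities. Using Open-V, the authors show that GFSS can be solved through inference-time coordination of frozen semantic priors rather than parameter adaptation.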
Reference score (independently computed attention score): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generalized Few-Shot Semantic Segmentation (GFSS) has traditionally been approached as a representation-learning problem, requiring task-specific adaptation to incorporate novel classes from limited support examples. Recent foundation models, however, already exhibit strong open-vocabulary recognition and segmentation capabilities, raising a different question: can GFSS be solved through inference-time coordination of frozen semantic priors rather than parameter adaptation? We answer this question with Open-V, a training-free GFSS framework that combines Segment Anything (SAM3) Promptable Concept Segmentation (PCS) with a K-shot CLIP support centroid through calibrated per-pixel semantic arbitration. Open-V introduces no trainable components and supports arbitrary semantic categories at inference time. Beyond segmentation performance, our study contributes three broader findings. First, we show that support information can be incorporated through inference-time semantic grounding, and that its contribution increases as foundation-model text priors weaken on label-disjoint vocabularies. Second, we identify a reproducibility confound in foundation-model segmentation, demonstrating that preprocessing and evaluation-space mismatches can silently distort reported performance. Finally, we validate Open-V across PASCAL-5i, COCO-20i, and ADE-OW, showing that training-free coordination of foundation-model priors generalizes across both conventional GFSS and open-vocabulary evaluation settings. On PASCAL-5i (1-shot), Open-V attains base/novel/harmonic mIoU of 78.4/77.5/77.9 and, without GFSS-specific training, surpasses the strongest trained baseline by +17.7 HM.
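
The abstract names the ingredients of Open-V but does not state the exact arbitration rule. Those ingredients are SAM3 PCS predictions, a K-shot CLIP support centroid, and calibrated per-pixel arbitration. What follows is a minimal pure-Python sketch under stated assumptions. It is not the authors' implementation.

- **Support centroid.** Each class's support centroid is assumed to be the re-normalized mean of its L2-normalized support embeddings.
- **Per-pixel inputs.** Each pixel is assumed to carry a per-class PCS probability and a dense CLIP-like embedding.
- **Arbitration rule.** Arbitration is a hypothetical blend of two terms. The first is the PCS probability. The second is a temperature-calibrated sigmoid of the cosine similarity to each class centroid.
- **Names.** The function names and the `alpha` and `temperature` parameters are all illustrative.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (returns zeros for a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0 for _ in v]


def support_centroid(support_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical K-shot centroid: mean of unit-normalized shots, re-normalized."""
    normed = [l2_normalize(e) for e in support_embeddings]
    k, dim = len(normed), len(normed[0])
    mean = [sum(e[d] for e in normed) / k for d in range(dim)]
    return l2_normalize(mean)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def arbitrate_pixel(
    pcs_probs: Dict[str, float],       # assumed per-class PCS probability at this pixel
    pixel_embedding: Sequence[float],  # assumed dense CLIP-like feature at this pixel
    centroids: Dict[str, Vector],      # support centroid for every class in pcs_probs
    alpha: float = 0.5,                # hypothetical weight on the PCS probability
    temperature: float = 0.1,          # hypothetical calibration temperature
) -> str:
    """Return the label with the highest blended (arbitrated) score."""
    scores = {}
    for label, prob in pcs_probs.items():
        sim = cosine(pixel_embedding, centroids[label])
        # Map cosine similarity in [-1, 1] to a calibrated score in (0, 1).
        calibrated = 1.0 / (1.0 + math.exp(-sim / temperature))
        scores[label] = alpha * prob + (1.0 - alpha) * calibrated
    return max(scores, key=scores.get)


# Usage: two classes with two support shots each (toy 2-D embeddings).
centroids = {
    "cat": support_centroid([[1.0, 0.1], [0.9, 0.0]]),
    "dog": support_centroid([[0.0, 1.0], [0.1, 0.8]]),
}
# PCS is undecided (0.5 vs 0.5), so the support centroids break the tie.
label = arbitrate_pixel({"cat": 0.5, "dog": 0.5}, [0.95, 0.05], centroids)
print(label)  # → cat
```

In this sketch, `alpha` makes the trade-off between the text-prompted PCS score and the support evidence explicit. When PCS is confident, it dominates; when PCS is undecided, as in the usage example, the support centroid decides the label. Averaging normalized embeddings keeps each shot's contribution equal regardless of feature magnitude.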
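
As a quick check, the reported PASCAL-5i (1-shot) harmonic mIoU can be recomputed from the base and novel values. This assumes HM is the standard harmonic mean 2*b*n/(b + n) commonly used in GFSS evaluation. The helper name is illustrative.

```python
def harmonic_mean(base_miou: float, novel_miou: float) -> float:
    """Harmonic mean of base and novel mIoU (0.0 if both are zero)."""
    if base_miou + novel_miou == 0:
        return 0.0
    return 2 * base_miou * novel_miou / (base_miou + novel_miou)


# Reported PASCAL-5i (1-shot) base/novel mIoU for Open-V.
hm = harmonic_mean(78.4, 77.5)
print(f"{hm:.1f}")  # → 77.9

# An unbalanced model: arithmetic mean 60.0, but HM is much lower.
print(harmonic_mean(90.0, 30.0))  # → 45.0
```

The second call shows why HM is used: it penalizes a gap between base and novel performance. Open-V's base and novel scores are close, so its HM sits near both. By the same arithmetic, the +17.7 HM margin places the strongest trained baseline at roughly 60.2 HM.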

Related papers