Fugu-MT Paper Translation (Summary): SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation

Paper overview: SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation

arxiv url: http://arxiv.org/abs/2605.17630v1
Date: Sun, 17 May 2026 19:51:32 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:48.237489
Title: SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation
Title (reference translation): SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation
Authors: Abderrahmene Boudiaf, Irfan Hussain, Sajid Javed
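Abstract summary: SegRAG is a training-free retrieval-augmented segmentation framework. It grounds SAM3 with class-specific point prompts derived from a DINOv3 feature bank. On four open-vocabulary benchmarks, it achieves consistent gains over the SAM3 text-only baseline.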
Reference score (independently computed attention level): 13.665861251747144
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Open-vocabulary segmentation models such as SAM3 achieve strong performance through concept-level text prompting, yet degrade when the target class is visually underrepresented in pretraining data or when its appearance departs from canonical depictions. Text prompts provide no spatial signal to resolve such ambiguity. We present SegRAG, a training-free retrieval-augmented segmentation framework that grounds SAM3 with spatially precise, class-specific point prompts derived from a curated DINOv3 feature bank. During an offline stage, patch-level descriptors are extracted from annotated reference images using a frozen DINOv3 ViT-L/16 backbone and filtered by Intra-Class Cohesion Distillation (ICCD), retaining only prototypes that reliably retrieve within-class foreground. At inference, Topographic Similarity Grounding (TSG) computes a cosine-similarity landscape between the query image and retrieved prototypes, identifies spatially coherent high-confidence regions via connected-component analysis, and extracts peak locations through non-maximum suppression. These point prompts are delivered to SAM3 alongside the class-name text in a single joint grounding pass, enabling the mask decoder to resolve semantic intent and spatial evidence together. SegRAG requires no task-specific training and no synthetic data. On four open-vocabulary benchmarks it achieves consistent gains over the SAM3 text-only baseline, with improvements of up to +3.92 mIoU on LVIS. On AgML agricultural benchmarks representing a zero-shot domain transfer setting, it raises mean IoU from 25.27 to 59.24 (+33.97) and recovers individual classes from zero to over 95 mIoU. Ablation studies confirm that ICCD, TSG, and joint prompting each contribute independently and compound when combined. Code is available at https://github.com/boudiafA/SegRAG.
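As a rough illustration of the Topographic Similarity Grounding (TSG) step described above, the sketch below turns a cosine-similarity landscape into point prompts. It is not the authors' implementation (the linked repository has that): it assumes patch descriptors are already arranged in an (H, W, D) grid, scores each patch by its maximum cosine similarity to any retrieved prototype, groups above-threshold patches into 4-connected components with a size filter, and picks peaks with a simple greedy radius-based NMS. The function names and the parameters `tau`, `min_size`, `radius`, and `max_points` are hypothetical choices, not values from the paper, and the hand-off of the points to SAM3 is not shown.

```python
from collections import deque

import numpy as np

# Hypothetical TSG-style sketch; thresholds and names are illustrative, not from the paper.


def cosine_landscape(query_feats: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Return an (H, W) map of each query patch's best cosine similarity to any prototype.

    query_feats: (H, W, D) patch descriptors of the query image.
    prototypes:  (K, D) retrieved class prototypes.
    """
    q = query_feats / (np.linalg.norm(query_feats, axis=-1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1, keepdims=True) + 1e-8)
    return (q @ p.T).max(axis=-1)  # (H, W, K) -> (H, W)


def connected_components(mask: np.ndarray) -> list[list[tuple[int, int]]]:
    """Group True cells of a 2-D boolean mask into 4-connected components (BFS)."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    components = []
    for i in range(h):
        for j in range(w):
            if not mask[i, j] or seen[i, j]:
                continue
            seen[i, j] = True
            queue, comp = deque([(i, j)]), []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            components.append(comp)
    return components


def extract_point_prompts(sim, tau=0.5, min_size=2, radius=2, max_points=3):
    """Threshold the landscape, drop small regions, then pick peaks with greedy NMS.

    Returns (row, col) patch-grid coordinates of the selected peaks.
    """
    points = []
    for comp in connected_components(sim >= tau):
        if len(comp) < min_size:
            continue  # too small to count as a spatially coherent region
        kept = []
        # Visit cells from most to least similar; suppress cells near an already kept peak.
        for y, x in sorted(comp, key=lambda yx: sim[yx], reverse=True):
            if all(max(abs(y - ky), abs(x - kx)) > radius for ky, kx in kept):
                kept.append((y, x))
                if len(kept) == max_points:
                    break
        points.extend(kept)
    return points


def to_pixel_xy(points, patch=16):
    """Map (row, col) patch indices to (x, y) pixel centres, assuming 16-px patches."""
    return [((x + 0.5) * patch, (y + 0.5) * patch) for y, x in points]


# Toy 6x6 grid: one 2x2 object (peak at row 1, col 1) and one isolated patch at (5, 5).
proto = np.array([[1.0, 0.0, 0.0, 0.0]])
feats = np.tile(np.array([0.0, 1.0, 0.0, 0.0]), (6, 6, 1))
feats[1:3, 1:3] = [1.0, 0.5, 0.0, 0.0]
feats[1, 1] = [1.0, 0.0, 0.0, 0.0]
feats[5, 5] = [1.0, 0.0, 0.0, 0.0]
sim = cosine_landscape(feats, proto)
points = extract_point_prompts(sim)
print(points, to_pixel_xy(points))  # [(1, 1)] [(24.0, 24.0)]
```

In the toy run the isolated patch is discarded by the size filter even though its similarity is high, which is the kind of spurious match a connected-component check is meant to reject. The per-region `max_points` budget lets a large object receive several well-separated prompts; a different suppression rule would be an equally reasonable choice under these assumptions.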
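Abstract (reference translation): Open-vocabulary segmentation models such as SAM3 achieve strong performance through concept-level text prompting, but degrade when the target class is visually underrepresented in the pretraining data or when its appearance departs from canonical depictions. Text prompts provide no spatial signal to resolve such ambiguity. We propose SegRAG, a training-free retrieval-augmented segmentation framework that grounds SAM3 with spatially precise, class-specific point prompts extracted from a DINOv3 feature bank. In the offline stage, patch-level descriptors are extracted from annotated reference images using a frozen DINOv3 ViT-L/16 backbone and filtered by Intra-Class Cohesion Distillation (ICCD), which retains only prototypes that reliably retrieve within-class foreground. At inference, Topographic Similarity Grounding (TSG) computes a cosine-similarity landscape between the query image and the retrieved prototypes, identifies spatially coherent, high-confidence regions through connected-component analysis, and extracts peak locations through non-maximum suppression. These point prompts are delivered to SAM3 together with the class-name text in a single joint grounding pass, so that the mask decoder resolves semantic intent and spatial evidence together. SegRAG requires no task-specific training and no synthetic data. On four open-vocabulary benchmarks it achieves consistent gains over the SAM3 text-only baseline, with improvements of up to +3.92 mIoU on LVIS. On the AgML agricultural benchmarks, which represent a zero-shot domain-transfer setting, mean IoU rises from 25.27 to 59.24 (+33.97), and individual classes recover from 0 to over 95 mIoU. Ablation studies show that ICCD, TSG, and joint prompting each contribute independently and compound when combined. Code is available at https://github.com/boudiafA/SegRAG.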
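The abstract only says that Intra-Class Cohesion Distillation (ICCD) keeps prototypes that "reliably retrieve within-class foreground". One plausible reading, sketched below purely as an assumption, is a retrieval-precision test: each candidate descriptor retrieves its k nearest neighbours (by cosine similarity) from a pool of labelled reference patches and survives only if enough of them are foreground of the same class. The names `filter_prototypes` and `l2_normalize` and the parameters `k` and `min_precision` are hypothetical; the paper and repository define the actual criterion.

```python
import numpy as np

# Hypothetical sketch of an ICCD-style filter; the retrieval-precision criterion,
# k, and min_precision are assumptions, not the paper's definition.


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)


def filter_prototypes(candidates, pool_feats, pool_labels, target_class, k=5, min_precision=0.8):
    """Keep candidates whose k nearest pool patches are mostly target-class foreground.

    candidates:  (N, D) descriptors proposed as prototypes for target_class.
    pool_feats:  (M, D) labelled patch descriptors from reference images.
    pool_labels: (M,) class id per pool patch (0 = background).
    """
    sims = l2_normalize(candidates) @ l2_normalize(pool_feats).T  # (N, M) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k most similar pool patches
    precision = (pool_labels[topk] == target_class).mean(axis=1)  # within-class hit rate
    return candidates[precision >= min_precision]


# Toy 2-D example: class-1 patches near (1, 0), background patches near (0, 1).
pool_feats = np.array([[1.0, 0.1], [1.0, -0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 1.0], [-0.1, 1.0]])
pool_labels = np.array([1, 1, 1, 0, 0, 0])
candidates = np.array([[1.0, 0.0], [0.0, 1.0]])  # the second one actually retrieves background
kept = filter_prototypes(candidates, pool_feats, pool_labels, target_class=1, k=3)
print(kept)  # only the first candidate survives
```

If a candidate's own image is part of the pool, its own patches will tend to rank first and inflate the score, so building the pool leave-one-image-out would be a sensible (again, assumed) refinement.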


Related papers