Fugu-MT paper translation (abstract): SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation

Paper overview: SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation

arxiv url: http://arxiv.org/abs/2605.17630v2
Date: Tue, 19 May 2026 18:36:14 GMT
Status: Translation complete
System update date: 2026-05-21 14:55:44.305481
Title: SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation
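Title (reference translation): SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation
Authors: Abderrahmene Boudiaf, Irfan Hussain, Sajid Javed
Abstract summary: SegRAG is a training-free retrieval-augmented segmentation framework. It grounds SAM3 with class-specific point prompts derived from a DINOv3 feature bank. On four standard benchmarks, SegRAG consistently outperforms the text-only baseline.
Reference score (independently computed attention score): 13.665861251747144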
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Open-vocabulary segmentation models such as SAM3 perform well across broad categories via text prompting, yet degrade when target classes are visually underrepresented in pretraining or depart from canonical depictions, limitations that text prompts cannot resolve spatially. We present SegRAG, a training-free retrieval-augmented segmentation framework that grounds SAM3 with class-specific point prompts derived from a curated DINOv3 feature bank. Offline, dense patch-level descriptors are extracted from annotated references and filtered by Intra-Class Cohesion Distillation (ICCD), retaining only prototypes that reliably retrieve within-class foreground. At inference, Topographic Similarity Grounding (TSG) computes a cosine-similarity landscape against retrieved prototypes, identifies coherent high-confidence regions via connected-component analysis, and extracts peak locations through non-maximum suppression. The resulting point prompts are delivered jointly with class-name text in a single SAM3 forward pass. On four standard benchmarks, SegRAG consistently outperforms the text-only baseline, gaining up to +3.92 mIoU on LVIS. On AgML agricultural benchmarks under zero-shot domain transfer, it raises mean IoU from 25.27 to 59.24 (+33.97) and recovers individual classes from zero to over 95 mIoU. Ablations confirm that ICCD, TSG, and joint prompting each contribute independently and compound when combined. Code is available at https://github.com/boudiafA/SegRAG.
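Abstract (reference translation): Open-vocabulary segmentation models such as SAM3 work well across a wide range of categories through text prompting, but their performance drops when target classes are visually underrepresented in pretraining or deviate from canonical depictions, which are limitations that text prompts cannot resolve spatially. This paper proposes SegRAG, a training-free retrieval-augmented segmentation framework that grounds SAM3 with class-specific point prompts derived from a curated DINOv3 feature bank. Offline, dense patch-level descriptors are extracted from annotated references and filtered by Intra-Class Cohesion Distillation (ICCD), keeping only prototypes that reliably retrieve within-class foreground. At inference, Topographic Similarity Grounding (TSG) computes a cosine-similarity landscape against the retrieved prototypes, identifies coherent high-confidence regions via connected-component analysis, and extracts peak locations through non-maximum suppression. The resulting point prompts are delivered together with the class-name text in a single SAM3 forward pass. On four standard benchmarks, SegRAG consistently outperforms the text-only baseline, gaining up to +3.92 mIoU on LVIS. On AgML agricultural benchmarks under zero-shot domain transfer, it raises mean IoU from 25.27 to 59.24 (+33.97) and recovers individual classes from zero to over 95 mIoU. Ablations confirm that ICCD, TSG, and joint prompting each contribute independently and compound when combined. Code is available at https://github.com/boudiafA/SegRAG.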
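
The TSG step in the abstract (cosine-similarity landscape, connected-component analysis, peak extraction by non-maximum suppression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the max-over-prototypes aggregation, the threshold, minimum area, NMS radius, and the 16-pixel patch size are all assumptions made for illustration, and the toy features stand in for real DINOv3 descriptors.

```python
from collections import deque
import numpy as np

def similarity_landscape(patch_feats, prototypes):
    """Cosine similarity of each patch (H, W, D) to its best-matching prototype (K, D).
    Taking the max over prototypes is an assumption, not stated in the abstract."""
    f = patch_feats / (np.linalg.norm(patch_feats, axis=-1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1, keepdims=True) + 1e-8)
    return (f @ p.T).max(axis=-1)

def connected_components(mask):
    """4-connected components of a boolean mask, found by breadth-first search."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    comps = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            queue, comp = deque([(r, c)]), []
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            comps.append(comp)
    return comps

def point_prompts(sim, threshold=0.6, min_area=2, nms_radius=2, max_points=3):
    """Pick peak patches (row, col) inside each sufficiently large high-similarity region.
    All four hyperparameters are illustrative values, not taken from the paper."""
    points = []
    for comp in connected_components(sim >= threshold):
        if len(comp) < min_area:  # drop isolated, likely spurious responses
            continue
        kept = []
        # Greedy NMS: visit patches by descending similarity, skip those near a kept peak.
        for y, x in sorted(comp, key=lambda yx: -sim[yx]):
            if all(max(abs(y - ky), abs(x - kx)) > nms_radius for ky, kx in kept):
                kept.append((y, x))
            if len(kept) == max_points:
                break
        points.extend(kept)
    return points

def to_pixel_xy(points, patch_size=16):
    """Map patch indices (row, col) to pixel-space (x, y) patch centres (assumed 16 px patches)."""
    return [((c + 0.5) * patch_size, (r + 0.5) * patch_size) for r, c in points]

# Toy 6x6 grid of 2-D features: two foreground blobs plus one isolated outlier.
feats = np.tile(np.array([0.0, 1.0]), (6, 6, 1))  # background, orthogonal to the prototype
feats[0:2, 0:2] = [1.0, 0.3]
feats[1, 1] = [1.0, 0.0]   # peak of blob A
feats[4:6, 3:6] = [1.0, 0.3]
feats[4, 4] = [1.0, 0.0]   # peak of blob B
feats[0, 5] = [1.0, 0.0]   # single-patch outlier, removed by min_area
prototypes = np.array([[1.0, 0.0]])

sim = similarity_landscape(feats, prototypes)
points = point_prompts(sim)
print(points, to_pixel_xy(points))
```

In a full pipeline the pixel coordinates would be passed to the segmenter as positive point prompts alongside the class-name text; that call is omitted here because its interface is not described in the abstract.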
