Fugu-MT Paper Translation (Summary): Learning a Semantic Calibration Network for Open-Vocabulary Semantic Segmentation

Paper summary: Learning a Semantic Calibration Network for Open-Vocabulary Semantic Segmentation

arxiv url: http://arxiv.org/abs/2606.08001v1
Date: Sat, 06 Jun 2026 06:42:03 GMT
Status: Translation complete
System update date: 2026-06-09 14:42:05.611115
Title: Learning a Semantic Calibration Network for Open-Vocabulary Semantic Segmentation
Title (reference translation): Learning a Semantic Calibration Network for Open-Vocabulary Semantic Segmentation
Authors: Yang Sun, Tao Wang, Anastasia Ioannou, Ge Xu
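Abstract summary: Open-Vocabulary Segmentation (OVS) extends the segmentation task from a fixed set to an open set. The paper proposes a novel Semantic Calibration Network (SCN) for open-vocabulary semantic segmentation. The proposed method achieves significant performance improvements compared to state-of-the-art algorithms.
Reference score (independently computed attention score): 5.122331812021513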
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Semantic image segmentation, which assigns a predefined category label to each pixel, has achieved significant progress lately. Open-Vocabulary Segmentation (OVS) extends the segmentation task from a fixed set to an open set, enabling the identification and segmentation of novel concepts based on arbitrary text inputs, such as category names or descriptions. In this paper, we propose a novel Semantic Calibration Network (SCN) for open-vocabulary semantic segmentation. Different from prior approaches that focus on feature aggregation or simple fine-tuning of pre-trained models, SCN refines the mask classification process by explicitly modeling the semantic correlations between classes, aiming to enhance the model's discriminative power while effectively preserving the generalization abilities of the pre-trained CLIP model. Specifically, SCN comprises two core components: Class Disambiguation (CD) and Logits Fusion (LF). First, a cross-attention mechanism is utilized to transform the text embeddings into visually aware pseudo-text embeddings, in order to derive an enhanced similarity score that complements the original mask-text similarity score. Subsequently, the Class Disambiguation module captures implicit inter-class dependencies through a residual architecture to effectively resolve semantic ambiguities. Finally, the Logits Fusion module dynamically integrates multifaceted semantic evidence to ensure that the model achieves a robust semantic consensus while maintaining CLIP's inherent generalization capability. Comprehensive experimental results on mainstream benchmarks demonstrate that the proposed method achieves significant performance improvements compared to state-of-the-art algorithms.
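Abstract (reference translation): Semantic image segmentation, which assigns a predefined category label to each pixel, has made great progress in recent years. Open-Vocabulary Segmentation (OVS) extends the segmentation task from a fixed set to an open set, making it possible to identify and segment novel concepts based on arbitrary text inputs such as category names or descriptions. This paper proposes a Semantic Calibration Network (SCN) for open-vocabulary semantic segmentation. Unlike prior approaches that focus on feature aggregation or simple fine-tuning of pre-trained models, SCN refines the mask classification process by explicitly modeling the semantic correlations between classes, with the aim of increasing the model's discriminative power while effectively preserving the generalization ability of the pre-trained CLIP model. Specifically, SCN consists of two core components: Class Disambiguation (CD) and Logits Fusion (LF). First, a cross-attention mechanism transforms the text embeddings into visually aware pseudo-text embeddings in order to derive an enhanced similarity score that complements the original mask-text similarity score. Next, the Class Disambiguation module captures implicit inter-class dependencies through a residual architecture to effectively resolve semantic ambiguities. Finally, the Logits Fusion module dynamically integrates multifaceted semantic evidence so that the model reaches a robust semantic consensus while maintaining CLIP's inherent generalization capability. The results show that the proposed method achieves significant performance improvements compared to state-of-the-art algorithms.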
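
The pipeline described in the abstract (cross-attention to build pseudo-text embeddings, residual Class Disambiguation, then Logits Fusion) can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no equations, so every function name, the absence of learned projections, the hand-picked `class_mix` matrix, and the fixed fusion weight `alpha` (the paper describes the fusion as dynamic) are assumptions made purely for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def l2_normalize_rows(a):
    out = []
    for row in a:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        out.append([x / norm for x in row])
    return out

def cosine_logits(masks, texts):
    """Cosine similarity of every mask embedding (N x d) with every text embedding (K x d)."""
    return matmul(l2_normalize_rows(masks), transpose(l2_normalize_rows(texts)))

def pseudo_text_embeddings(texts, masks):
    """Cross-attention with text as queries and mask features as keys/values.
    Assumption: no learned projections; a residual keeps the original text embedding."""
    d = len(texts[0])
    scores = matmul(texts, transpose(masks))
    attn = softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])
    attended = matmul(attn, masks)
    return [[t + a for t, a in zip(t_row, a_row)] for t_row, a_row in zip(texts, attended)]

def class_disambiguation(logits, class_mix):
    """Residual refinement: logits + logits @ class_mix, with class_mix of shape K x K."""
    delta = matmul(logits, class_mix)
    return [[l + d for l, d in zip(l_row, d_row)] for l_row, d_row in zip(logits, delta)]

def logits_fusion(base, refined, alpha):
    """Convex combination; alpha weights the original mask-text logits (assumed fixed here)."""
    return [[alpha * b + (1 - alpha) * r for b, r in zip(b_row, r_row)]
            for b_row, r_row in zip(base, refined)]

def scn_sketch(masks, texts, class_mix, alpha=0.5):
    base = cosine_logits(masks, texts)                                      # original score
    enhanced = cosine_logits(masks, pseudo_text_embeddings(texts, masks))   # enhanced score
    refined = class_disambiguation(enhanced, class_mix)
    return logits_fusion(base, refined, alpha)

# Toy data (hypothetical): 2 mask embeddings, 3 class text embeddings, d = 3.
masks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.2]]
texts = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.9]]
# Hand-picked inter-class matrix: each class is mildly suppressed by the others.
class_mix = [[0.0, -0.1, -0.1], [-0.1, 0.0, -0.1], [-0.1, -0.1, 0.0]]

fused = scn_sketch(masks, texts, class_mix, alpha=0.5)
predictions = [row.index(max(row)) for row in fused]
print(predictions)
```

Setting `alpha=1.0` in this sketch recovers the plain mask-text cosine logits, which is one simple way to picture how a fusion step could fall back on CLIP's original similarity; in a trained model the inter-class matrix and the fusion weights would be learned rather than fixed by hand.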

Related papers