Fugu-MT Paper Translation (Summary): Mitigating Objectness Bias and Region-to-Text Misalignment for Open-Vocabulary Panoptic Segmentation

Paper summary: Mitigating Objectness Bias and Region-to-Text Misalignment for Open-Vocabulary Panoptic Segmentation

arxiv url: http://arxiv.org/abs/2603.21386v1
Date: Sun, 22 Mar 2026 20:11:52 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.392266
Title: Mitigating Objectness Bias and Region-to-Text Misalignment for Open-Vocabulary Panoptic Segmentation
Authors: Nikolay Kormushev, Josip Šarić, Matej Kristan
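Abstract summary: We introduce OVRCOAT, a simple, modular framework for open-vocabulary panoptic segmentation. COAT (CLIP-conditioned objectness adjustment) updates background/foreground probabilities, preserving high-quality masks for out-of-vocabulary objects. OVRCOAT sets a new state of the art on ADE20K and delivers clear gains on Mapillary Vistas and Cityscapes.
Reference score (attention score computed in-house): 10.606571495908485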
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Open-vocabulary panoptic segmentation remains hindered by two coupled issues: (i) mask selection bias, where objectness heads trained on closed vocabularies suppress masks of categories not observed in training, and (ii) limited regional understanding in vision-language models such as CLIP, which were optimized for global image classification rather than localized segmentation. We introduce OVRCOAT, a simple, modular framework that tackles both. First, a CLIP-conditioned objectness adjustment (COAT) updates background/foreground probabilities, preserving high-quality masks for out-of-vocabulary objects. Second, an open-vocabulary mask-to-text refinement (OVR) strengthens CLIP's region-level alignment to improve classification of both seen and unseen classes with markedly lower memory cost than prior fine-tuning schemes. The two components combine to jointly improve objectness estimation and mask recognition, yielding consistent panoptic gains. Despite its simplicity, OVRCOAT sets a new state of the art on ADE20K (+5.5% PQ) and delivers clear gains on Mapillary Vistas and Cityscapes (+7.1% and +3% PQ, respectively). The code is available at: https://github.com/nickormushev/OVRCOAT
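The abstract describes mask selection bias and COAT only at a high level. The sketch below illustrates the bias and one way CLIP evidence could counteract it: a closed-vocabulary objectness probability alone drops a mask of an unseen category, while blending in CLIP similarity to the vocabulary keeps it and still rejects a background-like mask. It assumes mask and text embeddings are already computed. The blending rule, the `alpha` weight, and all function names (`clip_evidence`, `select_masks`) are hypothetical and are not the COAT formulation from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def clip_evidence(mask_emb, text_embs) -> Tuple[int, float]:
    """Zero-shot label for one mask: the most similar vocabulary entry,
    plus that similarity clamped to [0, 1] as 'CLIP evidence'."""
    sims = [cosine(mask_emb, t) for t in text_embs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, max(0.0, sims[best])


def select_masks(p_objs, mask_embs, text_embs, alpha=0.5, threshold=0.5):
    """Keep masks whose adjusted foreground score exceeds `threshold`.

    HYPOTHETICAL rule: a convex blend of the closed-vocabulary objectness
    probability and the CLIP evidence. alpha=0 reproduces plain objectness
    thresholding. Illustration only, not the paper's COAT formula.
    """
    kept: List[Tuple[int, int, float]] = []
    for i, (p_obj, emb) in enumerate(zip(p_objs, mask_embs)):
        cls, evidence = clip_evidence(emb, text_embs)
        p_fg = (1.0 - alpha) * p_obj + alpha * evidence
        if p_fg > threshold:
            kept.append((i, cls, p_fg))
    return kept


# Toy example: a two-class vocabulary and three candidate masks.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
mask_embs = [[1.0, 0.1], [0.05, 1.0], [-1.0, -1.0]]
p_objs = [0.9, 0.1, 0.2]  # mask 1 is an unseen category the objectness head suppresses

print(select_masks(p_objs, mask_embs, text_embs, alpha=0.0))  # keeps only mask 0
print(select_masks(p_objs, mask_embs, text_embs))  # keeps masks 0 and 1; mask 2 is dropped
```

The convex blend is chosen only because it is easy to read; a learned or CLIP-conditioned update, as the paper's name for COAT suggests, would replace the single line computing `p_fg`.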
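The results above are reported in panoptic quality (PQ). For one category, PQ is the sum of IoUs over matched segment pairs divided by |TP| + 1/2 |FP| + 1/2 |FN|, where a predicted and a ground-truth segment match when their IoU exceeds 0.5 (this threshold makes each match unique for non-overlapping segments). The minimal sketch below computes it on toy segments represented as sets of pixel ids and averages over categories. It omits the void and crowd-region handling of the official COCO panoptic evaluation, and the function names are illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

Segment = FrozenSet[int]  # toy representation: a segment is a set of pixel ids


def iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two segments."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(preds: List[Segment], gts: List[Segment]) -> float:
    """PQ for ONE category; preds and gts are non-overlapping segments."""
    tp_ious: List[float] = []
    matched_p, matched_g = set(), set()
    for i, p in enumerate(preds):
        for j, g in enumerate(gts):
            if j in matched_g:
                continue
            v = iou(p, g)
            if v > 0.5:  # IoU > 0.5 guarantees at most one match per segment
                tp_ious.append(v)
                matched_p.add(i)
                matched_g.add(j)
                break
    tp = len(tp_ious)
    fp = len(preds) - len(matched_p)  # unmatched predictions
    fn = len(gts) - len(matched_g)    # unmatched ground truth
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(tp_ious) / denom if denom else 0.0


def mean_pq(per_class: Dict[str, Tuple[List[Segment], List[Segment]]]) -> float:
    """Average PQ over categories that have predictions or ground truth."""
    scores = [panoptic_quality(p, g) for p, g in per_class.values() if p or g]
    return sum(scores) / len(scores) if scores else 0.0


gts = [frozenset({0, 1, 2, 3}), frozenset({10, 11})]
preds = [frozenset({0, 1, 2}), frozenset({20, 21})]
print(panoptic_quality(preds, gts))  # one TP (IoU 0.75), one FP, one FN
```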


Related papers