Fugu-MT Paper Translation (Abstract): CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation

Paper overview: CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation

arxiv url: http://arxiv.org/abs/2510.11173v1
Date: Mon, 13 Oct 2025 09:07:54 GMT
Status: Translation complete
System update date: 2025-10-14 18:06:30.28447
Title: CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation
Title (reference translation): CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation
Authors: Zhenyu Lu, Liupeng Li, Jinpeng Wang, Yan Feng, Bin Chen, Ke Chen, Yaowei Wang
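Abstract summary: CoPRS bridges language reasoning to segmentation through a differentiable and interpretable positional prior instantiated as a heatmap. A learnable concentration token aggregates image features and reasoning text to generate this positional prior.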
Reference score (independently computed attention score): 51.25997439181537
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing works on reasoning segmentation either connect hidden features from a language model directly to a mask decoder or represent positions in text, which limits interpretability and semantic detail. To solve this, we present CoPRS, a Multi-modal Chain-of-Thought (MCoT)-based positional perception model that bridges language reasoning to segmentation through a differentiable and interpretable positional prior instantiated as a heatmap. By making the reasoning process clear via MCoT and expressing it as a dense, differentiable heatmap, this interface enhances interpretability and diagnostic analysis and yields more concentrated evidence on the target. A learnable concentration token aggregates features of the image and reasoning text to generate this positional prior, which is decoded to precise masks through a lightweight decoder, providing a direct connection between reasoning and segmentation. Across the RefCOCO series and ReasonSeg, CoPRS matches or surpasses the best reported metrics on each standard split under comparable protocols, with performance at or above prior state of the art across both validation and test partitions. Extensive experiments reveal that the quality of the heatmap strongly influences the resulting mask quality, supporting a consistent association between the reasoning output and downstream mask generation. Collectively, these findings support the utility of this paradigm in bridging reasoning and segmentation and show advantages in concentration driven by reasoning and predicting masks more precisely. Code, checkpoints and logs are released at https://github.com/ZhenyuLU-Heliodore/CoPRS.git.
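Abstract (reference translation): Existing work on reasoning segmentation either connects hidden features from a language model directly to a mask decoder or represents positions in text, which limits interpretability and semantic detail. To address this, the paper proposes CoPRS, a positional perception model based on Multi-modal Chain-of-Thought (MCoT) that bridges language reasoning to segmentation through a differentiable and interpretable positional prior instantiated as a heatmap. By making the reasoning process explicit via MCoT and expressing it as a dense, differentiable heatmap, this interface improves interpretability and diagnostic analysis and yields more concentrated evidence on the target. A learnable concentration token aggregates image features and reasoning text to generate this positional prior, which a lightweight decoder decodes into precise masks, providing a direct connection between reasoning and segmentation. Across the RefCOCO series and ReasonSeg, CoPRS matches or surpasses the best reported metrics on each standard split under comparable protocols, performing at or above the prior state of the art on both validation and test partitions. Experiments show that heatmap quality strongly influences the resulting mask quality, supporting a consistent association between the reasoning output and downstream mask generation. Taken together, these findings support the usefulness of this paradigm for bridging reasoning and segmentation, and show advantages in reasoning-driven concentration and in more precise mask prediction. Code, checkpoints, and logs are available at https://github.com/ZhenyuLU-Heliodore/CoPRS.git.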
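
Illustrative sketch: the abstract describes a concentration token that produces a heatmap-style positional prior, which is then decoded into a mask. The toy Python below shows only that data flow and is not the authors' implementation. Every name in it (`positional_prior`, `decode_mask`, `rel_threshold`) is hypothetical. The dot-product-plus-softmax scoring and the threshold "decoder" are assumptions standing in for the learned components, and the token is taken as given rather than aggregated from image and reasoning text.

```python
import math

def positional_prior(token, patch_feats, grid_w):
    """Toy heatmap: softmax over dot products between one query vector
    (standing in for the concentration token) and per-patch features.
    The scoring rule is an assumption, not taken from the paper."""
    scores = [sum(t * f for t, f in zip(token, feat)) for feat in patch_feats]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Reshape the flat list of patch probabilities into grid rows.
    return [probs[i:i + grid_w] for i in range(0, len(probs), grid_w)]

def decode_mask(heatmap, rel_threshold=0.5):
    """Stand-in for the lightweight decoder (assumption): keep cells whose
    value is at least rel_threshold times the heatmap peak."""
    peak = max(max(row) for row in heatmap)
    return [[1 if v >= rel_threshold * peak else 0 for v in row]
            for row in heatmap]

# Hypothetical 2x2 grid of 2-dimensional patch features.
token = [1.0, 0.0]
patch_feats = [[0.0, 1.0], [0.1, 0.9], [3.0, 0.0], [0.2, 0.5]]

heatmap = positional_prior(token, patch_feats, grid_w=2)
mask = decode_mask(heatmap)
```

Here the third patch aligns most strongly with the token, so the heatmap concentrates there and the thresholded mask selects only that cell. In the paper the heatmap is differentiable and trained jointly with the decoder; this sketch has no learning at all.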

Related papers