Fugu-MT paper translation (abstract): DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy

Paper overview: DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy

arxiv url: http://arxiv.org/abs/2507.01738v1
Date: Wed, 02 Jul 2025 14:14:35 GMT
Status: translation complete
Last updated in system: 2025-07-03 14:23:00.281846
Title: DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy
Title (reference translation): DeRIS: Decoupling Perception and Cognition to Enhance Referring Image Segmentation through Loopback Synergy
Authors: Ming Dai, Wenxuan Cheng, Jiang-jiang Liu, Sen Yang, Wenxiao Cai, Yanpeng Sun, Wankou Yang
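Abstract summary: We propose DeRIS, a novel framework that decomposes RIS into two key components: perception and cognition. Our findings indicate that the main limitation of current models lies not in perceptual deficiencies but in insufficient multi-modal cognitive capacity. We also propose a simple non-referent sample conversion data augmentation to address the long-tail distribution issue related to target existence judgement.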
Reference score (independently computed attention score): 15.729826041347144
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Referring Image Segmentation (RIS) is a challenging task that aims to segment objects in an image based on natural language expressions. While prior studies have predominantly concentrated on improving vision-language interactions and achieving fine-grained localization, the fundamental bottlenecks in existing RIS frameworks have not been analyzed systematically. To bridge this gap, we propose DeRIS, a novel framework that decomposes RIS into two key components: perception and cognition. This modular decomposition facilitates a systematic analysis of the primary bottlenecks impeding RIS performance. Our findings reveal that the predominant limitation lies not in perceptual deficiencies, but in the insufficient multi-modal cognitive capacity of current models. To mitigate this, we propose a Loopback Synergy mechanism, which enhances the synergy between the perception and cognition modules, thereby enabling precise segmentation while simultaneously improving robust image-text comprehension. Additionally, we analyze and introduce a simple non-referent sample conversion data augmentation to address the long-tail distribution issue related to target existence judgement in general scenarios. Notably, DeRIS demonstrates inherent adaptability to both non-referent and multi-referent scenarios without requiring specialized architectural modifications, enhancing its general applicability. The code and models are available at https://github.com/Dmmm1997/DeRIS.
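Abstract (reference translation): Referring Image Segmentation (RIS) is a task that aims to segment objects in an image based on natural language expressions. Prior work has focused on improving vision-language interaction and achieving fine-grained localization, but the fundamental bottlenecks of existing RIS frameworks have not been systematically analyzed. To fill this gap, we propose DeRIS, a new framework that decomposes RIS into two key components: perception and cognition. This modular decomposition helps to analyze systematically the main bottlenecks that hinder RIS performance. The results indicate that the main limitation of current models is not a perceptual deficiency but insufficient multi-modal cognitive capacity. To mitigate this, we propose a Loopback Synergy mechanism that strengthens the synergy between the perception module and the cognition module. In addition, we analyze and introduce a simple non-referent sample conversion data augmentation to address the long-tail distribution issue related to target existence judgement in general scenarios. Notably, DeRIS adapts inherently to both non-referent and multi-referent scenarios without requiring special architectural changes, which increases its generality. The code and models are available at https://github.com/Dmmm1997/DeRIS.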

List of related papers