Fugu-MT paper translation (abstract): TALENT: Target-aware Efficient Tuning for Referring Image Segmentation

Paper overview: TALENT: Target-aware Efficient Tuning for Referring Image Segmentation

arxiv url: http://arxiv.org/abs/2604.00609v1
Date: Wed, 01 Apr 2026 08:13:33 GMT
Status: translation complete
Last updated in system: 2026-04-02 16:44:31.900761
Title: TALENT: Target-aware Efficient Tuning for Referring Image Segmentation
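Title (reference translation): TALENT: Target-aware Efficient Tuning for Referring Image Segmentation
Authors: Shuo Jin, Siyue Yu, Bingfeng Zhang, Chao Yao, Meiqin Liu, Jimin Xiao
Abstract summary: Referring image segmentation aims to segment a specific target based on a natural text expression. Existing PET-based methods often suffer because their visual features fail to emphasize the text-referred target instance. This paper proposes TALENT, a new framework that applies target-aware efficient tuning to PET-based RIS.
Reference score (attention score computed independently by this site): 42.766432845564786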
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Referring image segmentation aims to segment specific targets based on a natural text expression. Recently, parameter-efficient tuning (PET) has emerged as a promising paradigm. However, existing PET-based methods often suffer from the fact that visual features can't emphasize the text-referred target instance but activate co-category yet unrelated objects. We analyze and quantify this problem, terming it the 'non-target activation' (NTA) issue. To address this, we propose a novel framework, TALENT, which utilizes target-aware efficient tuning for PET-based RIS. Specifically, we first propose a Rectified Cost Aggregator (RCA) to efficiently aggregate text-referred features. Then, to calibrate 'NTA' into accurate target activation, we adopt a Target-aware Learning Mechanism (TLM), including contextual pairwise consistency learning and target-centric contrastive learning. The former uses the sentence-level text feature to achieve a holistic understanding of the referent and constructs a text-referred affinity map to optimize the semantic association of visual features. The latter further enhances target localization to discover the distinct instance while suppressing associations with other unrelated ones. The two objectives work in concert and address 'NTA' effectively. Extensive evaluations show that TALENT outperforms existing methods across various metrics (e.g., 2.5% mIoU gains on G-Ref val set). Our codes will be released at: https://github.com/Kimsure/TALENT.
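Abstract (reference translation): Referring image segmentation aims to segment a specific target based on a natural text expression. In recent years, parameter-efficient tuning (PET) has emerged as a promising paradigm. However, existing PET-based methods often suffer because their visual features fail to emphasize the text-referred target instance and instead activate unrelated objects of the same category. We analyze and quantify this problem and call it the 'non-target activation' (NTA) issue. To address it, we propose TALENT, a new framework that applies target-aware efficient tuning to PET-based RIS. Specifically, we first propose a Rectified Cost Aggregator (RCA) that efficiently aggregates text-referred features. Then, to calibrate 'NTA' into accurate target activation, we adopt a Target-aware Learning Mechanism (TLM) comprising contextual pairwise consistency learning and target-centric contrastive learning. The former uses the sentence-level text feature to gain a holistic understanding of the referent and builds a text-referred affinity map to optimize the semantic association of visual features. The latter further strengthens target localization so as to find the distinct instance while suppressing associations with other, unrelated ones. The two objectives work together and handle 'NTA' effectively. Extensive evaluation shows that TALENT outperforms existing methods on a range of metrics (e.g., a 2.5% mIoU gain on the G-Ref val set). The code will be released at https://github.com/Kimsure/TALENT.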
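
Note on the metric: the headline result above is reported in mIoU. As a minimal sketch, assuming the common RIS convention that mIoU is the mean of the per-sample intersection-over-union between predicted and ground-truth binary masks, the metric could be computed as follows. The function names `iou` and `mean_iou`, the nested-list mask format, and the handling of empty masks are illustrative assumptions and are not taken from the TALENT code.

```python
def iou(pred, gt):
    """IoU between two binary masks given as nested lists of 0/1 values."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two completely empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds, gts):
    """Average the per-sample IoU over all (prediction, ground truth) pairs."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


# Two toy 2x2 samples: the first overlaps on 1 of 2 pixels, the second exactly.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
gts = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
print(mean_iou(preds, gts))  # 0.75
```

Under this convention every sample contributes equally regardless of object size, which differs from an overall IoU that accumulates intersections and unions across the whole set before dividing.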

Related papers