Fugu-MT Paper Translation (Abstract): Decouple and Rectify: Semantics-Preserving Structural Enhancement for Open-Vocabulary Remote Sensing Segmentation

Paper overview: Decouple and Rectify: Semantics-Preserving Structural Enhancement for Open-Vocabulary Remote Sensing Segmentation

arxiv url: http://arxiv.org/abs/2604.02010v1
Date: Thu, 02 Apr 2026 13:15:05 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.805647
Title: Decouple and Rectify: Semantics-Preserving Structural Enhancement for Open-Vocabulary Remote Sensing Segmentation
Title (reference translation): Decouple and Rectify: Semantics-Preserving Structural Enhancement for Open-Vocabulary Remote Sensing Segmentation
Authors: Jie Feng, Fengze Li, Junpeng Zhang, Siyu Chen, Yuping Liang, Junying Chen, Ronghua Shang
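Abstract summary: Open-vocabulary semantic segmentation in the remote sensing (RS) field requires both language-aligned recognition and fine-grained spatial delineation. Recent methods attempt to compensate for CLIP's weak structural detail by introducing RS-pretrained DINO features. This paper proposes DR-Seg.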
Reference score (independently computed attention score): 23.298715255853782
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open-vocabulary semantic segmentation in the remote sensing (RS) field requires both language-aligned recognition and fine-grained spatial delineation. Although CLIP offers robust semantic generalization, its global-aligned visual representations inherently struggle to capture structural details. Recent methods attempt to compensate for this by introducing RS-pretrained DINO features. However, these methods treat CLIP representations as a monolithic semantic space and cannot localize where structural enhancement is required, failing to effectively delineate boundaries while risking the disruption of CLIP's semantic integrity. To address this limitation, we propose DR-Seg, a novel decouple-and-rectify framework in this paper. Our method is motivated by the key observation that CLIP feature channels exhibit distinct functional heterogeneity rather than forming a uniform semantic space. Building on this insight, DR-Seg decouples CLIP features into semantics-dominated and structure-dominated subspaces, enabling targeted structural enhancement by DINO without distorting language-aligned semantics. Subsequently, a prior-driven graph rectification module injects high-fidelity structural priors under DINO guidance to form a refined branch, while an uncertainty-guided adaptive fusion module dynamically integrates this refined branch with the original CLIP branch for final prediction. Comprehensive experiments across eight benchmarks demonstrate that DR-Seg establishes a new state-of-the-art.
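Abstract (reference translation): Open-vocabulary semantic segmentation in the remote sensing (RS) field requires both language-aligned recognition and fine-grained spatial delineation. CLIP offers robust semantic generalization, but its globally aligned visual representations inherently struggle to capture structural details. Recent methods attempt to compensate for this by introducing RS-pretrained DINO features. However, these methods treat CLIP representations as a monolithic semantic space and cannot localize where structural enhancement is required. To address this limitation, we propose DR-Seg. The method is motivated by the key observation that CLIP feature channels exhibit functional heterogeneity rather than forming a uniform semantic space. Building on this insight, DR-Seg decouples CLIP features into semantics-dominated and structure-dominated subspaces. A prior-driven graph rectification module then injects high-fidelity structural priors under DINO guidance to form a refined branch, while an uncertainty-guided adaptive fusion module dynamically integrates this refined branch with the original CLIP branch for the final prediction. Comprehensive experiments across eight benchmarks show that DR-Seg establishes a new state of the art.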

Related papers