Fugu-MT Paper Translation (Abstract): OptiSAR-Net++: A Large-Scale Benchmark and Transformer-Free Framework for Cross-Domain Remote Sensing Visual Grounding

Paper summary: OptiSAR-Net++: A Large-Scale Benchmark and Transformer-Free Framework for Cross-Domain Remote Sensing Visual Grounding

arxiv url: http://arxiv.org/abs/2603.24876v1
Date: Wed, 25 Mar 2026 23:46:57 GMT
Status: Translation complete
Last updated in system: 2026-03-27 20:52:48.017278
Title: OptiSAR-Net++: A Large-Scale Benchmark and Transformer-Free Framework for Cross-Domain Remote Sensing Visual Grounding
Title (reference translation): OptiSAR-Net++: A Large-Scale Benchmark and Transformer-Free Framework
Authors: Xiaoyu Tang, Jun Dong, Jintao Cheng, Rui Fan
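Abstract summary: We introduce the Cross-Domain RSVG task and construct OptSAR-RSVG, the first large-scale benchmark dataset for this setting. To address the challenges of cross-domain feature modeling, we propose OptiSAR-Net++. Our framework features a patch-level Low-Rank Adaptation Mixture of Experts (PL-MoE) for efficient cross-domain feature decoupling.
Reference score (independently computed attention score): 9.108103619472788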
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Remote sensing visual grounding (RSVG) aims to localize specific targets in remote sensing images using natural language expressions. However, existing methods are restricted to single-sensor domains, i.e., either optical or synthetic aperture radar (SAR), limiting their real-world applicability. In this paper, we introduce the Cross-Domain RSVG (CD-RSVG) task and construct OptSAR-RSVG, the first large-scale benchmark dataset for this setting. To tackle the challenges of cross-domain feature modeling, computational inefficiency, and fine-grained semantic discrimination, we propose OptiSAR-Net++. Our framework features a patch-level Low-Rank Adaptation Mixture of Experts (PL-MoE) for efficient cross-domain feature decoupling. To mitigate the substantial computational overhead of Transformer decoding frameworks, we adopt a CLIP-based contrastive paradigm and further incorporate dynamic adversarial negative sampling, thereby transforming generative regression into an efficient cross-modal matching process. Additionally, a text-guided dual-gate fusion module (TGDF-SSA) and a region-aware auxiliary head are introduced to enhance semantic-visual alignment and spatial modeling. Extensive experiments demonstrate that OptiSAR-Net++ achieves SOTA performance on both OptSAR-RSVG and DIOR-RSVG benchmarks, offering significant advantages in localization accuracy and efficiency. Our code and dataset will be made publicly available.
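Abstract (reference translation): Remote sensing visual grounding (RSVG) aims to localize specific targets in remote sensing images using natural language expressions. However, existing methods are limited to a single sensor domain, either optical or synthetic aperture radar (SAR), which restricts their real-world applicability. This paper introduces the Cross-Domain RSVG (CD-RSVG) task and constructs OptSAR-RSVG, the first large-scale benchmark dataset for this setting. To address the challenges of cross-domain feature modeling, computational inefficiency, and fine-grained semantic discrimination, the authors propose OptiSAR-Net++. The framework features a patch-level Low-Rank Adaptation Mixture of Experts (PL-MoE) for efficient cross-domain feature decoupling. To mitigate the substantial computational overhead of Transformer decoding frameworks, it adopts a CLIP-based contrastive paradigm and further incorporates dynamic adversarial negative sampling, turning generative regression into an efficient cross-modal matching process. In addition, a text-guided dual-gate fusion module (TGDF-SSA) and a region-aware auxiliary head are introduced to strengthen semantic-visual alignment and spatial modeling. Extensive experiments show that OptiSAR-Net++ achieves SOTA performance on both the OptSAR-RSVG and DIOR-RSVG benchmarks, with significant advantages in localization accuracy and efficiency. The code and dataset will be made publicly available.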
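Illustrative note: the abstract describes replacing generative box regression with a CLIP-style cross-modal matching step. The sketch below is not the authors' implementation; it only shows the general idea of scoring candidate region embeddings against a text embedding with cosine similarity and a softmax, plus an InfoNCE-style loss. The function names, the toy embeddings, and the temperature of 0.07 are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_region(text_emb, region_embs, temperature=0.07):
    """Score each candidate region against the text.

    Returns (index of best region, softmax probabilities).
    The temperature value is an assumption, not taken from the paper.
    """
    logits = [cosine(text_emb, r) / temperature for r in region_embs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

def contrastive_loss(text_emb, region_embs, positive_index, temperature=0.07):
    """InfoNCE-style loss: negative log-probability of the matching region."""
    _, probs = match_region(text_emb, region_embs, temperature)
    return -math.log(probs[positive_index])

# Toy, hand-made embeddings (hypothetical values, not from the paper).
text = [1.0, 0.0, 0.0]
regions = [
    [0.9, 0.1, 0.0],  # region that matches the expression
    [0.0, 1.0, 0.0],  # easy negative
    [0.7, 0.7, 0.0],  # harder negative, closer to the text
]
best, probs = match_region(text, regions)
loss = contrastive_loss(text, regions, positive_index=0)
```

In this toy setup the harder negative receives more probability mass than the easy one, which is the general reason hard or adversarial negatives give a stronger training signal in contrastive matching. How OptiSAR-Net++ actually selects its negatives is not specified in the abstract.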

