Fugu-MT Paper Translation (Abstract): Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation

Paper overview: Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation

arxiv url: http://arxiv.org/abs/2509.11102v1
Date: Sun, 14 Sep 2025 05:40:35 GMT
Status: Translation complete
Last updated in system: 2025-09-16 17:26:22.919998
Title: Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation
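Title (reference translation): Filling the Gaps: A Multitask Hybrid Multiscale Generative Framework for Missing Modality in Remote Sensing Semantic Segmentation
Authors: Nhi Kieu, Kien Nguyen, Arnold Wiliem, Clinton Fookes, Sridha Sridharan
Abstract summary: Multimodal learning has shown large performance gains over ordinary unimodal models. In real-world scenarios, multimodal signals can go missing because of sensor failures and adverse weather. This paper proposes GEMMNet (Generative-Enhanced MultiModal Learning Network) to address these limitations.
Reference score (independently computed attention score): 28.992992584085787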
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Multimodal learning has shown a significant performance boost over ordinary unimodal models across various domains. However, in real-world scenarios, multimodal signals are prone to going missing because of sensor failures and adverse weather conditions, which drastically degrades model operation and performance. Generative models such as the AutoEncoder (AE) and the Generative Adversarial Network (GAN) are intuitive solutions that aim to reconstruct the missing modality from the available ones. Yet their efficacy in remote sensing semantic segmentation remains underexplored. In this paper, we first examine the limitations of existing generative approaches in handling the heterogeneity of multimodal remote sensing data. They inadequately capture semantic context in complex scenes with large intra-class and small inter-class variation. In addition, traditional generative models tend to depend heavily on the dominant modality, introducing bias that affects model robustness under missing-modality conditions. To tackle these limitations, we propose a novel Generative-Enhanced MultiModal learning Network (GEMMNet) with three key components: (1) a Hybrid Feature Extractor (HyFEx) to effectively learn modality-specific representations, (2) Hybrid Fusion with Multiscale Awareness (HyFMA) to capture modality-synergistic semantic context across scales, and (3) a Complementary Loss (CoLoss) scheme to alleviate the inherent bias by encouraging consistency across modalities and tasks. Our method, GEMMNet, outperforms both the generative baselines, AE and cGAN (conditional GAN), and the state-of-the-art non-generative approaches, mmformer and shaspec, on two challenging remote sensing semantic segmentation datasets (Vaihingen and Potsdam). Source code is made available.
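Abstract (reference translation): Multimodal learning has shown notable performance gains over ordinary unimodal models across various domains. In real-world scenarios, however, multimodal signals go missing because of sensor failures and adverse weather, and model operation and performance deteriorate drastically. Generative models such as the AutoEncoder (AE) and the Generative Adversarial Network (GAN) are intuitive solutions that aim to reconstruct the missing modality from the available ones. Their effectiveness in remote sensing semantic segmentation, however, remains underexplored. This paper examines the limitations of existing generative approaches in handling the heterogeneity of multimodal remote sensing data. They capture semantic context inadequately in complex scenes with large intra-class and small inter-class variation. In addition, traditional generative models can depend heavily on the dominant modality, which introduces a bias that affects model robustness when a modality is missing. To address these limitations, the paper proposes GEMMNet (Generative-Enhanced MultiModal Learning Network), which has three key components: (1) a Hybrid Feature Extractor (HyFEx) that effectively learns modality-specific representations, (2) Hybrid Fusion with Multiscale Awareness (HyFMA), which captures modality-synergistic semantic context across scales, and (3) a Complementary Loss (CoLoss) scheme that alleviates the inherent bias by encouraging consistency across modalities and tasks. GEMMNet outperforms both the generative baselines, AE and cGAN (conditional GAN), and the state-of-the-art non-generative approaches, mmformer and shaspec, on two challenging remote sensing semantic segmentation datasets (Vaihingen and Potsdam). The source code is available.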
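The abstract names two general ideas: reconstructing a missing modality from the available ones, and encouraging consistency across modalities. The sketch below illustrates only those general ideas in plain Python. It is not GEMMNet and not the paper's CoLoss, whose definitions do not appear on this page. The names `fill_missing`, `consistency_loss`, and `toy_generator`, the "optical" and "height" modalities, and the choice of a mean squared difference between class probabilities are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def softmax(logits: Vector) -> List[float]:
    """Numerically stable softmax over one pixel's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fill_missing(
    available: Vector,
    missing: Optional[Vector],
    generator: Callable[[Vector], List[float]],
) -> List[float]:
    """Use the real modality when present, otherwise a generated stand-in."""
    if missing is not None:
        return list(missing)
    return generator(available)


def consistency_loss(logits_a: Vector, logits_b: Vector) -> float:
    """Mean squared difference between two branches' class probabilities.

    Assumed form for illustration only; the paper's CoLoss may differ.
    """
    probs_a, probs_b = softmax(logits_a), softmax(logits_b)
    return sum((a - b) ** 2 for a, b in zip(probs_a, probs_b)) / len(probs_a)


def toy_generator(optical: Vector) -> List[float]:
    """Hypothetical stand-in for a trained AE or GAN: one height value."""
    return [sum(optical) / len(optical)]


# Hypothetical single-pixel example: the height sensor is unavailable.
optical = [0.2, 0.6, 0.4]
height = fill_missing(optical, None, toy_generator)
print("generated height:", height)

# Penalise disagreement between two modality branches' predictions.
print("consistency:", consistency_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8]))
```

In a real pipeline the generator would be a trained network operating on image tensors, and a consistency term like this would be added to the segmentation loss with a weighting factor; both are left out here to keep the sketch self-contained.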


Related papers