Fugu-MT Paper Translation (Abstract): DifferSeg: Towards Diverse Multimodal Binary Segmentation via Differential Perception and Frequency Guidance

Paper abstract: DifferSeg: Towards Diverse Multimodal Binary Segmentation via Differential Perception and Frequency Guidance

arxiv url: http://arxiv.org/abs/2606.08906v1
Date: Mon, 08 Jun 2026 01:10:36 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:06.559429
Title: DifferSeg: Towards Diverse Multimodal Binary Segmentation via Differential Perception and Frequency Guidance
Title (reference translation): DifferSeg: Toward Multimodal Binary Segmentation via Differential Perception and Frequency Guidance
Authors: Qiangqiang Zhou, Jiawei Xu, Yong Chen, Dandan Zhu, Yugen Yi, Xiaoqi Zhao
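Abstract summary: DifferSeg is a simple yet general multimodal binary segmentation framework. It uses learnable differential operators to adaptively align multimodal features and enhance their complementarity. It consistently surpasses 67 state-of-the-art methods across 29 public datasets.
Reference score (independently computed attention score): 17.49886552219562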
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In many binary segmentation tasks, most multimodal methods rely on fixed feature concatenation for cross-modal interaction and straightforward decoder designs dominated by low-frequency semantics. However, they ignore two key challenges: one is the lack of an adaptive mechanism to handle modality discrepancies and complementarity, and the other is the absence of an efficient decoding strategy to balance both high- and low-frequency representations. In this work, we propose a simple yet general multimodal binary segmentation framework, termed DifferSeg, to address both problems simultaneously. With the help of the differential perception fusion (DPF) module, DifferSeg employs learnable differential operators to adaptively align multimodal features and enhance their complementarity through residual fusion, effectively mitigating modality mismatch and fusion redundancy. In addition, we design a frequency-guided decoder (FGD) that builds cross-frequency interactions and multi-path upsampling to maintain consistency between detailed high-frequency structures and semantic low-frequency representations, ensuring fine-grained boundary recovery and noise suppression. Benefiting from these designs, DifferSeg can be easily generalized to diverse binary segmentation tasks, including both natural and medical modalities. Without bells and whistles, it consistently surpasses 67 state-of-the-art methods across 29 public datasets involving 18 downstream tasks, demonstrating superior generalization and segmentation accuracy. Code and pretrained models will be available at the Link.
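Abstract (reference translation): In many binary segmentation tasks, most multimodal methods rely on fixed feature concatenation for cross-modal interaction and on simple decoder designs dominated by low-frequency semantics. However, they overlook two major challenges: one is the lack of an adaptive mechanism for handling modality discrepancies and complementarity, and the other is the lack of an efficient decoding strategy that balances high- and low-frequency representations. In this work, we propose a simple yet general multimodal binary segmentation framework, called DifferSeg, to address both problems at once. With the help of the differential perception fusion (DPF) module, DifferSeg uses learnable differential operators to adaptively align multimodal features and improves their complementarity through residual fusion, effectively mitigating modality mismatch and fusion redundancy. In addition, we design a frequency-guided decoder (FGD) that builds cross-frequency interactions and multi-path upsampling to maintain consistency between detailed high-frequency structures and semantic low-frequency representations, ensuring fine-grained boundary recovery and noise suppression. Thanks to these designs, DifferSeg generalizes easily to a variety of binary segmentation tasks covering both natural and medical modalities. Without bells and whistles, it consistently surpasses 67 state-of-the-art methods on 29 public datasets involving 18 downstream tasks, showing better generalization and segmentation accuracy. Code and pretrained models will be available at the Link.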
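
The abstract names two ideas: fusing two modalities through their difference with a residual connection, and treating low- and high-frequency content separately. The toy sketch below illustrates both on 1-D lists in plain Python. It is not the paper's DPF or FGD module, whose details the abstract does not give. The function names, the sigmoid gate, the `weight` and `bias` parameters, and the moving-average low-pass filter are all assumptions chosen for illustration.

```python
import math


def differential_fusion(feat_a, feat_b, weight=1.0, bias=0.0):
    """Toy difference-driven residual fusion (hypothetical, not the paper's DPF).

    `weight` and `bias` stand in for parameters that would be learned.
    """
    fused = []
    for a, b in zip(feat_a, feat_b):
        diff = a - b                      # discrepancy between the two modalities
        base = 0.5 * (a + b)              # shared content, kept as the residual path
        # Gate in (0, 1): larger discrepancies pass more of the difference through.
        gate = 1.0 / (1.0 + math.exp(-(weight * abs(diff) + bias)))
        fused.append(base + gate * diff)  # residual fusion
    return fused


def split_frequencies(signal, radius=1):
    """Split a 1-D signal into a smooth (low) part and a detail (high) part.

    The low part is a moving average over a window of up to 2 * radius + 1
    samples (assumed filter); the high part is what remains after removing it.
    """
    n = len(signal)
    low = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        low.append(sum(window) / len(window))
    high = [x - l for x, l in zip(signal, low)]
    return low, high


if __name__ == "__main__":
    rgb = [0.2, 0.8, 0.5, 0.1]
    depth = [0.2, 0.4, 0.9, 0.1]
    fused = differential_fusion(rgb, depth)
    low, high = split_frequencies(fused)
    print("fused:", fused)
    print("low  :", low)
    print("high :", high)
```

Where the two inputs agree, the difference is zero and the fused value equals the shared value, so the gate only acts where the modalities disagree. The two frequency parts always sum back to the input, which is the property a decoder would need in order to process them on separate paths and recombine them.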


Related papers