Fugu-MT Paper Translation (Summary): Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding

Paper summary: Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding

arxiv url: http://arxiv.org/abs/2410.14944v1
Date: Sat, 19 Oct 2024 02:27:30 GMT
Status: Translation complete
Last updated in system: 2024-11-28 17:07:37.786377
Title: Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding
Title (reference translation): Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding
Authors: Yi Liu, Chengxin Li, Shoukun Xu, Jungong Han
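Abstract summary: Multi-modal fusion plays an important role in multi-modal scene understanding. Most existing methods focus on cross-modal fusion involving two modalities and often overlook more complex multi-modal fusion. We propose the PWRF (Part-Whole Relational Fusion) framework for multi-modal scene understanding.
Reference score (attention score computed by the site): 51.96911650437978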
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-modal fusion plays a vital role in multi-modal scene understanding. Most existing methods focus on cross-modal fusion between two modalities and often overlook more complex multi-modal fusion, which is essential for real-world applications such as autonomous driving, where visible, depth, event, and LiDAR data, among others, are used. Moreover, the few existing attempts at multi-modal fusion, e.g., simple concatenation, cross-modal attention, and token selection, cannot adequately exploit the intrinsic shared and specific details of multiple modalities. To tackle this challenge, we propose a Part-Whole Relational Fusion (PWRF) framework. For the first time, this framework treats multi-modal fusion as part-whole relational fusion: it routes multiple individual part-level modalities to a fused whole-level modality using the part-whole relational routing ability of Capsule Networks (CapsNets). Through this part-whole routing, PWRF generates modal-shared semantics from the whole-level modal capsules and modal-specific semantics from the routing coefficients. The modal-shared and modal-specific details can then be used to solve multi-modal scene understanding tasks; this paper addresses synthetic multi-modal segmentation and visible-depth-thermal salient object detection. Experiments on several datasets demonstrate the superiority of the proposed PWRF framework for multi-modal scene understanding. The source code has been released at https://github.com/liuyi1989/PWRF.
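The abstract does not spell out PWRF's routing procedure. As a rough illustration of the part-whole routing idea it builds on, the sketch below implements standard dynamic routing-by-agreement from capsule networks (Sabour et al., 2017) in plain Python, treating each modality as one part-level capsule that votes for a small set of whole-level capsules. The function names, the toy vote vectors, and the reading of the whole-level capsules as "modal-shared" semantics and the routing coefficients as "modal-specific" cues are assumptions made for illustration; this is not the authors' implementation, which is in the linked repository.

```python
import math
from typing import List, Tuple

Vector = List[float]


def squash(s: Vector) -> Vector:
    """CapsNet squash non-linearity: keeps the direction of s, maps its length into [0, 1)."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0 for _ in s]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(preds: List[List[Vector]], iters: int = 3) -> Tuple[List[Vector], List[List[float]]]:
    """Routing-by-agreement (Sabour et al., 2017), not necessarily PWRF's variant.

    preds[i][j] is the prediction vector u_hat_{j|i} that part capsule i
    (here assumed to be one modality) casts for whole-level capsule j.
    Returns (whole-level capsules v_j, routing coefficients c_ij).
    """
    num_parts, num_wholes, dim = len(preds), len(preds[0]), len(preds[0][0])
    logits = [[0.0] * num_wholes for _ in range(num_parts)]  # b_ij, start uniform
    for it in range(iters):
        # c_ij: each part distributes its vote over the whole-level capsules
        coeffs = [softmax(row) for row in logits]
        wholes = []
        for j in range(num_wholes):
            # s_j = sum_i c_ij * u_hat_{j|i}, then v_j = squash(s_j)
            s_j = [sum(coeffs[i][j] * preds[i][j][d] for i in range(num_parts)) for d in range(dim)]
            wholes.append(squash(s_j))
        if it < iters - 1:
            # Agreement update: b_ij += <u_hat_{j|i}, v_j>
            for i in range(num_parts):
                for j in range(num_wholes):
                    logits[i][j] += sum(p * v for p, v in zip(preds[i][j], wholes[j]))
    return wholes, coeffs


# Hypothetical toy votes: 3 modalities (e.g. visible, depth, thermal), 2 whole capsules, dim 2.
preds = [
    [[1.0, 0.0], [0.0, 1.0]],   # modality 0
    [[0.9, 0.1], [0.0, -1.0]],  # modality 1 disagrees with the others on whole capsule 1
    [[1.0, 0.1], [0.1, 0.9]],   # modality 2
]
wholes, coeffs = dynamic_routing(preds, iters=3)
shared = wholes    # assumption: stands in for "modal-shared" semantics
specific = coeffs  # assumption: per-modality routing pattern as "modal-specific" cues
```

In a real network the prediction vectors would come from learned transformation matrices applied to each modality's features; here they are fixed constants to keep the sketch dependency-free. With this toy input, modality 1 votes against the other two for whole-level capsule 1, so its routing coefficient toward that capsule ends up the smallest in that column.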
Abstract (reference translation): Multi-modal fusion plays an important role in multi-modal scene understanding. Most existing methods focus on cross-modal fusion involving two modalities and often overlook more complex multi-modal fusion. In addition, existing attempts at multi-modal fusion (e.g., simple concatenation, cross-modal attention, and token selection) cannot sufficiently dig into the intrinsic shared and specific details of multiple modalities. In this paper, we propose the PWRF (Part-Whole Relational Fusion) framework. For the first time, this framework treats multi-modal fusion as part-whole relational fusion. Using the part-whole relational routing ability of Capsule Networks (CapsNets), it routes multiple part-level modalities to a fused whole-level modality. Through this part-whole routing, PWRF generates modal-shared and modal-specific semantics from the whole-level modal capsules and the routing coefficients, respectively. Furthermore, the modal-shared and modal-specific details can be used to tackle multi-modal scene understanding tasks such as synthetic multi-modal segmentation and visible-depth-thermal salient object detection. Experiments on several datasets demonstrate the superiority of the proposed PWRF framework for multi-modal scene understanding. The source code is available at https://github.com/liuyi1989/PWRF.


Related papers