Fugu-MT Paper Translation (Summary): Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data

Paper summary: Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data

arxiv url: http://arxiv.org/abs/2605.10498v1
Date: Mon, 11 May 2026 12:56:53 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.823833
Title: Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data
Title (reference translation): Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data
Authors: Heegeon Yoon, Heeyoung Kim
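Abstract summary: Long-tailed distributions in class-imbalanced data pose a fundamental challenge for deep learning models. We propose a new framework for long-tailed recognition that explicitly handles multi-modal inputs. The proposed method extends multi-expert architectures to the multi-modal setting by fusing heterogeneous data into a unified representation.
Reference score (attention score computed by this site): 9.797319790710711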
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Long-tailed distributions in class-imbalanced data present a fundamental challenge for deep learning models, which tend to be biased toward majority classes. While recent methods for long-tailed recognition have mitigated this issue, they are largely restricted to single-modal inputs and cannot fully exploit complementary information from diverse data sources. In this work, we introduce a new framework for long-tailed recognition that explicitly handles multi-modal inputs. Our approach extends multi-expert architectures to the multi-modal setting by fusing heterogeneous data into a unified representation while leveraging modality-specific networks to estimate the informativeness of each modality. These confidence-guided weights dynamically modulate the fusion process, ensuring that more informative modalities contribute more strongly to the final decision. To further enhance performance, we design specialized training and test procedures that accommodate diverse modality combinations, including images and tabular data. Extensive experiments on benchmark and real-world datasets demonstrate that the proposed approach not only effectively integrates multi-modal information but also outperforms existing methods in handling long-tailed, class-imbalanced scenarios, highlighting its robustness and generalization capability.
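Abstract (reference translation): Long-tailed distributions in class-imbalanced data pose a fundamental challenge for deep learning models, which tend to be biased toward majority classes. Recent long-tailed recognition methods have mitigated this problem, but they are mostly limited to single-modal inputs and cannot fully exploit complementary information from diverse data sources. In this work, we propose a new framework for long-tailed recognition that explicitly handles multi-modal inputs. The proposed method extends multi-expert architectures to the multi-modal setting by fusing heterogeneous data into a unified representation and using modality-specific networks to assess the informativeness of each modality. These confidence-guided weights dynamically modulate the fusion process, ensuring that more informative modalities contribute more strongly to the final decision. To further improve performance, we design specialized training and test procedures that accommodate diverse modality combinations, including images and tabular data. Extensive experiments on benchmark and real-world datasets show that the proposed method not only integrates multi-modal information effectively but also outperforms existing methods in handling long-tailed, class-imbalanced scenarios, highlighting its robustness and generalization capability.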
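The abstract says modality-specific networks estimate how informative each modality is, and that these confidence-guided weights modulate the fusion. The abstract does not specify the confidence estimator or the fusion operator, so the following is only a minimal sketch under stated assumptions: confidence is taken to be one minus the normalized entropy of a modality's class prediction, the scores are turned into weights with a softmax (with a hypothetical `temperature` parameter), and fusion is a weighted sum of per-modality feature vectors. All function names and the example numbers are hypothetical illustrations, not the authors' method.

```python
import math
from typing import Dict, List


def softmax(values: List[float]) -> List[float]:
    # Numerically stable softmax over a plain list of floats.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits: List[float]) -> float:
    # ASSUMPTION: informativeness = 1 - normalized entropy of the modality's
    # predicted class distribution (1.0 = fully confident, 0.0 = uniform).
    if len(logits) < 2:
        raise ValueError("need at least two classes")
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def confidence_weights(modality_logits: Dict[str, List[float]],
                       temperature: float = 1.0) -> Dict[str, float]:
    # Hypothetical weighting: softmax over confidence scores, so weights sum to 1.
    # A lower temperature sharpens the preference for the most confident modality.
    names = list(modality_logits)
    scores = [confidence(modality_logits[n]) / temperature for n in names]
    return dict(zip(names, softmax(scores)))


def fuse(features: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    # Weighted sum of per-modality feature vectors into one unified representation.
    dim = len(next(iter(features.values())))
    fused = [0.0] * dim
    for name, vec in features.items():
        if len(vec) != dim:
            raise ValueError(f"feature size mismatch for modality {name!r}")
        for i, x in enumerate(vec):
            fused[i] += weights[name] * x
    return fused


# Toy example: the image branch is confident, the tabular branch is nearly uniform.
modality_logits = {"image": [4.0, 0.5, 0.1], "tabular": [0.2, 0.1, 0.0]}
features = {"image": [1.0, 0.0], "tabular": [0.0, 1.0]}
weights = confidence_weights(modality_logits)
fused = fuse(features, weights)
print(weights, fused)
```

In this toy input the image modality receives the larger weight, so the fused vector leans toward the image features. A softmax over scores keeps every modality's contribution nonzero; a hard selection or a learned gating network would be alternative design choices.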


Related papers