Fugu-MT Paper Translation (Abstract): Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation

Paper abstract: Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation

arxiv url: http://arxiv.org/abs/2509.09114v1
Date: Thu, 11 Sep 2025 02:52:26 GMT
Status: Translation complete
System update date: 2025-09-12 16:52:24.209306
Title: Modality Alignment with Multi-scale Bilateral Attention for Multimodal Recommendation
Authors: Kelin Ren, Chan-Yang Ju, Dong-Ho Lee
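Abstract summary: MambaRec is a novel framework that integrates local feature alignment and global distribution regularization. The DREAM module captures hierarchical relationships and context-aware associations, improving cross-modal semantic modeling. Experiments on real-world e-commerce datasets show that MambaRec outperforms existing methods in fusion quality, generalization, and efficiency.
Reference score (independently computed attention score): 9.91438130100011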
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal recommendation systems are increasingly becoming foundational technologies for e-commerce and content platforms, enabling personalized services by jointly modeling users' historical behaviors and the multimodal features of items (e.g., visual and textual). However, most existing methods rely on either static fusion strategies or graph-based local interaction modeling, facing two critical limitations: (1) insufficient ability to model fine-grained cross-modal associations, leading to suboptimal fusion quality; and (2) a lack of global distribution-level consistency, causing representational bias. To address these, we propose MambaRec, a novel framework that integrates local feature alignment and global distribution regularization via attention-guided learning. At its core, we introduce the Dilated Refinement Attention Module (DREAM), which uses multi-scale dilated convolutions with channel-wise and spatial attention to align fine-grained semantic patterns between visual and textual modalities. This module captures hierarchical relationships and context-aware associations, improving cross-modal semantic modeling. Additionally, we apply Maximum Mean Discrepancy (MMD) and contrastive loss functions to constrain global modality alignment, enhancing semantic consistency. This dual regularization reduces mode-specific deviations and boosts robustness. To improve scalability, MambaRec employs a dimensionality reduction strategy to lower the computational cost of high-dimensional multimodal features. Extensive experiments on real-world e-commerce datasets show that MambaRec outperforms existing methods in fusion quality, generalization, and efficiency. Our code has been made publicly available at https://github.com/rkl71/MambaRec.
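The abstract names Maximum Mean Discrepancy (MMD) and a contrastive loss as the global alignment terms but gives no formulas. The following is a minimal sketch of what such terms could look like, not the authors' implementation: the Gaussian kernel, its bandwidth `sigma`, the InfoNCE form of the contrastive loss, the `temperature` value, and all function names are assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between two samples of vectors."""
    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def info_nce(visual, textual, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed form).

    Row i of `visual` and row i of `textual` are treated as a positive pair;
    every other row of `textual` acts as a negative for that item.
    """
    def normalize(v):
        norm = math.sqrt(sum(a * a for a in v)) or 1.0
        return [a / norm for a in v]

    vis = [normalize(v) for v in visual]
    txt = [normalize(t) for t in textual]

    loss = 0.0
    for i, v in enumerate(vis):
        # Cosine similarity of item i's visual vector with every textual vector.
        logits = [sum(a * b for a, b in zip(v, t)) / temperature for t in txt]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(vis)


if __name__ == "__main__":
    # Toy visual and textual embeddings for three items (made-up values).
    visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    textual = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.8]]
    print("MMD^2:", mmd_squared(visual, textual))
    print("InfoNCE:", info_nce(visual, textual))
```

In this sketch the MMD term compares the two modalities as whole distributions, while the contrastive term works per item by pulling each item's visual and textual embeddings together. How the two terms are weighted against the recommendation loss is not stated in the abstract.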
