Fugu-MT Paper Translation (Summary): MONET: Modality-Embracing Graph Convolutional Network and Target-Aware Attention for Multimedia Recommendation

Paper summary: MONET: Modality-Embracing Graph Convolutional Network and Target-Aware Attention for Multimedia Recommendation

arxiv url: http://arxiv.org/abs/2312.09511v1
Date: Fri, 15 Dec 2023 03:28:19 GMT
Status: Translation complete
Last updated in system: 2023-12-18 17:10:54.369067
Title: MONET: Modality-Embracing Graph Convolutional Network and Target-Aware Attention for Multimedia Recommendation
Title (reference translation): MONET: Modality-Embracing Graph Convolutional Network and Target-Aware Attention for Multimedia Recommendation
Authors: Yungi Kim, Taeri Kim, Won-Yong Shin, and Sang-Wook Kim
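Abstract summary: We focus on multimedia recommender systems using graph convolutional networks (GCNs). Our study aims to exploit multimodal features more effectively in order to accurately capture users' preferences for items. We propose a novel multimedia recommender system, named MONET, composed of two core ideas: modality-embracing GCN (MeGCN) and target-aware attention.
Reference score (attention score computed by this site): 21.61057660080108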
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In this paper, we focus on multimedia recommender systems using graph convolutional networks (GCNs) where the multimodal features as well as user-item interactions are employed together. Our study aims to exploit multimodal features more effectively in order to accurately capture users' preferences for items. To this end, we point out following two limitations of existing GCN-based multimedia recommender systems: (L1) although multimodal features of interacted items by a user can reveal her preferences on items, existing methods utilize GCN designed to focus only on capturing collaborative signals, resulting in insufficient reflection of the multimodal features in the final user/item embeddings; (L2) although a user decides whether to prefer the target item by considering its multimodal features, existing methods represent her as only a single embedding regardless of the target item's multimodal features and then utilize her embedding to predict her preference for the target item. To address the above issues, we propose a novel multimedia recommender system, named MONET, composed of following two core ideas: modality-embracing GCN (MeGCN) and target-aware attention. Through extensive experiments using four real-world datasets, we demonstrate i) the significant superiority of MONET over seven state-of-the-art competitors (up to 30.32% higher accuracy in terms of recall@20, compared to the best competitor) and ii) the effectiveness of the two core ideas in MONET. All MONET codes are available at https://github.com/Kimyungi/MONET.
Abstract (reference translation): This paper focuses on multimedia recommender systems using graph convolutional networks (GCNs), in which multimodal features and user-item interactions are used together. The study aims to exploit multimodal features more effectively in order to accurately capture users' preferences for items. To this end, we point out the following two limitations of existing GCN-based multimedia recommender systems: (L1) although the multimodal features of the items a user has interacted with can reveal her preferences, existing methods use GCNs designed to focus only on capturing collaborative signals, so the multimodal features are insufficiently reflected in the final user/item embeddings; (L2) although a user decides whether to prefer the target item by considering its multimodal features, existing methods represent her as a single embedding regardless of the target item's multimodal features and then use that embedding to predict her preference for the target item. To address these issues, we propose a novel multimedia recommender system, named MONET, composed of two core ideas: modality-embracing GCN (MeGCN) and target-aware attention. Through extensive experiments using four real-world datasets, we demonstrate i) the significant superiority of MONET over seven state-of-the-art competitors (up to 30.32% higher accuracy in terms of recall@20, compared to the best competitor) and ii) the effectiveness of the two core ideas in MONET. All MONET code is available at https://github.com/Kimyungi/MONET.
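The abstract reports accuracy as recall@20 and states gains relative to the best competitor. As a minimal sketch of how such numbers are commonly computed, and not the authors' evaluation code, the Python below defines recall@k for one user and a relative-improvement helper. The function names `recall_at_k` and `relative_improvement` are hypothetical, and the denominator used here (all of the user's relevant items) is an assumption; some toolkits divide by min(k, number of relevant items) instead.

```python
from typing import Sequence, Set


def recall_at_k(ranked_items: Sequence[int], relevant_items: Set[int], k: int = 20) -> float:
    """Fraction of a user's relevant items that appear in the top-k of the ranking.

    Assumption: the denominator is the total number of relevant items.
    """
    if not relevant_items:
        return 0.0  # no held-out items for this user, so nothing to recall
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / len(relevant_items)


def relative_improvement(ours: float, baseline: float) -> float:
    """Relative gain of `ours` over `baseline`, expressed in percent."""
    return (ours - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    ranking = [5, 2, 7, 4, 1]
    held_out = {2, 4, 9}
    print(recall_at_k(ranking, held_out, k=3))
    print(relative_improvement(0.13, 0.10))
```

In practice the per-user values are averaged over all test users before two systems are compared, and a figure such as "30.32% higher" is a relative gain of this kind rather than a difference in percentage points.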

Related papers

Enhancing Live Broadcast Engagement: A Multi-modal Approach to Short Video Recommendations Using MMGCN and User Preferences [0.0]
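This paper proposes a short-video recommendation system that combines Multi-modal Graph Convolutional Networks (MMGCN) with user preferences. To provide personalized recommendations matched to individual interests, the proposed system takes into account user interaction data, video content features, and contextual information. Three datasets, Kwai, TikTok, and MovieLens, are used to evaluate the system's effectiveness.
Paper reference translation (metadata) (2025-06-29T04:50:52Z)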
Learning Item Representations Directly from Multimodal Features for Effective Recommendation [51.49251689107541]
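Multimodal recommender systems mainly learn item representations using Bayesian Personalized Ranking (BPR) optimization. This paper proposes a novel model (LIRDRec) that learns item representations directly from multimodal features to improve recommendation performance.
Paper reference translation (metadata) (2025-05-08T05:42:22Z)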
Quadratic Interest Network for Multimodal Click-Through Rate Prediction [12.989347150912685]
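Multimodal click-through rate (CTR) prediction is a key technique in industrial recommender systems. We propose a novel model for Task 2, called QIN, for multimodal CTR prediction.
Paper reference translation (metadata) (2025-04-24T16:08:52Z)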
Less is More: Information Bottleneck Denoised Multimedia Recommendation [43.66791467993419]
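We propose a denoised multimedia recommendation paradigm based on the Information Bottleneck (IB) principle. IBMRec removes task-irrelevant features from both the feature and item perspectives, and maximizes the mutual information between multimedia representations and the recommendation task.
Paper reference translation (metadata) (2025-01-21T14:33:07Z)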
Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation [97.82707398481273]
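We develop a novel meta-learning-based multimodal fusion framework called Meta Multimodal Fusion (MetaMMF). Based on meta information extracted from the multimodal features of the input task, MetaMMF parameterizes a neural network as an item-specific fusion function via a meta learner. We conduct extensive experiments on three benchmark datasets and demonstrate significant improvements over state-of-the-art multimodal recommendation models.
Paper reference translation (metadata) (2025-01-13T07:51:43Z)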
MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt [60.10555128510744]
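Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by exploiting complementary image information from different modalities. Recently, large-scale pre-trained models such as CLIP have shown remarkable performance on traditional single-modal object ReID tasks. We introduce MambaPro, a novel framework for multi-modal object ReID.
Paper reference translation (metadata) (2024-12-14T06:33:53Z)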
Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation [61.91492500828508]
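Few-shot 3D point cloud segmentation (FS-PCS) aims to generalize models to segment novel categories with minimal support samples. This paper proposes a cost-free multimodal FS-PCS setup that uses textual labels and the potentially available 2D image modality. To mitigate training bias, we propose a Test-time Adaptive Cross-modal Calibration (TACC) technique.
Paper reference translation (metadata) (2024-10-29T19:28:41Z)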
MIMNet: Multi-Interest Meta Network with Multi-Granularity Target-Guided Attention for Cross-domain Recommendation [6.7902741961967]
Cross-domain recommendation (CDR) plays an important role in alleviating the sparsity and cold-start problems. We propose a Multi-interest Meta Network (MIMNet) for cross-domain recommendation.
Paper reference translation (metadata) (2024-07-31T13:30:34Z)
NoteLLM-2: Multimodal Large Representation Models for Recommendation [71.87790090964734]
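Large language models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks. Their potential for multimodal representation, particularly in item-to-item (I2I) recommendation, remains underexplored. This paper proposes an end-to-end fine-tuning approach that customizes the integration of existing LLMs and vision encoders to achieve efficient multimodal representation.
Paper reference translation (metadata) (2024-05-27T03:24:01Z)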
U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
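We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation. We employ feature fusion at multiple scales to ensure effective extraction and integration of global and local features. Experiments show that our method achieves superior performance across multiple datasets.
Paper reference translation (metadata) (2024-05-24T08:58:48Z)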
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
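We build a transformer-based framework for multi-modal manipulation detection and grounding. The framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
Paper reference translation (metadata) (2023-09-22T06:55:41Z)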
Just Noticeable Visual Redundancy Forecasting: A Deep Multimodal-driven Approach [11.600496805298778]
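Just noticeable difference (JND) refers to the maximum visual change that the human eye cannot perceive. This paper investigates JND modeling from an end-to-end multimodal perspective, namely hmJND-Net.
Paper reference translation (metadata) (2023-03-18T09:36:59Z)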
M2RNet: Multi-modal and Multi-scale Refined Network for RGB-D Salient Object Detection [1.002712867721496]
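RGB-D-based methods often suffer from inconsistencies in multi-modal feature fusion and in multi-scale feature aggregation. We propose a Multi-modal and Multi-scale Refined Network (M2RNet). Three key components are introduced in this network.
Paper reference translation (metadata) (2021-09-16T12:15:40Z)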
Specificity-preserving RGB-D Saliency Detection [103.3722116992476]
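This paper proposes a specificity-preserving network (SP-Net) for RGB-D saliency detection. It adopts two modality-specific networks and a shared learning network to generate individual and shared saliency maps. Experiments on six benchmark datasets show that SP-Net outperforms other state-of-the-art methods.
Paper reference translation (metadata) (2021-08-18T14:14:22Z)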
MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation [32.15124603618625]
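We propose a new model based on a multimodal fused graph convolutional network, MMGCN. MMGCN can not only make effective use of multimodal dependencies but also leverage speaker information to model inter-speaker and intra-speaker dependency. We evaluate the proposed model on two public benchmark datasets, IEMOCAP and MELD, and demonstrate the effectiveness of MMGCN.
Paper reference translation (metadata) (2021-07-14T15:37:02Z)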
Mining Latent Structures for Multimedia Recommendation [46.70109406399858]
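This paper proposes a LATent sTructure mining method for multimodal recommendation. It learns item structures for each modality and aggregates multiple modalities to obtain a latent item graph. Based on the learned latent graph, it performs graph convolutions to explicitly inject high-order item affinities into item representations.
Paper reference translation (metadata) (2021-04-19T03:50:24Z)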
VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles [63.32111010686954]
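We propose the task of Video-based Multimodal Summarization with Multimodal Output (VMSMO). The main challenge of this task is to jointly model the temporal dependency of the video and the semantic meaning of the article. This paper proposes a Dual-Interaction-based Multimodal Summarizer (DIMS), consisting of a dual interaction module and a multimodal generator.
Paper reference translation (metadata) (2020-10-12T02:19:16Z)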

The related papers list is generated automatically from the titles and abstracts of papers on this site.

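This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).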